1010data does Big Data in a big way: founded by two Wall Street veterans, the company offers a highly scalable solution that shies away from buzzwords and focuses instead on solving a problem that's older than we think. CEO Sandy Steier joins us for a discussion on what Big Data really means to businesses, and why Hadoop and virtualization are the wrong technologies to focus on.
Joel Kaplan and I co-founded the company; we both started on Wall Street in the late 70s. Everyone there was using spreadsheets, but we were more technically inclined. My educational background at that point was in math and computer science, while Joel's was in operations research and computer science. I ended up in mortgage-backed securities at Morgan Stanley and Joel in equity trading, which was really one of the first high-frequency trading operations in the world. We were on the business side, but relied heavily on technology to support our business. What we realized is that technology didn't have to be an obstacle; it could actually be a simple tool to use – it should facilitate rather than get in the way. When we needed to make a career change because of a significant realignment within the company, we were faced with the question, "What do we do with our lives?" We didn't have the stomach to go back to another Wall Street company, with all the politics and all the things people don't like about big, aggressive companies.
We realized that during our time on Wall Street, we had used technologies that dealt with large amounts of data. We thought, "Why don't we bring that technology to the world?" The internet was a relatively new thing at the time, which meant we didn't have to actually be present at a company to deal with their data. So we built a system where they sent us their data, we put it into the system, and we gave them access to it.
Big Data usually refers to the size or nature of data. Data is typically classified as either structured or unstructured, and most of the data that is analyzed within a business is structured. Structured data covers a lot more than most people think – virtually all the data that companies want to analyze is structured.
The one aspect of Big Data that people talk about that 1010data doesn’t concern itself with is the velocity of data. Our system is an analytical database for historical types of analysis on large amounts of data. We don’t do transaction processing. We don’t update the data in real-time.
Having done data analysis, I don't understand that emphasis at all. If you're analyzing large amounts of data, you're doing historical analysis; you don't care what has happened within the last five minutes. That is a distraction, because you really need to analyze the whole picture, starting with yesterday's data and going back ten years. You don't care what's happening right now. I think those things have a natural separation.
When performing data analysis, you create a model to test certain theories. You then apply that model in real-time as data flows quickly into a system. But you don't run your entire historical analysis against real-time data at the same time. Besides, there are serious technical problems involved in doing so. You never want a system that is working with real-time data to also be used for historical analysis, because there are too many wild cards. A system that captures real-time data needs to be as bulletproof and isolated as possible.
If you were to ask someone, 'Would you build a data warehouse if you had a thousand rows of data?' the answer would invariably be no. Why go through all that pain and suffering if you could just work with an Excel spreadsheet? But imagine if you asked, 'Suppose you had a spreadsheet that could handle a trillion rows of data as easily as it handles one thousand – would you build a data warehouse then?' The answer, in that case, should still be no, because with a solution like ours the amount of data you have shouldn't matter.
We offer something that is very much like a spreadsheet. It’s an immersive experience where you can see all data, and you can scroll and interact with it. Just like with Excel, there is immediate feedback. It is very unguided and unstructured so that you can do whatever you want with it. That is what we set out to create back in 2000, and it’s what we’re still offering today.
Arguably, there has always been big data. Sure, data is bigger today than it ever was. There didn’t use to be social networking sites, social media sites, or machine data from smart meters. I won’t say, however, that it’s really something new under the sun. The thing is, people used to think that analyzing large amounts of data was hard, and they went about it in their own ad-hoc ways. Until two years ago, the world believed that a relational database was the right way to do data. That’s how academia taught you to handle data. The fact is, nobody really analyzes data that way.
What has happened recently is that all those exceptions to the rule have actually become important. Suddenly, we need to focus on those exceptions. That, to me, is what Hadoop represents: a recognition that we now have to face up to data that already existed. We need a solution that's not a relational database, and Hadoop is the poster child for that now. Instead of everyone doing data processing themselves, Hadoop gives them common ground by providing a shared code base. It is good for analysis in the sense that you can actually capture data and file it away somewhere for later. But in a way, it's also bad for analysis, because it's slow and has no analytical capability built into it.
If more people learn that Big Data processing is possible, they will want to use it more as well. The more it gets out there, the more things will change.
Let me give you an example. Dollar General is the largest retailer in the United States by store count. They had an Oracle data warehouse that, like most warehouses, only held summary data – it was very limited. We brought in 1010data, and shortly thereafter they realized they had no reason to run two databases; we ended up being their enterprise data warehouse. We're cloud-based, so all of Dollar General's data is now accessible to any user anywhere in the world. Dollar General has a symbiotic relationship with others in the industry: they work with their suppliers (Pepsi, Coke, and other consumer goods manufacturers), who in turn help retailers sell products, arrange their shelves, and develop promotions. So when Dollar General realized that all their data about what each person buys is in the cloud and easily accessible, they thought, "Why don't we open up our data to all the companies we work with?" Imagine if those companies could see every item or bottle of Coke sold by Dollar General. What insights would that provide into customer behavior? That's gold for companies like Pepsi and Coke. That data would be so valuable to them that they would gladly pay for it.
The idea that companies can share large amounts of data with each other is a really big idea. That is something we want to facilitate. It’s something we hadn’t anticipated. But in five years, there will be a lot of talk about this.
We have sales people who go to trade shows and follow up on leads. There is an educational component to this, because you can't just walk into a company that doesn't know its market strategy. We are trying to sell something that doesn't exist in many people's minds. It takes them a while to realize that this is something different from what they're used to thinking about. But when we're successful, our customers are actually happier. This process is so new and different that they don't understand it until they start using it.
I think that technology in this world is way too complicated. It just bothers me, because I did not grow up like that. Software was always easy to use. The notion that business people should even hear the term Hadoop is strange. Why should a business person be hearing something like that?
Anything that doesn't make the user experience easier is taking the wrong approach. I am not a huge fan of Hadoop; the level of attention it's getting is more than it deserves. Everyone is talking about it, so if someone were talking about something else, that's who I would find interesting. But all the chatter is really about the cloud and Hadoop.
The cloud, to a lot of people, is about virtualization: the notion that you could be running on machines you don't know about, the idea that you can re-use machines. But those are very low-level ideas. Virtualization is not interesting, because it is not new. With the cloud meaning virtualization and Big Data meaning Hadoop, all the air is sucked out of the room. There aren't a lot of people to talk about, which is unfortunate.