Big Data and Hadoop are at the forefront of many of the discussions in enterprise technology right now, but there are still a lot of questions about how best to harness them. That’s where MapR Technologies comes in. MapR works towards bringing Hadoop to all types of businesses (not just enterprises), helping them realize actual results with Big Data management and analytics. We spoke with MapR’s CEO and co-founder, John Schroeder, to discuss how MapR got started, the future of Big Data analytics, and how MapR Technologies fits into that landscape by not only meeting the real needs of Hadoop users, but also bringing Hadoop to companies that are completely new to it.
Big data analytics was in the top three priorities in almost every one of those organizational interviews; and Hadoop was probably mentioned 20 times more than any other technology. There was a consistent theme of complaints, saying that Hadoop was too hard to use, it wasn’t reliable, it was hard to run in operations, and it didn’t perform very well.
That informed the need to push this technology forward: there was a tremendous benefit to be gained from using a Hadoop style of technology to accomplish big data analytics. These discussions provided a consistent direction on where we had to take that platform, and how organizations could be successful with it.
Then, I met MC Srivas, who was working at Google on GFS and BigTable, which are the forebears of Hadoop. He and I partnered and he developed an architecture to address these critical issues in Hadoop. We put the company together in mid-2009 and felt very confident that we could bring a technology to market that would be beneficial to a broad range of organizations.
I think when you’re into markets early, you have to do a good job of primary research and get to enough organizations and the right people at those organizations in order to really understand what their needs are. I think by education and trade, we’re all technologists, so it’s very important to address it from the organizational needs perspective.
One of the difficult parts of that stage was that we had to determine what exactly the needs of the market were, and that’s a lengthy process. I spent a good 12 months doing that. After that, it was pretty easy. We started assembling a really good team, and we have a good team of investors, so putting the company together was pretty simple.
In some regards, we’re very similar to some of our competitors; we take 12 open source Apache projects and then combine that with some of our own intellectual property to produce a full distribution for Hadoop. I think where we’re different is that we make some significant enhancements to Hadoop that transform it into a reliable compute and dependable data store, with just blazing performance.
We’ve made it easier to build Hadoop applications by opening up a much broader set of open APIs to use against the data that’s stored in Hadoop. Our position in the market is as a technology leader, with a combination of a huge component being done in open source, and then some of our own standards-based extensions.
One is that we heavily fund engineering; so being technology leaders, we want to make sure that we continue to push the envelope to provide the best platform in the market. We’ve also built out customer education support and professional services so that we can help customers early in their deployments with engagements, like use case identification, algorithm selection, and the initial application design. We can follow through with classroom training for their programming staff and development staff, as well as training for operations, and then staff 24/7 support to help them as they deploy in production. We basically guarantee their success and stay very close to customers to make sure that we continue to enhance both our product and services offerings.
The ideal MapR customer is any customer running big data analytics, especially those who have chosen Hadoop. At times we’ll talk about being an enterprise distribution for Hadoop because enterprise kind of brings with it the connotation of reliability and dependability; but we also have large customers out there, such as The Rubicon Project and comScore, that wouldn’t traditionally be categorized as Enterprise. Anyone who runs Hadoop wants to have a reliable and dependable platform.
Certainly with regard to the cloud, we’ve made very big investments already. Over the last couple of months, we’ve made announcements where we are an OEM provider through Amazon Web Services, and we’re also the only Hadoop platform within the recently announced Google Compute Engine cloud service. We architected things into the product to make sure that it behaves well within a cloud environment, with regards to enhancements that support multi-tenancy, that allow the cloud to dynamically assign and reassign resources, as well as to operate in that environment without any disruption to an application.
If you look at where big data analytics is going, I think we’re going to continue to push the envelope on opening up the platform to handle the broadest range of use cases. If you look at the initial architecture, it really provided a platform for batch, map/reduce processing. Now, through the efforts of the community and through the efforts of MapR, we’ve really seen a much broader API set made available.
Basically, we’re seeing the platform open to a much broader range of use cases by opening up many more standards-based interfaces. Then MapR added a POSIX-compliant storage system, along with the ability to access Hadoop using file-based interfaces. We also added JDBC and ODBC drivers, so you can access data in Hadoop from all the standard database tools. One of the reasons Hadoop is so rapidly gaining in popularity is we’ve transformed it from being batch map/reduce only to handling more programming interfaces, and now we’re going to move it more toward real time. We think that’s where it’s going to gain the most value or provide the most value to the marketplace; and we feel that we’re going to be an organization that’s going to help push Hadoop in that direction.
What do you see as the major challenges in big data for your customers, and how can businesses overcome those challenges?
I think the major challenge is that they have to learn a new paradigm for big data analytics. We have a very large population of analysts that understand traditional data warehouse or relational technologies. We’re still in the learning curve for Hadoop, so organizations continue to build their expertise to understand how they can apply clustering algorithms, recommendation engines, and so on. One of the challenges over the next few years is just having that talent pool, which is already significantly large, but there’s just an insatiable demand for talent who can develop these big data analytics applications.
I think in the cloud space, the gorillas in the market are Amazon and Google. They’re just so far beyond the rest of the market in understanding how to build extremely scalable, high performance, reliable data center infrastructure. I think them bringing that to market, and making that available to broader organizations, is really significant.
The Hadoop ecosystem is very interesting as well. It’s grown to be a large set of companies that provide technology and services to Hadoop customers. Not only Hadoop pure play companies, like MapR, but you also see larger companies like EMC, Oracle, IBM, Network Appliance, Cisco, HP and others. It builds comfort with customers that they’re going to have the broadest range of technology and services to select from in the future.
Outside of the Hadoop market, a company that’s kind of in the big data space would probably be 10gen. I think they seem to be getting pretty dramatic traction in the Mongo space.
I think it’s the value we’re bringing to organizations that are deploying big data analytics. It’s really satisfying to see a customer come from just working on identifying the highest return use case they could implement, and then work with them on analyzing their data sources and their business needs, helping them put together the business plan for developing their first Hadoop application, and initially training their development staff and operations staff to run on a new platform. Then, you see how quickly they accelerate from there.
Almost every one of our customers has quickly moved on to their third and fourth application within six to nine months of their first application. It’s really exciting to see them gain value from the platform, and also to see the momentum build as you get through the initial learning curve; the return on the investment they make in that initial learning curve just seems to grow exponentially over even a 12-month period.
We’re also excited about a project that we just started and that is now an Apache incubator project called Apache Drill. It’s going to add some real-time functionality to Hadoop, which is a much needed capability. Starting that as an Apache open source project was extremely important because the API layers into platforms like Hadoop are really core to adoption. You need to make the app layer and the API layer ubiquitous in the market, and give organizations that want to use that new API the reassurance that it’s going to be a standards-based implementation over a long period of time. I think we see that API layer for Hadoop being pretty standardized across the different distributions, and I think that’s good for Hadoop and good for companies using Hadoop.
For more on MapR Technologies, visit their website at www.MapR.com.