The distinction between “internet” and “brick and mortar” companies continues to blur as more and more companies operate online. As a result, the need for responsive databases that can handle semi-structured data quickly is greater than ever. For several years now, NoSQL databases have been the go-to technology for managing that kind of data.
For this Q&A with Couchbase, we sat down with Bob Wiederhold, president and CEO, to talk about the technology behind NoSQL and how his company is harnessing NoSQL to provide resilient document-oriented databases for a variety of major organizations.
LOCATION: Mountain View, CA
CUSTOMERS: Adobe, ADP, Intuit, LinkedIn, Salesforce.com, Zynga
The company’s history is interesting. It was originally founded as NorthScale, and it was going to be the commercial company behind Memcached. Memcached was a rapidly proliferating caching technology used by many of the biggest websites around the world, so the company started out with that in mind. Very quickly, though, it got feedback from a number of customers and potential customers, and that feedback was interesting.
But what was more interesting was that they were looking for a replacement for the combination of Memcached and MySQL in cases where MySQL was being used as a key-value database. A number of potential customers said, “Wow, if you could combine those two things, which in essence gives you a key-value database with a built-in caching tier, that kind of product would have value.” So the company shifted gears a little and started developing a key-value database with a built-in caching tier.
At about the same time, the NoSQL category was starting to get more and more attention. It wasn’t necessarily that the company set out to be a NoSQL company, but that’s essentially what we were: a key-value database that fit into the NoSQL category, and we were off and running in this space.
There was another company called CouchOne, the company behind CouchDB, and CouchDB was quite successful in its own right. It was a document database, and we basically said, “Hey, if we get these two companies together, we think we’ll have a much better chance of ultimately being the dominant player in this space, so why don’t we cast our lot together?” So that’s what we did.
We officially announced the merger in February 2011, and this past December we shipped our document database capabilities in the 2.0 release. We continue to grow very rapidly, and we are generally recognized as one of the three leaders in the NoSQL space, a space that is growing very, very quickly.
It starts with the data model. You can often think of a relational database as hundreds, even thousands, of interrelated tables with rows and columns. Suppose you want to store user information. You might have one table with first name and last name, and it’s going to point to another table of towns, states and zip codes. Obviously there’s more than one person in a given town, state and zip code, so every user with that same town, state and zip code would point to the same row in the address table.
In that sense, a relational database is very good at handling highly structured information that fits easily into the rows and columns of a table.
By contrast, a document database puts that information into a document. To continue the example, you would take a single row that cuts across all of those tables and put it into a single document. That document would look something like this: first name: Bob, last name: Wiederhold, street number: 107, town: et cetera. Then it would have pet type: dog, cat, parrot. All of that information is captured in a single document with a defined format, which in our case is JSON.
A key thing about a document database is that it handles both unstructured and semi-structured information very well, so you can put all kinds of text in it and have all kinds of complex nesting within the document. These are capabilities that relational databases really don’t support.
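To make the contrast concrete, here is a minimal Python sketch built on a hypothetical schema. It starts from normalized, relational-style rows (a users table, an address table referenced by `address_id`, and a pets table) and collapses one user’s rows into a single nested JSON document of the kind described above. The table layouts, field names and the town, state and zip values are placeholders invented for illustration. They are not Couchbase’s data model or API.

```python
import json

# Hypothetical normalized (relational-style) rows. Several users could
# reference the same address row through address_id.
users = [
    {"id": 1, "first_name": "Bob", "last_name": "Wiederhold", "address_id": 10},
]
addresses = [
    # Placeholder values; only the street number comes from the interview.
    {"id": 10, "street_number": 107, "town": "Example Town", "state": "XX", "zip": "00000"},
]
pets = [
    {"user_id": 1, "pet_type": "dog"},
    {"user_id": 1, "pet_type": "cat"},
    {"user_id": 1, "pet_type": "parrot"},
]


def to_document(user_id):
    """Collapse one user's rows from every table into a single nested dict."""
    user = next(u for u in users if u["id"] == user_id)
    address = next(a for a in addresses if a["id"] == user["address_id"])
    return {
        "first_name": user["first_name"],
        "last_name": user["last_name"],
        # The address is embedded (copied), not referenced by id.
        "address": {k: v for k, v in address.items() if k != "id"},
        # One-to-many data becomes a nested list inside the document.
        "pet_types": [p["pet_type"] for p in pets if p["user_id"] == user_id],
    }


doc = to_document(1)
print(json.dumps(doc, indent=2))
```

Because the address is copied into the document rather than referenced, a change to a shared address would have to be written into every document that embeds it. That is the trade-off a document model makes in exchange for reading everything about a user in a single lookup.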
A second key difference is how these databases scale. Relational databases are centralized, scale-up technologies. If you have more data, or more data operations to serve, you buy a bigger and bigger machine with more CPUs, more memory, more shared storage, et cetera. When you’ve bought the biggest machine possible, you split your database into two databases, one for east of the Mississippi and one for west of the Mississippi, and then you have to change your application so that it knows where to find the data.
As you scale more and more, that kind of scale-up technology becomes very problematic. A NoSQL database like Couchbase, by contrast, is a scale-out, shared-nothing technology. That means you have one database, but it is distributed across five, ten or a hundred nodes of commodity servers. In a distributed database, the database itself manages how it distributes the documents across the nodes, which makes it much, much easier to scale.
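The idea that the database, not the application, decides where each document lives can be sketched with hash partitioning. The Python below is a simplified illustration that rests on a few assumptions. Keys are hashed with CRC32 into a fixed number of partitions (1024 here), and partitions are assigned to nodes round-robin. Couchbase uses a comparable key-to-partition scheme, whose partitions it calls vBuckets, but the hash details, partition count and node names below are assumptions. They are not its actual implementation.

```python
import zlib

NUM_PARTITIONS = 1024  # assumed fixed partition count, for illustration only


def partition_for(key: str) -> int:
    """Hash a document key to a partition; this mapping never changes."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def build_partition_map(nodes):
    """Assign partitions to nodes round-robin (a simplified cluster map)."""
    return {p: nodes[p % len(nodes)] for p in range(NUM_PARTITIONS)}


def node_for(key, partition_map):
    """Look up which node currently owns the partition holding this key."""
    return partition_map[partition_for(key)]


five_nodes = [f"node{i}" for i in range(5)]
pmap = build_partition_map(five_nodes)
print(node_for("user::1", pmap))

# Scaling out: add nodes and rebuild the map. Each key keeps its partition;
# only the partition-to-node assignment changes, so the application never
# has to encode "east of the Mississippi" style routing rules itself.
ten_nodes = [f"node{i}" for i in range(10)]
pmap_10 = build_partition_map(ten_nodes)
print(node_for("user::1", pmap_10))
```

Round-robin reassignment is used only for brevity. When the node count changes, it moves far more partitions than necessary, and a production rebalance would try to move as few partitions as possible.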
Yeah, I mean, it’s really those three things: scalability, performance and ease of development. If you look at the history of NoSQL, the internet companies operating at incredible scale ran into these problems first.
I think most people trace the beginnings of the NoSQL industry to when Google published its BigTable paper, which I believe was in 2007 or late 2006. Obviously, this is a company many people looked up to from a technology perspective, and because it was operating at such huge scale, it developed BigTable to help address these scalability, performance and ease-of-development problems.
That was followed not long after by Facebook publishing its Cassandra paper and releasing Cassandra as open source. And you have Amazon, which published its paper on Dynamo. So you have some of these rock-star companies operating at huge scale and facing these problems, and based on the work they did, you started seeing other companies either picking up those open source projects or developing technologies that addressed the same kinds of problems.
I gave you a little bit of Couchbase’s history. As I said, we started out focused on Memcached but very quickly got pulled into this NoSQL world of scalability, performance and ease-of-development problems, and we felt we had the expertise. We thought it was an exciting problem to solve, and we felt that lots of companies already faced it and many more were increasingly going to face it. As a result, we were off and running.
It’s pretty much across the board now, but to give you a little perspective, we broadly break our customers into internet companies and enterprises, and we mean very simple things by those categories.
By internet companies we mean companies whose entire business revolves around the internet. By enterprises we just mean brick and mortar companies. Obviously, brick and mortar companies increasingly rely on internet and mobile applications to underpin their business, so in essence they’re doing the same kinds of things as the internet companies; they just also have a brick and mortar business. By that definition, enterprises tend to adopt new technology a little more slowly than internet companies.
Within the internet category, we’re very big in social and mobile gaming. We’re very big in advertising platforms. We’re very big in business-to-business SaaS applications like Salesforce.com. We’re big in the travel industry; Orbitz, for example, is a big customer. Then there’s social networking, everything from mobile chat applications to social networking companies like LinkedIn. And it goes on and on.
We’re learning all the time, and I’d say it has more to do with what those clients prioritize. As an example, early on in 2010 we did not realize how important something like cross-data center replication was, particularly to big customers.
Cross-data center replication basically allows you to mirror your database across multiple data centers around the world. That way, users who access your application in Europe hit the European data center, and users on the West Coast hit the West Coast data center. That reduces latency and, as a result, increases performance. It also gives you redundancy in case one data center goes down, et cetera.
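The client-side benefit described here, being served by the nearest healthy copy of the data, can be sketched in a few lines of Python. This is a hypothetical illustration. The `DataCenter` type, the site names and the latency figures are invented, and the replication between data centers, which the database itself performs, is not shown. It is not Couchbase’s API, and in practice this kind of routing is often handled outside the application.

```python
from dataclasses import dataclass


@dataclass
class DataCenter:
    name: str
    healthy: bool
    latency_ms: float  # hypothetical round-trip time measured from the client


def choose_data_center(data_centers):
    """Pick the lowest-latency data center that is still up.

    Because every data center holds a replicated copy of the data, any
    healthy one can serve the request; an outage simply removes that site
    from the candidate list.
    """
    candidates = [dc for dc in data_centers if dc.healthy]
    if not candidates:
        raise RuntimeError("no healthy data center available")
    return min(candidates, key=lambda dc: dc.latency_ms)


sites = [
    DataCenter("europe", healthy=True, latency_ms=12.0),
    DataCenter("us-west", healthy=True, latency_ms=140.0),
]
print(choose_data_center(sites).name)  # europe

sites[0].healthy = False  # simulate an outage in the European data center
print(choose_data_center(sites).name)  # us-west
```

The sketch shows both benefits from the paragraph above: lower latency in the normal case and a fallback to another full copy when one site goes down.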
We didn’t realize initially how important that feature was going to be, but we got the message. We delivered it in our release this past December, and it’s been huge for us. There are so many companies that really love NoSQL technology but couldn’t use it until they had cross-data center replication, because they had to be able to operate in multiple data centers.
I think we’ve got a pretty good handle on the wide spectrum of features people are interested in, but the NoSQL industry is relatively young. Tons of additional features are going into all of our products every quarter, so it’s very important to figure out which features matter most and in what order to deliver them. That’s a very important part of whatever success you achieve.
From a development perspective, the biggest thing, which is by no means unique to Couchbase, is probably the distributed nature of development. Since we’re open source, people can contribute from wherever they are in the world.
Obviously, though, this is very complex software. It’s not like adding a little feature to an application; the different capabilities of the product are all interacting with one another. So we make sure we work closely with the open source developers out there. They understand the architecture of the product, and they can work with our engineers to make sure a feature they want to add isn’t built for a very narrowly defined use case but is instead implemented in a general-purpose way that a broader spectrum of our customers can use.
So again, I don’t think that’s unique to us, but managing it as part of an open source project is always tricky. We don’t have many third-party developers contributing to the core of our product because it’s so complex, but we do have lots of people contributing to peripheral pieces such as our software development kits (SDKs) and various connectors, and that’s always very helpful. But you need to manage those contributions and QA the software so that you keep providing a product people can count on. There’s a lot of work involved in managing all that.
Well, from a business perspective, the company is about 120 people, and we see NoSQL growing very fast. In that kind of situation, one of the biggest challenges is growing your distribution channel fast enough to go after the opportunity at full capacity. That’s absolutely what’s happening to us now.
We have so many accounts doing deep evaluations of NoSQL that include Couchbase. We just have to hire the salespeople, sales engineers and technical support people needed to win those evaluations, close the deals and make sure customers are happy after they’ve purchased.
We’re in a very rapid growth phase. This is my fifth startup, and happily, every one of them has been successful and gone through this kind of stage, but it’s a very stressful, very challenging exercise, and we’re right in the middle of it.
On the product side, probably the area we’re focused on most heavily is continuing to provide key capabilities for mobile developers. Mobile is obviously a huge phenomenon, a megatrend. We think more and more capability will be built into mobile devices. You’re going to go from two to four to eight cores in a mobile device, with much more memory, so people are going to figure out ways to leverage these more powerful devices. We’ve got some projects under way that are focused on that.
The other key thing is continuing to develop the features enterprises need. For example, enterprises really need security features, which aren’t as critical to an internet company. Once you’re talking to financial services and health care companies, however, you’ve got much more stringent security requirements, so there’s a big push to put those kinds of features into the product.
On the development side, the big thing is that it’s a very competitive market. We need to be highly efficient and get the right features out as fast as possible. Our strategy is not only to solidify our dominant position in certain segments of the market, but also to expand our beachhead into more use cases and more verticals where we can provide a great solution, and ultimately to be the big winner in this space.
I think the big thing is that the database industry is being disrupted. It’s being disrupted by NoSQL companies like Couchbase that provide operational databases, it’s being disrupted by analytics technologies like Hadoop, and it’s being disrupted by other technologies.
So where is this market going? I think you’re going to see a massive change in the database technologies companies use, because today the vast majority of applications are cloud-based, and that will be even more true over the next three, five, ten years. Applications are data-centric: most of their features are increasingly based on data being collected and processed. And because applications are available over the internet, a single application increasingly has to support huge numbers of simultaneous users.
Those three things (cloud-based deployment, data-centric applications and huge numbers of users) drive database requirements that are very much aligned with NoSQL and technologies like Hadoop. So we think there’s a massive shift taking place.
A lot has been written about this recently. Oracle has missed its numbers the last two quarters, and for the first time the possibility that its business is in the midst of disruption has gotten a lot of attention. Obviously, Oracle is not going to topple over in the next 30 days, but it’s starting to see some fairly obvious signs that this disruption is really happening.
I think you’re going to see a lot more of that. It’s going to happen to Oracle. It’s going to happen to IBM. It’s going to happen to Microsoft. It’s going to happen to all of the current database players, all of whom built their businesses on top of relational technologies.
Learn more about Couchbase at www.couchbase.com.