Businesses today run multiple software applications and hold vast quantities of company data, both of which require ample storage space. Software-defined storage vendor Inktank is helping businesses alleviate that burden with a software storage system based on the open-source project Ceph, which provides a more flexible, cluster-based alternative to standard hardware storage. During our Q&A with Inktank, we spoke with Ross Turk, VP of Community and Marketing, about the company's unique subscription offering, the benefits of building on Ceph and its plans to transform the software storage industry.
LOCATION: Los Angeles, CA
CUSTOMERS: Cisco, Deutsche Telekom, CERN, Drexel University
The amount of data that we’re storing is accelerating, and we’re experiencing a data storage shortfall. There’s a huge gap when we look at the projection of how much data we’re all going to need to store versus how today’s technology is going to be able to facilitate that storage. This is because the types of solutions being created today for storage are big, inflexible appliances that are very expensive, hard to manage and difficult to scale. What the industry needs is scalable, flexible and cost-effective storage.
What Inktank does that is different from other companies is provide a software-based clustered storage solution. Instead of going out and buying a very large hardware appliance and paying a lot of money for it, you can build the same thing using open source software on top of everyday commodity hardware. Instead of paying a premium for an integrated product, we allow enterprises to use everyday computers and open source software to build something that is very much the same to the end user, but more flexible to manage.
Ceph allows you to build very large clusters that provide storage. If I have 100 machines and each machine has storage resources attached to it, Ceph can take all of those storage resources and combine them in such a way that they can be managed and accessed all at once by the end user. When people interface with Ceph, they’re interfacing with a giant storage cluster that can grow and shrink in an elastic way. This is opposed to a large storage appliance which would require you to go out and buy a new one once it reaches capacity.
Because Ceph is clustered, distributed storage, you get this elasticity. That makes it really easy to integrate Ceph with a large variety of business applications, because it becomes a software problem. You don’t have to rely on the hardware company to provide an additional interface to support a particular application. It’s software, so it can be written much more flexibly.
Say you’re running a website and you have so much traffic that you can’t serve it with one web server; you need to have two web servers. There’s the dilemma: how do I take one workload and split it into two servers? It’s the same dilemma with storage: if I have so much need for storage or storage performance that I can’t get it out of one machine, I need to expand it across multiple machines. That becomes a software problem, orchestrating where the data lives and how it replicates across multiple machines. How the cluster grows, shrinks, repairs itself and recovers from failure are really Ceph’s sweet spots.
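The placement problem described above can be illustrated with a short sketch. Ceph itself computes placement with its CRUSH algorithm; the code below deliberately swaps in a simpler, well-known technique, rendezvous (highest-random-weight) hashing, to show the general idea: any client can work out which machines hold an object from the list of machines alone, and adding a machine relocates only a fraction of the data. The functions `place` and `_score`, the three-replica default and the node names are hypothetical.

```python
import hashlib

# Illustrative sketch only: rendezvous (highest-random-weight) hashing,
# a simpler stand-in for Ceph's CRUSH algorithm.


def _score(node: str, obj: str) -> int:
    """Deterministic pseudo-random weight for a (node, object) pair."""
    digest = hashlib.sha256(f"{node}:{obj}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(obj: str, nodes: list[str], replicas: int = 3) -> list[str]:
    """Return the `replicas` nodes that score highest for this object.

    Any client holding the same node list computes the same answer,
    so no central lookup table is required.
    """
    if replicas > len(nodes):
        raise ValueError("not enough nodes for the requested replica count")
    ranked = sorted(nodes, key=lambda n: _score(n, obj), reverse=True)
    return ranked[:replicas]


if __name__ == "__main__":
    nodes = [f"node{i}" for i in range(10)]
    objects = [f"obj{i}" for i in range(1000)]
    before = {o: set(place(o, nodes)) for o in objects}
    after = {o: set(place(o, nodes + ["node10"])) for o in objects}
    moved = sum(before[o] != after[o] for o in objects)
    print(f"objects with one replica relocated after adding node10: {moved}/{len(objects)}")
```

With this scheme, adding a node can displace at most one replica of any given object, which is one way to get grow-and-shrink behavior without reshuffling the whole cluster.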
Inktank Ceph Enterprise is essentially a collection of the best parts of Ceph. Ceph is a really broad open source ecosystem with a lot of features and a lot of connectors to other technologies. We take the most stable pieces of the Ceph architecture, subject them to additional testing and quality assurance, and then package them with extended tooling and a management console to create an enterprise experience. Ceph is very powerful and very flexible, but it's not what an enterprise needs right out of the box. Inktank Ceph Enterprise bridges that gap; it adds the types of tools that an enterprise storage administrator will need.
Our ideal customer is somebody who is using Ceph alongside another technology like OpenStack. OpenStack is actually a really good fit for Ceph, because administrators who know OpenStack will find Ceph familiar; it's a similar type of technology. I think our ideal customers are those who have built operational excellence, like service providers and very large enterprises looking to build new private cloud deployments.
We tend to find that Ceph is brought into an organization with a next-generation storage need like backup and archival or a private cloud deployment, and then it tends to spread to other use cases. I would say our ideal customer is one who is interested in the next generation of storage technology, which tends to mean large strategic companies and small and medium businesses with strong operational expertise.
I wouldn't say it has, although with Inktank Ceph Enterprise (which has only been on the market for five months) we're definitely moving toward enterprises that need more traditional storage solutions. These are small and medium businesses that are used to deploying a solution, configuring it through a web browser and using it for a scale-out-and-add type of workload.
As Inktank Ceph Enterprise (ICE) spends more time in the market and we're able to put more of our expertise into the product, I would say ICE is going to be appropriate for a wider and wider variety of customers.
Well, first I would say that Ceph is a 10-year-old open source project, so it’s been under development for a very long time and as a result has become really robust. It’s a distributed storage solution with object, block and file interfaces, and not a lot of solutions have that. A lot of them are object solutions, distributed file systems or distributed block devices. Ceph does all three, which I think is very interesting. Also, as a result of it having been an open source project for so long, there are a lot of integrations that exist today that have been built by the community in the last 10 years. I think it’s a very well-connected piece of technology that’s very embedded in the open source community, which makes the integrations powerful.
In addition to that, I would say there are a few things about Ceph that make it more manageable than a lot of other solutions. For example, the CRUSH algorithm is how Ceph decides where to place data within the cluster. It's kind of a complicated algorithm, but it does a very simple thing: it calculates where the data lives inside the cluster, and it does so based on a series of policies. If you're a storage administrator, you educate Ceph about the topology of your storage cluster and then you tell it how to make good decisions about data placement. It's a very intelligent policy-based mechanism for defining failure domains across very large clusters. Ceph has a lot of self-managing and self-healing features and a lot of flexibility that you don't see in any other solutions today.
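To make the idea of policy-based failure domains concrete, here is a small sketch. CRUSH (Controlled Replication Under Scalable Hashing) works over a weighted, multi-level hierarchy with administrator-defined rules; the code below is not CRUSH. It ranks candidates with a simple hash and applies one hypothetical rule, "put each replica in a different rack", to a two-level topology. The function `place_across_racks`, the topology format and the rack and host names are all invented for illustration.

```python
import hashlib

# Illustrative sketch only, not CRUSH: one hypothetical placement rule
# ("one replica per rack") applied to a two-level rack/host topology.


def _weight(label: str, key: str) -> int:
    """Deterministic pseudo-random weight used to rank candidates for a key."""
    digest = hashlib.sha256(f"{label}/{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place_across_racks(obj: str, topology: dict[str, list[str]], replicas: int) -> list[str]:
    """Pick `replicas` hosts for `obj`, never putting two copies in the same rack.

    `topology` maps a rack name (the failure domain) to the hosts inside it.
    """
    racks = [rack for rack, hosts in topology.items() if hosts]
    if replicas > len(racks):
        raise ValueError("policy needs one distinct rack per replica")
    # Step 1: choose distinct racks, ranked by weight for this object.
    chosen = sorted(racks, key=lambda r: _weight(r, obj), reverse=True)[:replicas]
    # Step 2: inside each chosen rack, take the highest-weighted host.
    return [max(topology[r], key=lambda h: _weight(h, obj)) for r in chosen]


if __name__ == "__main__":
    topology = {
        "rack-a": ["a1", "a2", "a3"],
        "rack-b": ["b1", "b2"],
        "rack-c": ["c1", "c2", "c3", "c4"],
        "rack-d": ["d1"],
    }
    print(place_across_racks("customer-records-0042", topology, replicas=3))
```

Because the rule is enforced during placement, losing an entire rack can take out at most one copy of any object; describing the topology once is what lets the system make that guarantee automatically.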
Inktank Ceph Enterprise is based upon Ceph, so all of the benefits of Ceph are in Inktank Ceph Enterprise: the flexibility, self-management, self-healing and the lowest cost per gig possible. All of those things [make ICE stand out from other software-defined storage platforms].
I think it’s always a challenge building a product based on open source projects, because you have to manage the balance of how strongly to commercialize versus how strongly to feed the community. It has been a huge challenge for us to make sure we run the project in such a way that it results in strong enterprise technologies, but also in such a way that it’s open to a community of developers. The key is always remembering that it wasn’t just us who built Ceph, it was a community of people, and as we commercialize it, we have to keep the community growing.
The funding has led to a lot of development and made a huge difference in the Ceph project. We’ve put significant resources into developing Ceph, and other people in the community have done the same. We’ve been able to add a lot of features into Ceph and then into Inktank Ceph Enterprise, which wouldn’t have been possible without first solidifying the object and block interfaces and making a good, tight integration with OpenStack. It’s made a huge difference that the company has been able to continue the development of Ceph, to bring it to the market and to establish a brand around Inktank.
It's really good to see that the community is supportive of the work we do with Ceph. Ceph has always been a community effort, but since Inktank was founded we've been able to really scale that effort up and attract a lot of attention, which has then turned into contributions.
Absolutely. If you’re interested, you can always look at videos from the previous Ceph Developer Summit, which occurs every three months. During the Summit we get together and discuss what’s going to happen in the next major release of Ceph, which then becomes part of Inktank Ceph Enterprise after it’s gone through Inktank’s additional testing and hardening.
The discussion from two Ceph Developer Summits ago led to something that's coming out next month: erasure coding. That's the ability to have an erasure-coded storage backend as opposed to the standard replicated storage backend. Instead of creating full copies of the data to ensure durability, it splits the data into chunks, computes additional coding chunks, and uses an algorithm to reconstruct any missing pieces during recovery. The result is that you can store more data with the same durability using fewer drives. That function is coming out very soon in Ceph, and will be in Inktank Ceph Enterprise later this year.
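The trade-off between copies and coding chunks is easiest to see in a toy example. Production erasure codes, such as Reed-Solomon-style codes, can tolerate the loss of several chunks; the sketch below substitutes the simplest possible code, a single XOR parity chunk, which survives the loss of any one chunk. The helpers `encode`, `recover` and `overhead` are hypothetical and not part of any Ceph API.

```python
from functools import reduce

# Illustrative sketch only: single-parity XOR coding, the simplest erasure code.


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> list[bytes]:
    """Split data into k equal data chunks plus one XOR parity chunk."""
    if k < 1:
        raise ValueError("k must be at least 1")
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    chunks = [padded[i * size:(i + 1) * size] for i in range(k)]
    return chunks + [reduce(_xor, chunks)]


def recover(chunks: list, lost: int) -> bytes:
    """Rebuild the chunk at index `lost` by XOR-ing every surviving chunk."""
    survivors = [c for i, c in enumerate(chunks) if i != lost]
    return reduce(_xor, survivors)


def overhead(total_chunks: int, data_chunks: int) -> float:
    """Extra raw capacity consumed, as a fraction of the user data stored."""
    return (total_chunks - data_chunks) / data_chunks


if __name__ == "__main__":
    data = b"erasure coding stores parity instead of full copies"
    pieces = encode(data, k=4)
    pieces[2] = None  # simulate a failed drive
    pieces[2] = recover(pieces, lost=2)
    print(b"".join(pieces[:4])[:len(data)] == data)  # True
    # Both schemes survive one failure: 2-way replication vs. 4 data + 1 parity.
    print(overhead(2, 1), overhead(5, 4))  # 1.0 0.25
```

Since a single parity chunk tolerates one failure, the fair comparison here is with two-way replication: both survive one lost drive, but the coded layout spends 25% extra capacity instead of 100%.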
Another thing is cache tiering, which is the ability to have multiple Ceph storage pools – one that is a backing pool and then a caching pool in front of it. You can have a fast pool and a slow pool of storage and move data back and forth depending on demand. Having a write-back cache pool on flash is another feature coming out that is really huge for use cases where you need a large amount of inexpensive storage that needs to be very fast.
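The write-back behavior of a cache pool can be sketched in a few lines. The class below is a hypothetical in-memory toy: a plain dict stands in for the large, inexpensive backing pool, and a small `OrderedDict` stands in for the fast (for example, flash) caching pool. Its eviction policy is simple least-recently-used, chosen for brevity; it does not describe how Ceph decides what to flush or evict.

```python
from collections import OrderedDict

# Illustrative sketch only: a toy write-back cache tier with LRU eviction.


class TieredPool:
    """Small fast tier in front of a larger, slower backing tier (hypothetical)."""

    def __init__(self, backing: dict, cache_size: int):
        self.backing = backing
        self.cache = OrderedDict()  # key -> bytes, least recently used first
        self.dirty = set()  # keys written to the cache but not yet flushed
        self.cache_size = cache_size

    def write(self, key: str, value: bytes) -> None:
        # Write-back: the write completes once the fast tier holds the data.
        self._put(key, value)
        self.dirty.add(key)

    def read(self, key: str) -> bytes:
        if key in self.cache:
            self.cache.move_to_end(key)  # mark as recently used
            return self.cache[key]
        value = self.backing[key]  # cache miss: promote from the slow tier
        self._put(key, value)
        return value

    def flush(self) -> None:
        """Write every dirty cached object back to the backing tier."""
        for key in self.dirty:
            self.backing[key] = self.cache[key]
        self.dirty.clear()

    def _put(self, key: str, value: bytes) -> None:
        self.cache[key] = value
        self.cache.move_to_end(key)
        while len(self.cache) > self.cache_size:
            old_key, old_value = self.cache.popitem(last=False)  # evict LRU entry
            if old_key in self.dirty:
                self.backing[old_key] = old_value  # flush before evicting
                self.dirty.discard(old_key)
```

Hot objects are served from the fast tier, while cold or evicted data ends up in the cheaper pool; the cost of write-back is that recent writes live only in the cache until they are flushed.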
What I anticipate seeing in the near future is people understanding that storage is increasingly a software concern and not a hardware concern. I think the technology will continue to mature, but the industry's understanding also has to mature. Storage administrators need to learn how to deal with software solutions instead of hardware solutions. As these software-defined storage technologies mature and become ready for midmarket enterprises, those enterprises will have to build DevOps expertise in their storage teams.