HPCC Systems: A New Approach to Open-Source Big Data
LexisNexis Risk Solutions is a big data veteran, and their HPCC platform – High Performance Computing Cluster – is a direct competitor of Apache Hadoop. HPCC has a handful of distinct differentiators that make it a prime candidate for big data domination, including maturity, an easier programming language, and a real-time data analytics engine called Roxie. We spoke with Flavio Villanustre, Vice President of LexisNexis Risk Solutions, on their strategic advances and what the future of big data will look like.
CUSTOMERS: Customers in over 100 countries worldwide
LOCATION: Washington, D.C.
Would you provide a quick overview of your company and what you do?
We do a few things, but I will first tell you what we don’t do. We don’t sell cars, so if you have a Lexus, that’s not us. We have been in the information business for many, many years. We are two companies with the same name: LexisNexis and LexisNexis Risk Solutions. The latter company is in the risk management business, with customers in many different markets. We provide a number of data services and data solutions in each market, and then we also have the HPCC big data platform. HPCC Systems is a division of LexisNexis. It was born when we decided to push our platform onto the market, and so we created a new image and a new name.
Speaking of big data and HPCC: What is your experience with big data specifically?
We were doing big data even before people called it big data. As a matter of fact, if you look at the first reports from Gartner on big data, where they describe the problems around velocity, volume, and complexity, that’s what essentially made us come up with the idea for a new platform. This was back in the late ’90s. With big data, conventional tools just don’t work for what we need. Big data is data that comes in every hour, every day, every week. Say you get over 40,000 different kinds of data coming in from various external sources and formats, and some of it may be incomplete. It would take a large number of employees to input that data and extract the information within.
Why did you open source your data-intensive supercomputer, HPCC Systems?
We always considered this as the core competitive idea. We knew this was the one thing that would make us extremely competitive in the market. Obviously, we were always very protective of our technology, and over the years, we built and perfected the platform to differentiate ourselves even more from the rest of the market. But while we were protective of the platform, we also knew that the best thing we could do to make the platform more pervasive in the market was to release it under an open-source license. This essentially removes almost every obstacle that companies could have for using the platform. Despite the open source licensing, we do of course still provide corporate support and additional functionality on interim-type licenses for companies who want to pay for extra capabilities.
Would you elaborate on what sets your software apart from other solutions in the marketplace today?
There are many different aspects that make it quite unique. On the one hand, it is a platform that has been used in production for many, many years in big data processing. We’ve been doing this, as I said before, for over 15 years, and we’ve been very successful. We have about $1.5 billion in revenue. Our platform is mature and production-ready. It provides two components: one is a back-end data analytics and transformation engine, the other is a front-end real-time delivery engine called Roxie. Roxie provides real-time, concurrent, and highly available transactions. These two components are highly integrated. So the data life cycle consists of moving between one component, where the data is ingested and processed and analyzed and prepared for delivery, and the delivery system. I’m quite sure there is nothing like this in the market today. Systems like Hadoop provide back-end processing, but the production threshold is much higher because of the difficulty of coding. It also doesn’t have a complex delivery engine like Roxie. With Hadoop you’re forced to go with a third-party solution like Cassandra.
If you look at Hadoop, it does have some of the features that Roxie has, just not all of them. Roxie is much more sophisticated. Hadoop, for instance, doesn’t have any real modernization in their linking on the back-end. There are some solutions out there that cover some of the back-end processing and front-end analytics we do, but they don’t cover the entire spectrum, which means they don’t support the entire data cycle, plus they are expensive. So HPCC Systems serves a very big market.
Who are your customers?
We have multiple customers in different verticals. Unfortunately, I don’t have permission to release the names of these customers. I can give you an idea of some of the various markets we’re active in, though. Finance and health care, as you can imagine. Government, intelligence, law enforcement, and retail.
Could you give some examples of how these customers actually use your software?
There are simple use cases, like a client who uses the platform to provide a 360-degree view of their customer base. And then there are more complex usages. For instance, HPCC Systems is used for supply chain management, marketing metrics and KPOs, or to support analytic recommendation systems, market analytics, predictive systems, and more.
Where do you feel big data is headed? Where do you see it in, for instance, five years?
That’s a question that requires a long answer, but I’ll try to condense it. Big data is positioned as reaching its peak in the next two to five years. So I think that in five years, you will see big data as a well-established need across many different companies, potentially the majority of them. This can be because of the amount of internal data they generate, or because of the amount of external data that they need to ingest, like social media feeds. I think you should expect to see big data as just another IT task in the market. Although maybe it won’t be IT who will take big data under its wing.
Companies in two to five years will understand the need for big data, the limitations, and the true cost – the value equation for big data. Today, the whole thing is hyped, but in the next few years, we’ll see it mature and be included as a standard business requirement.
Who are the most interesting people or companies in your market right now and why?
I tend to relate more to people, or even topics and targets in the market, than companies, because you never know what a company is going to be doing two or three years from now. I would say that some of the key trends I’m seeing in the market are in predictive analytics. This started over a decade ago, but now it’s consolidating big data sources as well. In general, I’m keeping my eye on platform services and all the applications and solutions that are coming out in the analytical field as vendors are continuing to tackle big data.