
Big Data Analytics

Quickly Access and Analyze Big Data with Infobright

Data volume is growing exponentially, with many businesses struggling to keep pace with how to store and analyze the growing amount of critical business information. Infobright provides a scalable analytic database engine so that businesses can quickly and easily analyze machine-generated data without needing help from the IT department. President and CEO Don DeLoach shares why Infobright’s “cult-like focus” on machine-generated data can help organizations gain a comprehensive understanding of their data to optimize their networks and improve performance.

For more information about Infobright, visit their website and read our product review.

About the Company


WEBSITE: www.infobright.com

FOUNDED: 2005

LOCATION: Toronto, Canada

SOLUTIONS: Infobright Enterprise Edition

How was Infobright founded? What was the inspiration to establish it?

Infobright was founded in 2005 by four mathematicians, three in Warsaw and one in Toronto. They originally created Infobright as a consulting firm specializing in data analytics. They recognized that their work could be productized, and very shortly thereafter they built what they called the Bright House Engine, which is basically the Infobright product. They attracted venture funding and got off to a traditional start as a start-up. In 2008 the company made the decision to really go into the market with what was now a new product.

Infobright’s market is a subset of Big Data Analytics. Can you tell us more about the type of market Infobright serves?

We did some loss analysis, we talked to customers, and we looked really hard at the product, and that pointed us to the emerging world of machine-generated data. If you look at the overall umbrella of Big Data, the two biggest contributors are unstructured data and machine-generated data. The table structures and the overall data structures associated with machine-generated data are just really well suited for what we do. So if you happen to be storing call data records, web logs, or network security events, all of this is extremely well suited to the Infobright architecture.

More and more, we focused really strictly on a subset of Big Data: machine-generated data. Our customers are solution providers in the advertising technology space, online analytics, mobile analytics, the manipulation of sensor data, online gaming, things like that. It works out really well, and the main characteristics are an exceptionally low cost of ownership, very aggressive disk compression, and extremely good query performance, especially if there’s a need for ad hoc work. We’ve been able to keep that focus and leverage it in a very exciting way.

How would you characterize Infobright’s mission?

I would like for Infobright to be viewed as an innovative leader, one of the premier providers of very specific solutions for storing and analyzing machine-generated data. We have an almost cult-like focus on machine-generated data.

Who are some of your competitors and what do you do differently?

I guess the main competitors we see out there would be companies like Sybase IQ and Vertica. To explain what we do differently, let me cast a broader net. Sybase IQ, Vertica, and other columnar stores are fundamentally analytic databases, and they are very clever solutions. When I look at a Sybase IQ or a Vertica, I don’t think they’re necessarily the right answer in every case; in fact, quite the opposite. But it’s a different architecture. At one level, they employ a strategy similar to ours: an inverted file structure where data is stored in columns, which tends to serve analytic needs very well. They do that and we do that, so there are definitely some similarities.
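The column-oriented layout described above can be illustrated with a minimal Python sketch. It is a generic illustration, not how Sybase IQ, Vertica, or Infobright actually store data, and the call-record fields and the `to_columns` helper are hypothetical. The same three records are held once as row tuples and once as one list per attribute; summing call duration in the column layout touches only the single list the query needs.

```python
# Hypothetical call-detail records (day, caller, duration_s, cell); illustrative only.
rows = [
    ("2012-08-01", "caller-1", 120, "cell-17"),
    ("2012-08-01", "caller-2", 45, "cell-03"),
    ("2012-08-02", "caller-1", 300, "cell-17"),
]
NAMES = ("day", "caller", "duration_s", "cell")


def to_columns(records, names):
    """Pivot row tuples into one list per attribute (a column-oriented layout)."""
    return {name: [record[i] for record in records] for i, name in enumerate(names)}


columns = to_columns(rows, NAMES)

# Row layout: each record is stored as a unit, so this scan walks every whole record.
row_total = sum(record[2] for record in rows)

# Column layout: only the one list holding durations is read.
col_total = sum(columns["duration_s"])

print(row_total, col_total)  # 465 465
```

Both layouts give the same answer; the difference is how much unrelated data an analytic scan has to pass over, which is why columnar stores tend to suit wide tables queried a few columns at a time.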

The difference shows up when the use case is machine-generated data, or more precisely, when the origin of the data matters less than its structure. When the structure suits our architecture, customers can set it up and maintain it without database administrators establishing and maintaining indexes, continually tuning the database, or anticipating the types of queries that will be asked, which gives us a significant advantage once ad hoc queries begin. So we differ from them in that, if the use case is storing and analyzing machine-generated data, we can deliver a lower cost of ownership and an easier-to-maintain environment than you would get with one of the alternatives.
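One common way a column store can answer ad hoc range queries without hand-built indexes is to keep small summary metadata, such as the minimum and maximum, for each fixed-size block of a column and skip any block whose range cannot match. The Python sketch below illustrates that generic technique (often called zone maps or block min/max pruning); it is an assumption offered for illustration, not a description of Infobright’s internals. The 65,536-value block size echoes the "Data Packs" figure discussed in the reader comments, and the `Pack`, `build_packs`, and `count_in_range` names are hypothetical.

```python
from typing import List, NamedTuple, Tuple

PACK_SIZE = 65_536  # values per pack (assumed, echoing the diagram discussed in the comments)


class Pack(NamedTuple):
    lo: int             # smallest value in the pack
    hi: int             # largest value in the pack
    values: List[int]   # the raw column values held by the pack


def build_packs(column: List[int], pack_size: int = PACK_SIZE) -> List[Pack]:
    """Split a column into fixed-size packs and record each pack's min and max."""
    packs = []
    for start in range(0, len(column), pack_size):
        chunk = column[start:start + pack_size]
        packs.append(Pack(min(chunk), max(chunk), chunk))
    return packs


def count_in_range(packs: List[Pack], lo: int, hi: int) -> Tuple[int, int]:
    """Count values in [lo, hi], opening only packs the min/max cannot decide.

    Returns (matching_count, packs_opened).
    """
    count = opened = 0
    for pack in packs:
        if pack.hi < lo or pack.lo > hi:
            continue  # metadata proves no value matches: skip without reading
        if lo <= pack.lo and pack.hi <= hi:
            count += len(pack.values)  # metadata proves every value matches
            continue
        opened += 1  # partial overlap: the values must actually be scanned
        count += sum(lo <= v <= hi for v in pack.values)
    return count, opened


packs = build_packs(list(range(200_000)))
print(len(packs), count_in_range(packs, 100_000, 100_999))  # 4 (1000, 1)
```

Here a query for values 100,000 through 100,999 opens only one of the four packs; the other three are ruled out from their min/max alone, with no index for anyone to create or tune. How much this helps depends on data order: data that arrives roughly sorted, as time-stamped machine data often does, tends to give narrow, non-overlapping pack ranges.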

Now the flip side to this is that the people we compete with are generally able to address a broader set of use cases, what I would characterize as more general data warehousing. So if you look at the customer base for Sybase IQ or Vertica, you tend to end up in many cases with a more general data warehousing environment that, for example, might maintain hundreds of tables and use the data across a complex snowflake schema, and that really isn’t well suited to our architecture. Not only do we not focus on that, we explicitly will not engage in opportunities where it is a prerequisite, because it’s just not the right use for our architecture.

Think of it this way: if analytics interfaces were a mode of transportation, say cars of some sort, a general-purpose car might be a four-door sedan that suits a variety of use cases. But if what I’m trying to do is transport a family of seven, a general-purpose sedan isn’t going to do it; you might need something like a Suburban. Conversely, if what I’m trying to do is get across New York or London at the height of rush hour, down narrow streets and weaving through traffic, then a scooter or a motorcycle is going to be a much better alternative. That doesn’t make a motorcycle bad, and it doesn’t make a Suburban bad; each is simply more suitable for a specific use case.

Let’s take the case of a motorcycle. A motorcycle costs less, and it has limitations relative to a four-door sedan or a Suburban: it only transports one person, and it can’t carry a lot of cargo. But for the specific use case, because it costs less and is likely to be far easier to maintain, it is the more desirable solution. That is indeed the case with most of the use cases we work with, where storing and analyzing machine-generated data is what our users are trying to address. There, we can give them that ability for far less money while still delivering highly desirable results.

What challenges does machine generated data pose to businesses and how does Infobright alleviate those challenges?

The challenge is that the volume of data is growing exponentially. There’s been an increase in the number of mobile device users and in the number of places where sensors are deployed. For example, the types of sensors in a phone used to be fairly limited. Now there are sensors that can take barometric readings, which will show up in somebody’s smartphone, which makes everybody aware of you. The number and sophistication of devices that auto-generate data are growing so fast that they are creating an enormous load on the systems that store and analyze this data.

For example, if the data is stored in Oracle or in MySQL on a server, it tends to be manageable up to a certain point, after which the queries start to run very long and the disk space starts to grow very, very large. Performance becomes an issue too: in a traditional database like that, you have to keep indexing the database to maintain your previous performance.

So really, Infobright allows customers to get beyond those challenges and take advantage of the opportunity to get a much more comprehensive understanding of what they’re dealing with. For example, a customer might say, “I happen to be an extremely large mobile carrier and I’m trying to optimize my networks. It would be highly desirable if I could look at seven days of call records instead of three, and it would be great if my analysts could get results back from their queries in fifteen seconds instead of thirty minutes.” Those are the types of almost game-changing opportunities that can be leveraged with the right technology.

Big Data is a very hot topic right now. What do you think has facilitated this rise in Big Data? Where do you see it headed, and where do you see Infobright fitting into that landscape?

There are two main drivers. One is the enormous increase in the number and amount of changes in data, and the other is the increase in the amount of unstructured data. That includes the digitization of photographs, video, and documents, and the facilities that allow for storing and socializing these documents, up to the point where companies and organizations have so much more information to contend with and need to be able to analyze it across a variety of structures.

So in the days of old, when you had an IBM mainframe and everything was in a highly structured environment, people’s knowledge of and thinking about organizational information was pretty much confined to what was in those ISAM or VSAM files or whatever. In today’s world, it’s everything: images, videos, massive IT logs, any combination of data, up to and including the traditional records generated in traditional row-based databases. But those are becoming a smaller and smaller percentage of the overall ecosystem of information that we’re going to face and have to deal with.

In order to comprehend and leverage the power of that information, I think people are beginning to recognize that there are specialized tools that can store and comprehend what’s within that data very cost-effectively. So the challenge is really understanding that there’s no silver bullet and that you want to use the right tool for the right environment.

For example, if I’m storing loads and loads of pictures, I might want to use a NoSQL document store such as MongoDB. If I’m storing lots of machine-generated data, I might want to use Infobright. I also want to understand how these various technologies co-exist.

I think there are many technologies out there that are ideally suited for pieces of the overall Big Data equation, and the challenge is to get the various component technologies working together to solve the right business problem. That doesn’t mean there’s any one right answer other than the right answer for a specific business challenge.

Where do you see the Big Data segment headed in five years? What do you think will be the next big trend within this segment?

I’m going to sound very biased when I say this, and perhaps I am, but one of the key characteristics of where I see the Big Data market going in the next five years is that machine-generated data will play a bigger and bigger part. I think that will manifest itself in the most obvious way, and if there is a sea change, this will be it: we’re going to begin to see a huge shift in communications. If you look at the work of the W3C or Tim Berners-Lee’s Semantic Web initiative, it’s all about creating an environment that enables a whole new set of capabilities that did not exist before. So that is certainly one of the aspects of the next big wave.

The other thing is that I think you’ll see tighter and tighter integration, so the abstraction of the technology will get greater and greater, and I’m sure you will see a preponderance of appliances. If you look at the world of database appliances today, they are most assuredly general purpose in nature. I think that as the world evolves, you’re going to see much more specialized, purpose-built capabilities, much like what we’ve seen in security and networking, where you have the B&F appliances or things like that.

I think there’s a lot of sophistication, and the sophistication at the technology level will be abstracted to the point where any user can use this technology with greater ease. I think the real sea change will come in terms of machine-to-machine communications, and again, underneath it all is a massive growth of machine-generated data.

Other than Infobright, who do you think are the most interesting people or companies in your market segment right now and why?

There are some incredible developments going on in memory technology, and I think they will affect how databases are architected in the future.

If you look at what people are doing with innovation in sensor technology, there’s all kinds of coordination in technology. Take the idea of a Bluetooth phone. It’s been around for a while and is no great change. But what about sensors that you either embed inside your body or wear on your body, that take measurements of vital statistics and understand where certain thresholds exist? When you trip a threshold, the sensor activates a link to your cell phone via Bluetooth, which in turn relays it to a central repository or monitoring environment that alerts a health organization that one of its patients has just tripped an alert. Or ADT could monitor a house for burglary, only now it’s sophisticated technology combined with advanced data communication that forms this link.

So stuff like that I think is awesome. And I think it’s going to be way more widespread than anybody thinks and it’s going to happen way sooner than most people think.

What is next for Infobright?

We will not deviate one iota from our religious focus on machine-generated data, but we will continue to introduce the ability to deal with greater and greater volumes of data and to use innovative and unique techniques for interrogating that data, all in the name of better utilizing machine-generated data.

Want to read more Business-Software.com exclusive interviews with CEOs and company founders? Head over to the Behind the Software Q&A section of the blog to browse the complete Behind the Software collection.

Darlene Lin

Web Contributor, Quickly Access and Analyze Big Data with Infobright
Expert in CRM, Social, ITSM/IT Help Desk, and ERP
Darlene is a web contributor, specializing in the CRM, Social, IT Management and ERP segments. She has several years of experience covering the software technology world, writing about the latest companies and trends and interviewing the founders of emerging and ...
  • http://twitter.com/AMusnikow Alan Musnikow

    The lower right corner of the diagram http://www.business-software.com/wp-content/uploads/2012/08/Infobright-Diagram.png says, “Data Packs  Contain compressed 65K items.” If “K” meant 1,000 and one truncated the three columns to the right of the thousand column, this would be correct.
    However, since the number of values in a data pack is 65,536, which is exactly equal to 64K when K means 1,024, I find “65K” confusing.

    The sentence at the beginning of the next to last paragraph, “So stuff like I think is awesome.”, seems to be missing at least one word.

    • Darlene Lin

      Thanks for letting me know about the typo! Unfortunately, I can’t do anything about the image as it was provided by Infobright.

  • Jfinerfrock

    We are an Infobright Enterprise Edition customer. What I am concerned about is the “religious focus on machine-generated data.” I actually think Don’s transportation analogy is a good one. I owned a motorcycle once. It was great: I had a lot of good times, I went a lot of great places, and I met a lot of good friends. But then I grew up. I had a family to think about, my commute to work got longer, and I had to think about the weather for my ride home (I hate riding in the rain and the cold). I began to use the bike less and less. Then one day I was looking at it in the garage and realized I had not ridden it in almost two years. It had become a hanger for beach towels and a collector of dust. My sign of youth and independence was now just a youthful indulgence, a forgotten toy. What I drive now is a crossover. It is comfortable and stylish, it is fast, and I can even carry lumber in it. Do I miss my bike? Not really; I just miss my youth.

    Someday Infobright is going to realize that while this idealistic laser focus is a great differentiator now, one of their competitors is figuring out how to continue to do what they do well and get to 80% of what Infobright can do. At that point organizations are going to ask themselves: why build this “new thing” on Infobright? Why not consolidate my work, lessen my annual software subscription costs, and reduce my multiple, mostly redundant, systems? I believe that unless they consider other use cases, eventually Infobright will be a collector of dust, a badge of honor to tell the new young guns: “We were innovative once. We were leading edge once; now… we consolidate our workload and get home for little league practice in our crossover.”

    • Don Deloach

      Thanks for the comments, Jon. I have to smile about the reference to missing your youth; I think I feel that more and more. However, our approach at Infobright is guided both by the trends within technology, in particular the big data space, and by the overwhelming statistics on the growth in machine-generated data. We are focused on this segment of the market because it is a massive and growing market. Everything from the growth and sophistication of mobile devices to the expanding use of increasingly specialized sensors underscores this. The smart grid industry is currently dwarfed by mobile and online analytics, which are being fueled more and more by social media, gaming, and video transmissions over mobile networks. And yet the smart grid industry is probably a great glimpse of things to come. It requires two-way communications, lots of data, and no humans. It is expected to become a $171B market in 2014, and there are at least 23 specific types of specialized sensors driving the vast mountain of machine-generated data that enables the market. And that’s just one example. If you look at the emerging world of machine-to-machine communications, there are certainly more.

      As for the database market itself, it continues to specialize. We are now seeing exciting new entrants into the market, like some of the NoSQL offerings such as MongoDB, Hadoop, Citrusleaf, and many, many others. To continue the analogy, these may not be 4-door sedans either. They could represent SUVs, 2-seater convertibles, or even ski boats or planes. In a world that is becoming increasingly specialized, we feel there is no real silver bullet that will solve all issues. The imperative is co-existence, and we take that very seriously and do it very well.

      So our focus is not a function of providing an offering for a limited number of people to enjoy their youth, it is a response to a global market where more and more people need motorcycles, and where the increase in demand is paired with a desire for more efficient, more cost effective motorcycle options. In other words, it is a business decision based on thorough analysis backed up with a wealth of empirical data. And in the past two years, it seems to be paying off extremely well. 

      I have said a thousand times, “We do not try to be all things to all people.” Not every software vendor takes that approach, and that is certainly fair. But our focus lets us guide our roadmap, our hiring, and our investment in resources around being the best we can be for what we see as a great market for a long, long time.

      Best – Don