Big Data Big Insight


Mike Pilcher

Thoughts on Cloudera Hadoop and Netezza

Mike Olson recently said the following while announcing a partnership between Cloudera and Netezza:

“Enterprises want to take structured data – customer and transaction data – and combine it will [sic] all the unstructured data coming off their websites…that might not fit into a tabular schema well.” [...] “All of that activity is captured in web logs that can’t easily be digested using existing relational systems.”

I couldn’t agree more with the premise and I’m glad Mike is prepared to mention the elephant in the room. Given who he works for and given that Netezza is a relational database in a very fast box, that would seem fair.

I think of this as lawnmowers and duct tape. If you strap enough lawnmowers together they will go pretty quick. They won’t handle very well, the safety record’s spotty, they burn a lot of two-stroke, and keep running out of gas, but still they go fast in a straight line.

Again Mike is right when he says customers want to be able to merge this data with other data types such as customer and transaction data. His conclusion, however, is just plain wrong.

The answer isn’t to process some data in the cloud and then some in an appliance. The answer is to put them all in a common store that can process both.

Hadoop is a great technology. Cloudera is doing a great job and adds a lot of value. But the particular problem Mike refers to would be much better handled by putting all the data in a single Column-Oriented Database Management System (CDBMS).

Why push your data into the cloud, analyze some of it, pass some of it down into an appliance, analyze some more of it, and then start all over again? Even putting aside questions of bandwidth, security, and performance, it’s simply not efficient. You can pound screws with a hammer and wrench rather than using a screwdriver, but why would you?

Given the costs involved and results demanded, enterprises are looking to use the right technology for the right job. The right technology for this job efficiently combines all the data in one place and is optimized for performance.

The right technology is SAND CDBMS.

[The Register]

The views, opinions, positions and/or strategies expressed by the authors are personal and theirs alone, and do not necessarily reflect the views, opinions, positions and/or strategies of SAND Technology.

2 Responses to “Thoughts on Cloudera Hadoop and Netezza”

  1. Jeff says:

    Hey Mike,

    Cloudera software is most often deployed inside of a customer’s data center, not in the cloud. It’s also the case that Hadoop does not require a schema to be specified for data when it lands in HDFS, so enterprises are able to catch data from all kinds of data sources before it is integrated into the enterprise data model. Once the data has been properly conditioned, coded, enriched, and modeled, I agree that loading it into a CDBMS could be of interest.

    Regards, Jeff

  2. Mike Pilcher says:

    Jeff, great to hear from you. I think we have technology that is complementary. From what I can see, the example that was provided offered an opportunity for SAND to highlight that web logs, application logs, etc. are best served in a CDBMS. The ability to pull together multiple data types from diverse locations is Cloudera’s strong point. Like Cloudera, SAND deploys inter-cloud and intra-cloud in private clouds. Our belief is that, for data analytics at scale, customers will want the security and scalability the private cloud provides, and that is where we see most enterprises going. Mike
