Big Data Big Insight


Mike Pilcher

Extreme Data

Last week I met with the team at Gartner and we discussed the concept of “Extreme Data”. Gartner states there are three dimensions to Extreme Data: volume, velocity, and variety. These dimensions are critical when it comes to winning the battle for control of enterprise data amid the conflicting requirements of business intelligence and reporting vs. data mining and statistical analysis.

How does Extreme Data differ from the earlier concept of “Big Data”? Big Data was mostly used to describe size or data weight, which is now being equated to the “volume” dimension of Extreme Data. I agree with Gartner that this dimension is important, and also that it is not the only one we have to consider.

We spent a lot of time working on data volume at SAND. It formed the core of our Nearline ILM extensions, which focused on helping enterprises manage their data volumes. This was relatively easy to do in the past, as we only needed to worry about filling the storage bucket with one type of data: tabular data from operational systems.

Now the plot thickens.

Today, volume is complicated by variety. Enterprises trying to understand the patterns in their business can barely get tabular data from their online transaction processing systems into and out of their data warehouses in a timely manner, even when all they need to handle are static reports and business intelligence. Any guesses as to what happens when they try to add data from weblogs, application logs, business control systems, RFID, and on and on, with new types seemingly appearing every day?

The new type SAND is about to address is Social Media data. How do you manage your brand, promote your products, address negative sentiment, and act on emerging trends on Twitter, Facebook, YouTube, or the blogosphere without being able to mine the data? How do you grow business and beat the competition without putting this Social Media data into a format where it can be structured, analysed, and used to make relevant business decisions?

To do all this, however, we need to address variety as well as volume, and the resulting data explosion will make the one we’ve seen over the last few years seem like a mere pop by comparison. The challenge won’t be storing the data (you can always get a bigger bucket) but getting data into and out of the store effectively. Duct-taping seventy 1-litre engines to a lawnmower won’t make it go very fast. Likewise, throwing old technology at these new data problems won’t give you results. You need performance… and that leads us to velocity.

Velocity is how rapidly an enterprise can move its data volume from source to user community.

Data first needs to be loaded, then users need to get access to it. Not all users have the same velocity requirements. Users doing advanced analytics — data mining to understand market basket behavior, pattern-seeking for cross-selling opportunities, online processing for financial analytics — need far more velocity than users doing simple BI and reporting. They need extreme velocity.

In a world of location- and context-aware devices with persistent connectivity (iPhone to iPad, BlackBerry to Galaxy Tab), extreme velocity is a crucial prerequisite for loading and merging real-time feeds, and for getting the performance that real-time, advanced analytics requires.

Embedded analytics, that is, pushing specialized functions into the database itself, is one way to address this, and it’s something SAND has been doing for a long time.
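To make the idea of pushing functions into the database concrete, here is a minimal sketch in Python. It uses SQLite purely as a stand-in and says nothing about how SAND's own product works; the `sales` table, its sample rows, and the `sum_sq` aggregate are all hypothetical names invented for this illustration. It contrasts pulling every row out to the application with letting the database do the aggregation, including a custom function registered inside the database.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a small, made-up sales table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("east", 100.0), ("east", 50.0), ("west", 70.0), ("west", 30.0)],
    )
    return conn


def totals_client_side(conn):
    """Pull every row out of the database, then aggregate in the application."""
    totals = {}
    for region, amount in conn.execute("SELECT region, amount FROM sales"):
        totals[region] = totals.get(region, 0.0) + amount
    return totals


def totals_in_database(conn):
    """Let the database aggregate; only one row per region comes back."""
    rows = conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
    return dict(rows)


class SumOfSquares:
    """Hypothetical 'specialized function' that runs inside the database."""

    def __init__(self):
        self.total = 0.0

    def step(self, value):
        # Called once per row in each group.
        self.total += value * value

    def finalize(self):
        # Called once per group to produce the result.
        return self.total


def sum_of_squares_in_database(conn):
    """Register the custom aggregate, then call it from SQL."""
    conn.create_aggregate("sum_sq", 1, SumOfSquares)
    rows = conn.execute("SELECT region, sum_sq(amount) FROM sales GROUP BY region")
    return dict(rows)
```

With four rows the difference is invisible, but the shape of the trade-off is the same at scale: the client-side version moves every row across the boundary, while the in-database versions move only one result per group.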

Volume, variety, and velocity show there is more to modern data management than “Big Data”, and for enterprises with advanced analytic requirements, there’s much more.

(See our SAND Analytic Database Performance white paper.)

Gartner speaks with verisimilitude to voice that the villains of Extreme Data are the virulent V’s of volume, variety and velocity. These venal villains have voided the valiant efforts of organizations with voracious appetites for data, preventing them from giving users the vital access they need. With 20 years of vigilance, SAND’s vision to vanquish the vexatious vermin of volume, variety, and velocity has not been in vain. We stand vindicated and victorious. We shall not veer from our course and, while never vainglorious, SAND shall stand at the virtuous vanguard and bring victory over Extreme Data to our very valued customers.

The views, opinions, positions and/or strategies expressed by the authors are personal and theirs alone, and do not necessarily reflect the views, opinions, positions and/or strategies of SAND Technology.

3 Responses to “Extreme Data”

  1. David O'Berry says:

    LOL at the Vale of the Vole impersonation!

    Good info.

    –David

  2. [...] Data Quality This blog post is inspired by reading a blog post called Extreme Data by Mike Pilcher. Mike is COO at SAND, a leading provider of columnar database [...]

  3. [...] day new varieties of Extreme Data deluge the enterprise. It is no longer sufficient to simply focus on tabular data coming out of [...]
