Moving Beyond the Data Warehouse Impasse

Part 1

Over the last decade or so, data warehouses have expanded dramatically, both in size and in importance to the organizations that deploy them. The data warehouse has proved to be an invaluable means of transforming data created by production applications into information that is technically, structurally and organizationally ready for use by business managers and domain analysts. Yet, like every successful technological advance, the data warehouse has inherent limitations, and these are becoming more and more troublesome as data volumes and user demands grow.

Historical Perspective

Fifteen years ago, when Bill Inmon and Ralph Kimball were laying the ground rules for the data warehouse, they faced an interesting problem. The data they had to work with was transactional data, and the existing infrastructure was designed and optimized for transaction processing. So they made do with what they had. In pioneering the field, they showed us that with clever dimensional modeling, the right schema and a lot of indexing, a relational database could handle analytic tasks reasonably well. And when the typical warehouse held a hundred gigabytes of data serving a handful of reporting users, adding 200-300 gigabytes of indexes to make it work was not a serious problem.

Data Warehouses Today

Nowadays, data warehouse managers are dealing with rapidly multiplying terabytes of data and struggling to support thousands of users who want to run everything from standard reports to predictive modeling, along with every variety of parameterized and ad hoc query in between. In this situation, indexes are not just enablers; they are also an impediment, because they effectively restrict efficient access and use to the subset of data that has been indexed to meet very specific needs.

For this reason, data warehouses deliver less value when used for data exploration, complex analysis or developing a long-term perspective on the business. Nor can they cope easily with changing analytical requirements, such as those that might result from mergers and acquisitions or unforeseen shifts in the market. The standard data warehouse is particularly limited when provisioning data for new projects, because the data is no longer the "data store of record" but a collection of data that has already been processed (through preaggregation and so on) for a different set of tasks and analyses.

Risky Business

Another extremely important consideration relates to security and risk in your business. To accommodate the huge volumes of data produced by today's businesses while still maintaining performance, the data normally has to be prepared for analytical use by passing it through various transformations. And because of constraints on the amount of data that can be kept and managed, the original detail is often discarded.

But there is no guarantee that the changes made to the data will always reflect the "truth" of the original underlying data. In fact, it can be argued that any transformation of this sort introduces a bias, because it reflects someone's idea of what is important or necessary for analysis at a given point in time. Furthermore, when data is susceptible to such alteration, there is always the possibility that someone can, for whatever reason, tinker with it. In today's legal climate, we are well aware that this could lead to very serious repercussions within the organization.

If this situation is left unaltered, all the processes driven by the data warehouse are in danger of being compromised, and the business exposes itself to very serious risks.

Moving Beyond

In my next post, we will look at some ways of moving beyond this impasse that recent technological developments have made possible.

Arthur Ritchie
June 12, 2007
