Data warehouses are hard to build. The goal of any data warehouse, of course, is to house all the data required to provide the business with the answers it needs to operate effectively. The problem is that the questions keep changing and evolving. This is compounded by the fact that building an enterprise data warehouse (EDW) takes a long time. An EDW for even a medium-sized firm can take months, and for a large corporation it could easily take years. This is, not surprisingly, a factor in the failure of many EDW projects – by the time they deliver information to the business, the information provided is inadequate or irrelevant (and sometimes both).
Much of the problem centers around the need to determine exactly what information will be required. Most EDW projects begin with a long requirements-gathering phase that tries to anticipate every bit of information that might be needed to answer the key business questions. Since rebuilding a data warehouse is expensive, a lot of effort is spent trying to get it right the first time, but that is far easier said than done. Too often key pieces of data are omitted, requiring extensive rework before an answer can be delivered.
To help organizations jump-start this long requirements phase (and the data modeling that comes out of it), a number of vendors offer pre-built “best of breed” models for various industries. Models for industries as varied as banking, pharmaceuticals, telecom, and insurance can be purchased (usually for very sizable price tags) that promise to be complete, proven, and well-architected solutions. At first blush, these models are very attractive since they can save months of analysis. By deploying the “standard” model, an organization can immediately start populating it with its own data and deliver meaningful information months sooner than if it built such a model from scratch. At least, that is the plan.
Unfortunately, in project after project, I have seen these models fail to live up to their promise. Despite adopting an industry-specific model, many EDW projects still spend months or years (not to mention millions of dollars) and remain unable to answer the most basic business questions. The problem seems to come down to three effects that occur when a “pre-built” model is employed.
The “boil the ocean” effect
Many of these models are massively complex, with hundreds of tables and relationships. As a result, IT organizations cannot readily implement the models in a phased approach. Simply determining which tables must be populated to answer a few key questions can take as long as building a model from scratch. As an example, I’ve seen organizations start with a list of 30 or so “critical questions” that are to be addressed by the EDW project. In at least one case, after 3 years and many millions of dollars, the IT organization was forced to reduce the scope to about 10 questions…and it required additional funding and 6 more months to deliver even that. In too many cases the effort required to populate these massive models simply does not deliver an adequate ROI.
The “data landfill” effect
Even in projects where the pre-built model is fully populated, the promised insights often remain elusive. Most of these models that I have seen are focused on collecting and storing data, not delivering answers. After many months or years of assembling the required data, the organization is confronted with a new multi-month, multi-million-dollar project to recast the assembled data into a form that can be used by the business. Whether this involves the creation of data marts or BI-tool-specific semantic layers, much additional effort is required. The basic problem is that these pre-built models can tell you what data is required, but they say nothing about how to use it.
The “concrete and rebar” effect
Even if you get the model deployed, get all the data populated into it, and build out a BI layer for analytical purposes, you still have to deal with change. Change is inevitable in business. Whether it is a new acquisition requiring the integration of a dozen new data sources, regulatory changes that mandate a whole new set of metrics, or simply shifts in the market driving whole new analytical needs, sooner or later the warehouse will need to be updated. But because of the complexity of the model, and the corresponding complexity of the ETL that both populates it and recasts it for BI use, change is hard. Seemingly minor changes can have unforeseen ripple effects. Simply determining what needs to be changed to fulfill a business request can take weeks. Implementing and testing the changes can take months. The resulting lack of agility can be deadly to the success of a data warehouse.
There is a better way. There are products and techniques that can allow an organization to build an agile data warehouse and BI foundation. Agility depends upon a few key factors: maintaining a business focus, employing techniques that allow you to start small and grow, and building in iterations that allow frequent “course corrections” to keep what is delivered in sync with the business need. We’ll take a look at how you can build an agile data warehouse in part two.