We have often discussed the concept of using an iterative methodology when constructing a data warehouse, and have detailed the benefits of such an approach: better alignment with business needs, faster delivery of results, better return on investment and lower overall development costs, to name a few. However, it is also important to understand how an iterative methodology can support not only the initial delivery, but also the ongoing extension and enhancement of the warehouse. I like to call this an “Incremental Methodology.”
As anyone who has ever delivered a data warehouse will tell you, the warehouse is never really “done.” This is not surprising. If we accept that the purpose of any data warehouse is to support the decision-making process in the business, and since business conditions and needs are constantly changing, we should expect the warehouse to keep changing if it is to keep pace with the business it is designed to serve. The first delivery of a data warehouse is rarely the final word. New areas of information, new relationships and new data sources will all demand changes to the warehouse.
This is nowhere more evident than in the needs of the data scientist. Data Science, by definition, is an exploratory process. One question leads to another, and then another. The data scientist often knows only the first question to be asked; the follow-ups depend on the answers it produces. Rarely can they predict the third round of questions, and they will almost never know the fourth or fifth. This means that new information, organized in different ways, will be required. Since sourcing this information directly from operational systems is full of obstacles and the danger of “bogus” data, the warehouse is often the best source. However, many warehouses struggle to keep pace with the basic BI demands of the business, never mind the rapid-fire needs of the data scientist.
Incremental development offers many advantages, not only in supporting the data scientist, but in making the entire data warehouse effort more productive and responsive to the business. One of the factors that traditional data warehouse approaches must contend with is the cost of change. Traditional data warehousing techniques – whether employed with an iterative methodology or not – result in a relatively brittle architecture. This is because so much of the infrastructure is built from discrete fragments of code: code that extracts source data, code that normalizes data for warehouse storage, code that manages surrogate keys, code that manages slowly changing data, code that enforces referential integrity, code that validates conformance to business rules, code that calculates values, and on and on. Making a change to this complex infrastructure is error prone, costly and time consuming.
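To make the brittleness concrete, here is a minimal sketch of one such discrete fragment: a hypothetical Type 2 slowly-changing-dimension loader with surrogate-key management (this is illustrative code, not Kalido's; the attribute names are invented). Note that any model change – a newly tracked attribute, a renamed column – forces a hand edit to code like this, and there are dozens of such fragments in a typical warehouse.

```python
from datetime import date

# Hypothetical hand-coded warehouse fragment: a Type 2 slowly-changing-
# dimension loader with surrogate-key management. Every tracked
# attribute is wired into the code, so a model change means a code change.

def apply_scd2(dimension, incoming, today, tracked=("city",)):
    """Expire changed rows and insert new versions.

    dimension: list of dicts with surrogate_key, natural_key, the
               tracked attributes, valid_from, valid_to (None = current).
    incoming:  list of dicts keyed by natural_key with current values.
    """
    next_key = max((r["surrogate_key"] for r in dimension), default=0) + 1
    current = {r["natural_key"]: r for r in dimension if r["valid_to"] is None}
    for row in incoming:
        existing = current.get(row["natural_key"])
        if existing and all(existing[a] == row[a] for a in tracked):
            continue  # tracked attributes unchanged; nothing to do
        if existing:
            existing["valid_to"] = today  # close out the old version
        new_row = {"surrogate_key": next_key,
                   "natural_key": row["natural_key"],
                   "valid_from": today, "valid_to": None}
        new_row.update({a: row[a] for a in tracked})
        dimension.append(new_row)
        next_key += 1
    return dimension

# A customer moves from Boston to Austin: the old row is expired and a
# new current row receives the next surrogate key.
dim = [{"surrogate_key": 1, "natural_key": "C1", "city": "Boston",
        "valid_from": date(2010, 1, 1), "valid_to": None}]
apply_scd2(dim, [{"natural_key": "C1", "city": "Austin"}], date(2012, 6, 1))
```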
If, however, you have a single integrated tool that can accept a logical data design and then automate the vast majority of the required steps – and provide easy configuration of the rest – then you make changes far more easily. A simple change to the logical design can be interpreted by the application which then makes all the required changes to code, to table structures, to output structures, etc. Now, instead of taking weeks to effect simple changes, the warehouse can be updated in hours.
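The model-driven alternative can be sketched in a few lines. This is not the Kalido engine – just a toy illustration of the principle, with an invented type map and naming convention: the logical model is treated as data, and the physical DDL is derived from it, so extending the model regenerates the schema rather than requiring hand edits to many separate scripts.

```python
# Toy sketch of model-driven generation (illustrative, not Kalido code):
# derive CREATE TABLE statements from a logical model expressed as data.

TYPE_MAP = {"string": "VARCHAR(255)", "int": "INTEGER", "date": "DATE"}

def generate_ddl(model):
    """Render one CREATE TABLE statement per entity in the model."""
    statements = []
    for entity, attrs in model.items():
        cols = [f"  {entity}_key INTEGER PRIMARY KEY"]  # surrogate key
        cols += [f"  {name} {TYPE_MAP[t]}" for name, t in attrs.items()]
        statements.append(
            f"CREATE TABLE {entity} (\n" + ",\n".join(cols) + "\n);")
    return "\n".join(statements)

model = {"customer": {"name": "string", "city": "string"}}
before = generate_ddl(model)

# Extending the logical model is a one-line change; the physical
# schema follows automatically, with no ETL scripts to rewrite.
model["customer"]["signup_date"] = "date"
after = generate_ddl(model)
```

In a real engine the same model would also drive the load routines, validation rules and key management, which is what collapses weeks of change work into hours.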
This ability to change and evolve easily, and in an iterative manner, allows a truly incremental approach to be taken. Instead of designing a warehouse to answer ALL possible questions on day one (a daunting task at best) we can instead design it to only answer the most valuable questions. Once we have these answers, we can ask the next round of questions and extend the warehouse as required to answer them. Then the third, fourth and fifth rounds can each be addressed in turn. Sound familiar? This is exactly the sort of incremental approach the data scientist uses as they analyze information. Now, not only can your warehouse be more productive and responsive, it can supply the raw information needed by a data scientist.
Certainly, all these changes mean that the warehouse needs to be expanded and extended. But a curious thing happens: each extension is smaller than the one before. Because of the way data and metadata are managed within Kalido, each extension is able to leverage the data already in place without requiring extensive reconstruction. The engine automation manages the changes for you. We have seen this over and over: a warehouse starts as a way to answer a few key business questions and, over several iterations, each adding new increments of data, it evolves into an Enterprise Data Warehouse.
This incremental growth approach can be implemented in almost any organization. But it does require the right tools. You need a modeling tool that can be used by the business to verbalize their needs, and by IT to design an infrastructure to meet those needs. This combination of business understanding and technical robustness is amply met by the Kalido Business Information Modeler (BIM). You also need an automated application that will turn the model created in BIM into the required tables, load routines, validation rules, and outputs. This is what Kalido Information Engine brings to the table. Finally, you need a way to transfer the resulting semantic information into the BI and analytical tools of choice. Kalido Universal Information Director does exactly that, producing the frameworks, universes and cubes that make the warehouse information instantly accessible to the business.
Incremental development can have a significant impact on the cost of building, maintaining and evolving your data warehouse. Many corporations are doing it now.
See firsthand how Kalido can provide the foundation for incremental growth in your organization.