Last week, a data architect from a large pharmaceutical company asked me: “Should we build an enterprise data warehouse given that we want to harmonize business processes globally?” My first reaction was: it’s been a while since I heard anyone say they wanted to build a true EDW.
Why? Is it because of the Great Recession, during which companies avoided big and risky projects? Or is there something else going on?
Let’s look at why data warehouses exist in the first place. We build data warehouses for three reasons:
1). We want to hoard data. Transactional systems purge older records. Master data records are usually updated in place, so the previous versions are gone forever. So, we build data warehouses to store historical data in case we need to analyze it later.
2). We want good performance for data analysis. Transactional systems are not optimized for analytical queries. Plus, analytical queries can hog a database and impact operations. So we build data warehouses both to offload that work from transactional systems and to have a database designed and tuned specifically for analytical queries.
3). We want good data. Data from many source systems need to be integrated and cleansed to meet the demands of analysis. This is the “one version of the truth” rationale that many of us have a love-hate relationship with. At the end of the day, this is data management, which aims to achieve “one version” (consistency), “the truth” (accuracy), and completeness.
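To make the first reason concrete, here is a minimal sketch of one common way a warehouse keeps the history that a source system overwrites: a Type 2 slowly changing dimension, where a change closes the current row and appends a new one instead of updating in place. The `CustomerVersion` record, its fields, and the helper names are hypothetical and chosen for illustration only; a real warehouse would do this in SQL or an ETL tool against its own schema.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class CustomerVersion:
    # Hypothetical dimension row: one version of a customer's attributes.
    customer_id: str
    city: str
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version


def apply_change(history: list, customer_id: str, new_city: str, as_of: date) -> list:
    """Type 2 change: close the current row and append a new version."""
    updated = []
    for row in history:
        if row.customer_id == customer_id and row.valid_to is None:
            if row.city == new_city:
                return history  # nothing changed; keep history as-is
            updated.append(replace(row, valid_to=as_of))  # close old version
        else:
            updated.append(row)
    updated.append(CustomerVersion(customer_id, new_city, as_of))
    return updated


def city_as_of(history: list, customer_id: str, when: date) -> Optional[str]:
    """Point-in-time lookup that an update-in-place source cannot answer."""
    for row in history:
        if (row.customer_id == customer_id and row.valid_from <= when
                and (row.valid_to is None or when < row.valid_to)):
            return row.city
    return None


# The source system overwrites the record, so the old value is gone:
source = {"C1": "Boston"}
source["C1"] = "Chicago"

# The warehouse keeps both versions:
history = [CustomerVersion("C1", "Boston", date(2008, 1, 1))]
history = apply_change(history, "C1", "Chicago", date(2010, 6, 1))
print(city_as_of(history, "C1", date(2009, 1, 1)))  # Boston
```

The design choice is simply to never destroy a version: each row carries a validity interval, so analysts can ask what was true at any past date.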
It is the third goal, data management, that puts the “E” in EDW, making it different from a data mart or just a plain data warehouse.
Since the data warehousing wave started, dedicated analytical databases have dramatically improved our ability to hoard data (first goal) through high storage density, and to run analytical queries fast (second goal). Progress in database technology will continue to be rapid because competition is fierce.
Meanwhile, MDM and data governance are helping us meet the third goal of data management. Multi-domain MDM promises to realize the fabled “conformed dimension”. And data governance, which can define and enforce data quality policies across the data landscape, promises to bring about “the truth everywhere”. In this context, enterprise architects are surely rethinking the old debate over data marts versus data warehouses and EDWs. If source data, aided by MDM and data governance, are consistent and accurate, then data marts built on inexpensive database appliances wouldn’t be bad, would they? And data marts have a reputation for being agile and more responsive to business needs. In other words, will MDM and data governance make EDWs irrelevant?
Theoretically, yes. But EDWs are not going away anytime soon. Like mainframes, EDWs are deeply entrenched in the IT infrastructure and nearly impossible to sunset. In addition, because of their breadth and versatility, EDWs are still the only viable solution to many hard problems. New EDWs will continue to be built, albeit at a slower pace.
There is another way for the EDW to get more wind behind its sails: fully embrace data management and data governance. This is how an EDW differentiates itself from a narrower-scope data mart built on an appliance. And the EDW is in a great position to be the poster child for data governance, and in doing so keep its status as the center of gravity for all things data in an enterprise.
What do you think? I welcome your comments.