A recent Information Management article raised a number of issues with data warehouses and why they are such time-consuming projects. According to the article, the main reasons are primarily around changing scope, data quality and ETL design. I’ve discussed how to handle the scope and design issues in earlier posts about business modeling. Over the next few posts, I’ll talk about how to deal with data quality in the data warehouse.
By its very nature, a data warehouse is going to include a wide range of types of data that characterize the “facts” or transactions stored in the warehouse – the product or item, customer, supplier, market or geography, time period, employee and so on. This data is considered master data. While we can’t do much about the fact data being wrong – e.g. the wrong price was charged for an item sold as recorded in an operational system – what we can do is try to harmonize all of the other master data that comes from a variety of systems before we deliver it to business users for use in analysis. There are numerous steps in this process, so let’s start with the first one, and that is figuring out the relationships among the data.
Because the warehouse holds a wide range of data, it is important that your master data management activity be able to cover all the relevant master data used in BI. If you are only able to address a single dimension – say, customer data – then you run the risk of still providing erroneous information to your BI users.
Consider a common question in BI:
“How many widgets did the pharmaceutical sales team sell to wholesaler customers in the northeast year-to-date, compared to the same period last year?”
This includes master data about product, sales team, customer, region and time. Your MDM tool needs to be able to handle all these “domains.” If it can only handle customer data you may have a nice, clean list of accurate customers, but your products and your other dimensions may be suffering from poor master data quality. And the result is analyses will suffer, and perceived value of your warehouse and BI tools will as well.
You may have also noticed that some of the master data elements are at different levels of granularity. “Widget” is the lowest level of finished product, but “pharmaceutical sales team” may be a roll-up of several individual sales reps, “northeast” is a collection of states/provinces or postal codes, and “year-to-date” could be a collection of days or weeks or months, which of course will be different whenever this question is asked. Similarly, the dollar or unit amount collected may come in at the weekly level rather than daily. There may also be multiple roll-up paths for a given dimension.
As you can see the relationships here can be pretty complex, and it’s likely that the complex hierarchy example here is not defined in the OLTP systems that capture the sale. These need to be authored and managed somewhere, and a capable MDM tool can provide that service for the warehouse.
If you are going to invest in an MDM technology, you want it to be useful in as many use cases as possible. For the purposes of this discussion I’ve focused primarily on the data warehouse quality use case, but some Kalido customers have used our MDM capability in many other areas.
One example is of a global consumer goods company that started by using Kalido to handle customer and vendor data when they performed a system migration. Next, they tackled data harmonization for operations on employee data, followed by data harmonization for analytics and hierarchy management for BI on customer, product, supplier, geography and employee master data. They later went through an acquisition and used Kalido for operational data governance and system consolidation for a new global SAP instance. Now their MDM repository handles over 20 domains and over 250 entities, and manages all the relationships between them.
So if your business, and your data warehouse, is dependent on good data to be run well and analyzed accurately, an “every domain” MDM solution can provide a place to model all the data and relationships. You may not have to tackle them all at once – like my customer example above who started with one or two and added more as they expanded their implementation. A single flexible technology that can be deployed this way offers an approach that enables you to avoid maintaining multiple domain-specific MDM silos, simplifying the effort and activity around getting good data into the warehouse.