Connecting, Collecting and Understanding Data

I read with interest a Gartner report, Modern Data Management Requires a Balance Between Collecting Data and Connecting to Data, that makes the case for a bi-modal approach to connecting and collecting data. The argument is that reacting “at the edges” of a broader data infrastructure (making decisions based on real-time data displayed on a tablet, for example) requires direct connection of processes and devices, while collecting data for operations and management insight requires a central collection point and rigorous validation of data accuracy and quality. However, ALL data requires a series of integration processes that describe, organize and integrate the information. That first step, the description, includes the location, trustworthiness and meaning of the data in question.

As anyone who has worked on an information integration project will tell you, the most difficult part of data management is identifying data that means the same thing or describes the same thing. Ask two people in an organization for a list of the key attributes of a product and the answers will be different, reflecting each person’s view of the product in their job. For example, a shipping clerk is most interested in the size and weight, where it is warehoused, and how many are on hand. A product manager will be more interested in how the product compares to competitors’ products, what markets it sells in and whether customer feedback is positive or negative. These are just examples of quantitative disconnects; real semantic disconnects, such as definitions of profit margin, “major” customer or “top” vendor, can be even more challenging.

This is the reason we have always promoted the idea of “Semantic Integration” through our Business Information Model (BIM) approach. Simply put, the BIM approach models the way data is consumed by the business, not the way it is stored. It combines many of the aspects of conceptual and logical models with some aspects of physical models, although table design and most other aspects of the physical model are handled automatically by the Magnitude family of master data management and data warehouse automation products. By focusing on the business view of data we flush out and resolve semantic disconnects early in the process, rather than at the end when someone challenges the accuracy of a report or the allocation of resources. The combination of semantic modelling and master data management provides the key functionality to describe, organize, integrate, share, govern and implement a distributed data management infrastructure.

Approaching data modeling from a semantic rather than technical viewpoint may seem like a trivial point. Some may say that if we simply ask the right questions, such issues can be cleared up during requirements gathering. However, experience shows that unless you poll every stakeholder with the same questions and then compare answers, you will not reliably uncover these variations in meaning. Indeed, even if you poll every member of an organization, you will only discover the disconnects that are well known, not the ones that will derail progress later. In fact, there are a great many standard business concepts that can be very difficult to represent in a traditional data-centric approach.

For example, consider the following business scenario:

Imagine a business that employs distributors to deliver product to customers in many markets worldwide. In some cases, a distributor’s territory may be a large city, in others it may be a country, or even a region, depending on market and distributor size. How do we represent that diagrammatically to stakeholders? While there are a couple of different ways to model this physically, none of them will be easy for a business user to understand. By contrast, approaching this semantically is much easier.

With only a brief explanation of what the constructs mean, almost anyone can understand a Business Information Model. In the example shown, we have preserved the hierarchical relationship of the geographic categories – cities roll up to countries which roll up to regions. We also show that distributors are assigned to “Geo Coverages” not to any of the specific geographic categories. This means that a distributor may be assigned to a city or a country, but when reporting at the country level we will still aggregate that city correctly.
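For readers who think in code rather than diagrams, the same idea can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not how the Magnitude products implement it: the names GeoCoverage, ancestor_at and rollup, the distributors and the sales figures are all hypothetical. Each distributor is assigned to a single “Geo Coverage” node at whatever level fits, and reporting walks up the hierarchy to the level requested.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class GeoCoverage:
    """One node in the geographic hierarchy (hypothetical structure)."""
    name: str
    level: str  # "city", "country" or "region"
    parent: Optional["GeoCoverage"] = None


def ancestor_at(coverage: GeoCoverage, level: str) -> Optional[GeoCoverage]:
    """Walk up from a coverage until a node at the requested level is found."""
    node: Optional[GeoCoverage] = coverage
    while node is not None:
        if node.level == level:
            return node
        node = node.parent
    return None  # the coverage sits above the requested level


def rollup(sales: Dict[str, float],
           assignments: Dict[str, GeoCoverage],
           level: str) -> Dict[str, float]:
    """Aggregate distributor sales at the requested geographic level."""
    totals: Dict[str, float] = defaultdict(float)
    for distributor, amount in sales.items():
        target = ancestor_at(assignments[distributor], level)
        if target is not None:
            totals[target.name] += amount
    return dict(totals)


# Illustrative data only: cities roll up to countries, countries to regions.
emea = GeoCoverage("EMEA", "region")
france = GeoCoverage("France", "country", emea)
germany = GeoCoverage("Germany", "country", emea)
paris = GeoCoverage("Paris", "city", france)

# Distributors are assigned to a coverage, not to a specific category.
assignments = {"Distributor A": paris, "Distributor B": germany}
sales = {"Distributor A": 100.0, "Distributor B": 250.0}

by_country = rollup(sales, assignments, "country")
by_region = rollup(sales, assignments, "region")
```

In this sketch the city-level distributor is counted under France when reporting by country, and both distributors are counted under EMEA when reporting by region. A distributor assigned at the region level would simply be left out of a country-level report, which is one of the design questions a model like this brings to the surface.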

The same applies for more obvious sources of semantic disagreement. For example, when defining concepts like “Gross Profit” we visually link the input concepts and KPIs and define the formula for calculation. While the broad concepts are rarely in dispute, defining these crucial metrics so specifically will highlight where disagreements exist. Business information models are best designed along with business stakeholders, and I have seen groups with different views of many business concepts come to agreement once all the details are illuminated.
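To make that concrete, here is a minimal Python sketch of a KPI defined by its named inputs and an explicit formula. Everything in it is an assumption for illustration: the KPI class, the input names net_revenue and cost_of_goods_sold, and the choice of “revenue minus cost of goods sold” as the formula. Your organization may well define Gross Profit differently, which is exactly the kind of disagreement this level of detail exposes.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Tuple


@dataclass(frozen=True)
class KPI:
    """A metric defined by named input concepts and an explicit formula."""
    name: str
    inputs: Tuple[str, ...]
    formula: Callable[..., float]

    def evaluate(self, values: Mapping[str, float]) -> float:
        # Refuse to calculate if any agreed input concept is missing.
        missing = [i for i in self.inputs if i not in values]
        if missing:
            raise KeyError(f"{self.name} is missing inputs: {missing}")
        return self.formula(*(values[i] for i in self.inputs))


# Assumed definition for illustration; a real model would record the
# formula the stakeholders actually agreed on.
gross_profit = KPI(
    name="Gross Profit",
    inputs=("net_revenue", "cost_of_goods_sold"),
    formula=lambda revenue, cogs: revenue - cogs,
)

result = gross_profit.evaluate(
    {"net_revenue": 1000.0, "cost_of_goods_sold": 640.0}
)
```

Because the inputs are named explicitly, a stakeholder who expects freight or discounts to be part of the calculation can see at a glance that they are not, and the conversation happens at design time rather than after a report is challenged.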

In today’s distributed environment, with applications on-premises and in the cloud, often without direct access to the underlying data, semantic integration is critical. Users may think they are looking at the same thing across in-house CRM apps, Salesforce and other applications, but they are often mistaken. The ability to flexibly manage data wherever it lives, and ensure that it is used correctly, is one of the thorniest problems organizations face. Focusing on the meaning of data, along with its location, reliability and usability, is critical to reaping the true value of cloud applications and their inherently distributed nature.

For the full Gartner report on balancing the collection and connection of data for greater flexibility and agility in data management, visit Gartner’s website. Login is required.

Modern Data Management Requires a Balance Between Collecting Data and Connecting to Data, Roxane Edjlali, Ted Friedman, 23 October 2017