In my last blog, I compared the digital world of data to the physical environment we live in. I made a case for thinking about data governance in terms of data policies aimed at keeping bad data, analogous to trash in the physical environment, out of the data environment in the first place.
As with other forms of governance, data governance does its job through data policies, which are the instruments of governance. Like social policies, data policies must confer concrete benefits that outweigh the cost. Recall our example of a bad billing address that prevents finance from invoicing efficiently. Let’s take a shot at framing a data policy to address this problem:
Finance-Invoicing in North America requires that the customer on an order have a mailable and accurate billing address.
The policy contains the following discrete components:
Beneficiary Business Process. The beneficiary of the policy, “Finance-Invoicing”, is clearly spelled out. Finance-Invoicing is the customer of the data. The policy has clear business benefits, and violating it has a material business impact: delayed payment and added cost in the invoicing business process.
The policy also places a requirement on every upstream business process that supplies the data, without naming any of them. Those upstream business processes could include order entry, customer service or third-party sales.
Organizational Scope. Organizations are typically structured in a matrix of geographical entities and lines of business. The scope of this policy is North America, where invoices are sent physically by mail to a street address. In Europe, invoices may be issued electronically, in which case a billing address isn’t needed. This is why a data policy requires an organizational scope.
Data Element. The policy references a set of data: customer. It specifically references a set of attributes of customer: the billing address fields. These fields are referenced in a logical way, without specifying a physical database, table or columns where the data resides.
Rules. Mailable and accurate. A policy, by definition, is a grouping of rules created and enforced as a unit. Mailable means the address must be reachable by mail. Many IT tools can validate addresses against US Postal Service data. Accurate means that the address is indeed where the customer wishes to receive the bill. This is harder to evaluate ahead of time, but we know when an address is inaccurate: the invoice is returned. We can put a mechanism in place to capture these events. Both rules are measurable: they’re either true or false. Both have to be true to comply with this policy.
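To make the "both rules must be true" idea concrete, here is a minimal Python sketch. Everything in it is a hypothetical illustration: the BillingAddress fields, the invoice_returned flag and the function names are assumptions, and the mailable check is only a stand-in for a real address validation tool.

```python
from dataclasses import dataclass


@dataclass
class BillingAddress:
    # Hypothetical logical fields; not tied to any physical table or column.
    street: str
    city: str
    state: str
    postal_code: str
    # Assumed flag, set by whatever mechanism captures returned invoices.
    invoice_returned: bool = False


def is_mailable(addr: BillingAddress) -> bool:
    # Stand-in check only: every field must be non-blank.
    # A real check would call an address validation tool instead.
    parts = (addr.street, addr.city, addr.state, addr.postal_code)
    return all(part.strip() for part in parts)


def is_accurate(addr: BillingAddress) -> bool:
    # Accuracy is only known after the fact: a returned invoice disproves it.
    return not addr.invoice_returned


def complies(addr: BillingAddress) -> bool:
    # The policy is enforced as a unit: both rules have to be true.
    return is_mailable(addr) and is_accurate(addr)
```

An address with a blank street fails the mailable rule, and a complete address whose invoice came back fails the accurate rule. Either way the policy as a whole is violated.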
Taken as a whole, we have the definition of a data policy. It’s a set of measurable rules for a set of data elements, in the context of an organizational scope, for the benefit of a business process, irrespective of where the data is stored and which party provides it. A policy is structured, referencing data elements, a business process and an organization, but its rules are expressed in natural language so that the business can create and understand it. The example above is a data quality policy. Other types of policies include data security, data lifecycle management, data definitions and reference data that acts as policy.
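One way to picture this structure is as a simple record. The Python sketch below is an assumption for illustration only: the DataPolicy class, its field names and the applies_to helper are hypothetical and not taken from any governance tool. Note that the rules stay as natural-language terms and that nothing refers to a physical database.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DataPolicy:
    # Hypothetical structure mirroring the components described above.
    beneficiary_process: str    # the business process that benefits
    organizational_scope: str   # where the policy applies
    data_element: str           # logical data set, not a physical table
    attributes: Tuple[str, ...] # logical attributes of the data element
    rules: Tuple[str, ...]      # natural-language rules, enforced as a unit
    policy_type: str = "data quality"


# The example policy from this post, expressed in that structure.
invoicing_policy = DataPolicy(
    beneficiary_process="Finance-Invoicing",
    organizational_scope="North America",
    data_element="customer",
    attributes=("billing address",),
    rules=("mailable", "accurate"),
)


def applies_to(policy: DataPolicy, region: str) -> bool:
    # Simplified scope check: an exact match on the region name.
    return policy.organizational_scope == region
```

With this sketch, applies_to(invoicing_policy, "Europe") is False, which matches the point that electronic invoicing in Europe does not need a billing address.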
Data policies often look like the business requirements written for a one-time data-related project. The difference is that rather than being written into a requirements document for the purpose of building a system, they’re living and breathing: actively managed by the business, and implemented and enforced across the enterprise.
In my next blog, I’ll talk about the processes around data policies.
This blog is part 4 of a multi-part series of blogs on the topic of Enterprise Data Governance. To read other posts from this series, please see below.