Developments to watch: Metadata and machine learning in data governance

A theme that emerged from this year’s Gartner Data & Analytics Summit is that data governance is evolving from a centralized, compliance-focused strategy to one in which decision-making is democratized across the organization. According to Gartner, by 2020, 50 percent of information governance initiatives will be enacted with policies based on metadata alone.

So how does metadata apply to the democratization of decision-making? Consider the potential of machine learning for improving data governance processes.

Let’s first define what metadata is. Gartner defines metadata as information that describes various facets of an information asset in order to improve its usability throughout its life cycle. In simpler terms, metadata is structured information that describes and explains data, making an information resource easier to locate, retrieve, use and manage. It gives data the context and meaning needed to derive insights.

Businesses are paying more attention to metadata, driven in part by governance requirements and by the need to derive smarter insights from the vast quantities of data generated by an ever-expanding array of sources. Metadata plays a central role in holistic data governance. When governed effectively, it provides a view into the flow of data and supports business analysis. It also provides a common business vocabulary, accountability for terms and definitions, and an audit trail for compliance.

Machine learning offers tremendous opportunities here: algorithms can capture an expert’s decision-making rules so that those rules can be applied to a much larger dataset.

For example, in data matching, an expert decision-maker is presented with possible matches and accepts or rejects each one. As the algorithm presents more matches, the expert’s responses become the metadata of the decision-making process, so that after an initial set of training questions the algorithm can perform the task unaided. If records fall outside the norms, potentially less skilled workers can make the determination of the match. It is similar to Pandora: after a few thumbs up and thumbs down, the playlist you’re presented with gets better.
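The accept/reject loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of any vendor's tool: the function names (similarity, learn_thresholds, classify) are hypothetical, string similarity from the standard library stands in for a real matching model, and "learning" is reduced to deriving two cut-off scores from the expert's answers.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity score between two records, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def learn_thresholds(labeled):
    """Derive cut-off scores from expert decisions.

    labeled: list of (record_a, record_b, accepted) tuples, where accepted
    is True if the expert confirmed the match (hypothetical input format).
    Returns (low, high): the band of scores that still needs a human.
    """
    accepted = [similarity(a, b) for a, b, ok in labeled if ok]
    rejected = [similarity(a, b) for a, b, ok in labeled if not ok]
    if not accepted or not rejected:
        raise ValueError("need at least one accepted and one rejected pair")
    # The band runs between the highest rejected score and the lowest
    # accepted score; sorting handles the case where the two overlap.
    low, high = sorted((max(rejected), min(accepted)))
    return low, high


def classify(a: str, b: str, low: float, high: float) -> str:
    """Decide automatically when the score is clear-cut, else route to review."""
    score = similarity(a, b)
    if score > high:
        return "match"
    if score < low:
        return "no_match"
    return "review"  # falls outside the norms: hand to a human reviewer


# Expert feedback collected during the initial training questions.
feedback = [
    ("Acme Corp", "Acme Corp.", True),
    ("Acme Corp", "Zenith Ltd", False),
]
low, high = learn_thresholds(feedback)

print(classify("Acme Corp", "Acme Corp", low, high))
print(classify("Acme Corp", "Acme Corporation", low, high))
print(classify("abc", "xyz", low, high))
```

The stored feedback list plays the role of the decision-making metadata: each additional expert answer narrows or shifts the review band, and only pairs that land inside it need a person to look at them.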

In this future governance model, businesses will get smarter and can speed decision-making across teams by applying machine learning to digitize this process.

This is an area that we are exploring at Magnitude. Watch for new developments in machine learning-based data matching tools that organizations can use to derive smart, actionable insights from trusted data more quickly.