In my new role as a manager of data governance for a non-profit company, I am working towards making our nascent data governance framework operational. This requires a tool that would automate a number of tasks and provide visibility into the progress we make as we work towards achieving our metrics for success.
A helpful presentation at the MDM and Master Data Governance Summit in New York, which I recently attended, gave a good initial list of criteria, as shared earlier by Information Management. I adapted the list slightly to our needs and decided to share my version with whoever else might be facing a similar task:
Evaluation criteria for choosing the right data governance tool
My initial work focused on launching a data stewardship council. This circle of program managers intimately familiar with the business value of data worked on drafting a set of data governance policies which need to be approved by the data governance board. Thus one of the criteria for selecting a data governance software tool focuses on:
Data Policies, Standards and Processes
The software should provide the ability to determine the data ownership within data policies, the data roles within standards and the data processes including data stewardship meetings. Ultimately, the tool should empower our data stewards.
Using the software, our data stewards should be able to manage artifacts such as business terms, data policies, data standards, data quality rules, data quality metrics, master data rules, master data tasks (e.g., duplicates) and any other artifacts that are fully configurable (e.g., regulation).
In this early stage of our data governance evolution, we focus on data governors (basically, the executive team of CXOs and VPs) and data stewards, but we are aware that we need to pair those business roles with more technical counterparts. Data custodians would play a crucial role in implementing the policies drafted by data stewards and approved by data governors. In the future, more nuanced roles might need to be defined.
The software should enable the creation of custom roles including data steward, data custodian, data owner, data executive, data sponsor, stakeholder, subject matter expert, and those who are responsible, accountable, consulted and/or informed.
The software should let us define the approval workflows for the different roles.
Data Governance Metrics
Metrics for success should never be an afterthought. The software should provide the ability to track specific data governance metrics:
- Reference Data – Number of candidate code values, number pending approval, number approved
- Data Issues – Number of outstanding data issues, number resolved in the last period
- Data Quality Scorecard – Data Quality Index by application, by critical data element
- Reporting Vectors – By Data Steward, Data Owner, Data Repository, Application, Data Domain
One of the earliest challenges we face is the creation of an enterprise business glossary to be paired with the more technical data dictionaries.
Usability of Business Glossary
The data governance software should allow us to create taxonomies, manage business terms, import business terms in bulk and hotlink business terms within business terms.
The software should provide the ability to name and describe custom attributes. Beyond naming the customizable attribute, it is important to provide a definition, a short description (with a little background), a long description (a few paragraphs of more depth), an example and data security labeling (indicating the level of security, e.g., public, internal or confidential).
The software needs to provide the ability to describe customizable relationships, acronyms or abbreviations, synonyms, replaces/replaced-by links (which point to deprecated terms), assigned assets, allowable values (which link a business term to its associated reference data) and the policies and data rules that govern the business term.
Master Data Rules
The software should allow the creation of data enrichment rules, data validation rules, entity relationships, record matching rules, confidence thresholds and record consolidation rules.
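To make the idea of record matching rules with confidence thresholds concrete, here is a minimal sketch in Python. It is not how any particular vendor tool works: the threshold values and function names are my own assumptions, and the similarity score comes from the standard library's `difflib`, which is only a stand-in for a real matching engine.

```python
from difflib import SequenceMatcher

# Hypothetical thresholds; a real tool would let data stewards configure these.
AUTO_MERGE_THRESHOLD = 0.90
REVIEW_THRESHOLD = 0.75


def match_confidence(a: str, b: str) -> float:
    """Return a similarity ratio between 0.0 and 1.0 for two names."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def classify_pair(a: str, b: str) -> str:
    """Decide what to do with a candidate duplicate pair."""
    score = match_confidence(a, b)
    if score >= AUTO_MERGE_THRESHOLD:
        return "merge"            # confident enough to consolidate automatically
    if score >= REVIEW_THRESHOLD:
        return "steward_review"   # route to a data steward as a master data task
    return "distinct"             # treat as separate records
```

The middle band is the interesting part: pairs that score between the two thresholds are the ones that would land in a steward's queue as duplicate-resolution tasks.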
Allowable Values for Reference Data Business Terms
The software should allow specific values for business terms, e.g., the common abbreviations for U.S. states as acceptable reference data.
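As a rough illustration of what checking against allowable values could look like, the sketch below validates incoming values against an approved reference set. The list of state codes is deliberately abbreviated and the function name is hypothetical.

```python
# Abbreviated reference list for illustration only; a real one would cover all states.
ALLOWED_STATE_CODES = {"CA", "NY", "TX", "WA"}


def find_invalid_values(values, allowed):
    """Return the values that are not in the approved reference set, in order."""
    return [v for v in values if v.strip().upper() not in allowed]
```

For example, `find_invalid_values(["ny", "Calif.", "TX "], ALLOWED_STATE_CODES)` flags only `"Calif."`, since the other two normalize to approved codes.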
The software should allow the documentation of the data lineage, including jobs running in parallel.
The software should provide the ability to create an impact analysis, specifically for assets identified in the data lineage.
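Impact analysis over lineage is essentially a graph walk: start at one asset and collect everything downstream of it. The sketch below assumes a made-up lineage map (all asset names are hypothetical) and uses a plain breadth-first traversal rather than any tool's actual API.

```python
from collections import deque

# Hypothetical lineage: each asset maps to the assets that consume it directly.
LINEAGE = {
    "crm.donors": ["staging.donors"],
    "staging.donors": ["warehouse.dim_donor"],
    "warehouse.dim_donor": ["report.annual_giving", "report.donor_retention"],
    "finance.gifts": ["warehouse.fact_gift"],
    "warehouse.fact_gift": ["report.annual_giving"],
}


def downstream_impact(asset, lineage):
    """Return every asset reachable downstream of `asset`, sorted by name."""
    seen = set()
    queue = deque([asset])
    while queue:
        current = queue.popleft()
        for child in lineage.get(current, []):
            if child not in seen:  # skip assets already visited
                seen.add(child)
                queue.append(child)
    return sorted(seen)
```

Calling `downstream_impact("staging.donors", LINEAGE)` lists the warehouse dimension and both reports, which is the set of assets a steward would want to review before changing that staging table.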
Hierarchy of Data Artifacts
The software should allow the linking of policies, rules, terms and reference data.
Profiling of Diverse Data Sources
The software should support both manual profiling (SQL scripts) and automated profiling (vendor tools) across diverse data sources, including NoSQL and Hadoop. Currently, we have a mix of Oracle, SQL Server, Pentaho, Couchbase, MongoDB, MS Access, Excel… Who knows what else the future will bring?
Data Quality Scorecard
The software should provide the ability to create a scorecard listing our company’s data governance metrics, along with the goal, baseline and periodic status updates for each.
Data Issues Log
The software should provide a data issues log to track issues, the steward assigned, date assigned, date resolved and the current status.
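A data issues log is simple enough to sketch, and doing so shows how it feeds the Data Issues metric (outstanding issues and issues resolved in the last period). The class and field names below are my own assumptions, not taken from any product.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DataIssue:
    description: str
    steward: str
    date_assigned: date
    date_resolved: Optional[date] = None

    @property
    def status(self) -> str:
        """Current status, derived from whether a resolution date is set."""
        return "resolved" if self.date_resolved else "open"


def issue_metrics(issues, period_start, period_end):
    """Count outstanding issues and issues resolved within the given period."""
    outstanding = sum(1 for i in issues if i.date_resolved is None)
    resolved = sum(
        1
        for i in issues
        if i.date_resolved is not None
        and period_start <= i.date_resolved <= period_end
    )
    return {"outstanding": outstanding, "resolved_in_period": resolved}
```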
Data Issue Resolution Process
The software should ensure the issue management and resolution process is fully documented.
Support for Internal Audit
As an organization entrusted with managing data that is ultimately owned by the public, we are subject to frequent audits. The software should keep an inventory of the data repositories that are subject to internal audit on a periodic basis. Each repository should have a data owner and should be audited for compliance with specific data governance policies, such as 1) whether a data dictionary is present, 2) whether the rules have been documented and 3) who determines access controls.
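The audit checks above lend themselves to a simple checklist per repository. Here is a minimal sketch, assuming a hypothetical `Repository` record with one field per check; a real tool would pull these facts from its own metadata store.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Repository:
    name: str
    data_owner: Optional[str]
    has_data_dictionary: bool
    rules_documented: bool
    access_control_owner: Optional[str]


def audit_findings(repo: Repository) -> List[str]:
    """Return the list of policy gaps for one repository (empty if compliant)."""
    findings = []
    if not repo.data_owner:
        findings.append("no data owner assigned")
    if not repo.has_data_dictionary:
        findings.append("data dictionary missing")
    if not repo.rules_documented:
        findings.append("rules not documented")
    if not repo.access_control_owner:
        findings.append("no one designated to determine access controls")
    return findings
```

An empty list means the repository passes all four checks; anything else is a finding to bring to the data owner before the auditors do.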
To summarize, here is an informative video on selecting data governance tools presented by DataVersity:
What else would you add to this list? What has your experience been in selecting a data governance tool?