Authored by Nathan Olson
Data governance is as broad a topic as data itself, and there are far too many of these ‘hows’ to cover in one article. This article is the second in a series covering the desk-level procedures that make up data governance. For more information on this topic, read part one, Properly documenting data assets, and part three, Data security, privacy and compliance.
For organizations striving to make data-informed decisions, data quality is key. When data is inaccurate or incomplete, leaders quickly lose trust and lack the confidence to make informed business decisions. An organization’s data assets can only provide value if they are used properly.
Despite how much airtime it gets in the industry, data governance remains inaccessible and difficult to define. Given the breadth of domains data governance covers, breaking it down into digestible sections is helpful, and this three-part series does exactly that. The first article in the series detailed how documenting data assets can provide concrete benefits for a data program. This second part continues outlining the ‘how’ of data governance by focusing on how to implement data quality initiatives in a governance program, exploring tangible steps an organization can take to increase accuracy, efficiency and trust in its data assets.
Data warehouses are systems that combine data from multiple sources within an organization. Warehouse data is used for analytics and reporting, helping business leaders make more informed decisions. Data warehouses can be very fluid, as data is constantly being added, transformed and merged to meet business needs.
Similar to Bob Ross’s painting lessons, “happy little data quality issues” are part of life. There is often a desire to scrap data assets that have quality issues and start over, but if an organization does not have data governance practices in place to address data discrepancies, any new data asset will end up with the same issues as the original. To avoid this, set a regular cadence for information management to clean up their data assets. Although this “spring cleaning” should be part of normal development, dedicating time quarterly or annually ensures issues are actually addressed.
An alternative to reengineering current data models is to implement code maps. Code maps align industry-wide data elements coming from disparate systems into a format that makes data patterns and relationships easy to identify. For example, if a billing system stores US states by their two-letter abbreviation, while an enterprise resource planning (ERP) tool uses its own ID numbers for each state, combining these sources represents a data quality risk. Information management can create a code map in the data platform that takes an industry-standardized list (start with the International Organization for Standardization (ISO)) and maps each source system’s list to that standard. This approach scales well as systems are added to and removed from the data platform.
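The state-code scenario above can be sketched as a simple lookup table. This is a minimal illustration, not a real schema: the system names, source values and ISO 3166-2 target codes are assumptions chosen for the example.

```python
# A minimal code-map sketch: normalize state values from two hypothetical
# source systems ("billing" and "erp") to ISO 3166-2:US codes.
CODE_MAP = {
    "billing": {"CA": "US-CA", "NY": "US-NY"},    # two-letter abbreviations
    "erp":     {"101": "US-CA", "102": "US-NY"},  # internal ID numbers
}

def to_standard(system: str, value: str) -> str:
    """Translate a source-system state value to the ISO standard code."""
    try:
        return CODE_MAP[system][value]
    except KeyError:
        # Surface unmapped values instead of silently passing them through,
        # so the map can be extended as systems or values are added.
        raise ValueError(f"no mapping for {value!r} from system {system!r}")
```

Because each source system only touches its own sub-map, onboarding a new system means adding one entry rather than rewriting downstream joins.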
A variety of data accuracy tests can be developed, each producing reports that surface a different class of quality issue.
Data integrity and business rules are the two primary types of quality tests that can be built, and both increase transparency into overall data quality while using automation to surface insights efficiently. Data integrity tests validate that data is complete, up-to-date and consistent with the original data source; for example, a test may verify that a transaction amount in the reporting database matches the value for that transaction in the operational data source. In contrast, business rules verify that data behaves logically relative to the business; using the same example, a business rule test may check that no transactions of a certain charge type have negative values.
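The two test types above can be sketched over a handful of illustrative records. The field names, sample values and the “only refunds may be negative” rule are assumptions for the example, not a prescribed schema.

```python
# Sketch of the two quality-test types run over illustrative transactions.
source = {  # operational source of truth: transaction_id -> amount
    "t1": 120.00, "t2": -35.00, "t3": 80.00,
}
reporting = [  # rows as they appear in the reporting database
    {"transaction_id": "t1", "amount": 120.00, "charge_type": "sale"},
    {"transaction_id": "t2", "amount": -35.00, "charge_type": "refund"},
    {"transaction_id": "t3", "amount": 85.00,  "charge_type": "sale"},
]

def integrity_failures(rows, source):
    """Data integrity: each reported amount must match the source system."""
    return [r["transaction_id"] for r in rows
            if source.get(r["transaction_id"]) != r["amount"]]

def business_rule_failures(rows):
    """Business rule: only refund charges may carry negative amounts."""
    return [r["transaction_id"] for r in rows
            if r["amount"] < 0 and r["charge_type"] != "refund"]
```

Here transaction t3 fails the integrity test (85.00 reported versus 80.00 at the source), while t2 passes the business rule because a negative amount is logical for a refund.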
Among the many different data integrity tests that can be built, the most critical is a data consistency test. In this approach, a data report or analysis would compare two or more data sources that should contain the same data. For example, an organization may generate a data query or extract from its ERP on revenue by department for the past calendar year and generate the same dataset from its reporting database. An assigned team or individual should then compare these reports and follow up on any discrepancies immediately.
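The revenue-by-department comparison can be automated as a simple reconciliation that flags only the disagreements for the assigned team to investigate. The department names and figures below are illustrative assumptions.

```python
# Sketch of a data consistency test: compare revenue-by-department extracts
# from the ERP and from the reporting database, flagging any departments
# whose totals disagree.
erp_revenue       = {"sales": 1_200_000, "services": 450_000, "support": 90_000}
reporting_revenue = {"sales": 1_200_000, "services": 455_000, "support": 90_000}

def discrepancies(a: dict, b: dict) -> dict:
    """Return {key: (a_value, b_value)} wherever the two extracts differ,
    including keys that appear in only one extract."""
    return {k: (a.get(k), b.get(k))
            for k in a.keys() | b.keys()
            if a.get(k) != b.get(k)}
```

Running this over each period’s extracts turns the comparison into a short exception report, so follow-up effort goes only to genuine discrepancies.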
Setting up business rules in a data platform will build trust with its users. It also offloads the burden of business users manually cleansing the data when they consume it. The information management team will be responsible for implementing these rules, but defining the rules needs to be done in close collaboration with the business.
Data stewards who are close to the business need to communicate statements about the data that should always be true and never be true. These rules can then be built into data processing jobs to fix or eliminate records that do not match. Organizations must be very thoughtful about any scripts that modify or delete data. In most cases, this data manipulation should be done to data that has already been staged in a data warehouse, so that the raw source of truth remains available as a reference. Additionally, data stewards should think critically about whether the data warehouse is the right place to fix the data, or whether processes or technology upstream of the warehouse should be adjusted so that inaccurate data is never brought in to begin with.
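One cautious way to act on steward-defined rules is to partition staged rows into clean and quarantined sets rather than deleting anything, so the raw staged data stays intact as the reference. The rules and field names below are hypothetical examples, not the article’s prescribed rule set.

```python
# Sketch of applying business rules to staged data: offending rows are
# quarantined with the names of the rules they failed, never deleted.
RULES = [
    # (rule name, predicate that must hold for every record)
    ("amount_present", lambda r: r.get("amount") is not None),
    ("valid_state",    lambda r: r.get("state") in {"US-CA", "US-NY"}),
]

def apply_rules(staged_rows):
    """Partition staged rows into (clean, quarantined-with-reasons)."""
    clean, quarantined = [], []
    for row in staged_rows:
        failed = [name for name, pred in RULES if not pred(row)]
        if failed:
            quarantined.append({**row, "failed_rules": failed})
        else:
            clean.append(row)
    return clean, quarantined
```

The quarantine set doubles as a report for data stewards, which helps answer the upstream question: rules that fail constantly point at a source system that should be fixed, not at the warehouse.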
Business users and information management alike should share a common understanding of how data quality is measured and how issues are resolved.
Senior leadership and the data governance team should work closely with information management to define metrics that measure the quality of data assets and policies for how issues are resolved. For example, a metric may require a certain Extract-Transform-Load (ETL) process to succeed without errors 95% of the time, with any processing failures resolved by noon that day. To support this, information management should build a mechanism to alert stakeholders when failures occur and follow up on the resolution. These metrics can then be reviewed periodically by the data governance committee, creating an open forum for improvements.
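The 95% success-rate metric and its alert can be sketched in a few lines. The run-record shape and the notify hook are assumptions; in practice the alert would post to email or chat.

```python
# Sketch of tracking an ETL success-rate metric against a 95% target,
# notifying stakeholders when the rate drops below it.
TARGET = 0.95  # agreed service level: 95% of runs succeed without errors

def success_rate(runs):
    """Fraction of ETL runs that completed without errors."""
    if not runs:
        return 1.0
    return sum(1 for r in runs if r["status"] == "success") / len(runs)

def check_metric(runs, notify):
    """Compute the rate and call notify(rate) when it misses the target."""
    rate = success_rate(runs)
    if rate < TARGET:
        notify(rate)  # e.g. alert stakeholders by email or chat
    return rate
```

Reviewing the computed rate period over period gives the data governance committee a concrete number to discuss, rather than anecdotes about failed loads.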
Another helpful practice is establishing a regular audit of an organization’s data assets. This should not be a reactive response to a data quality issue, but a pre-established process that occurs quarterly or annually. Data stewards or information management can review reports, data assets and other analyses for inaccuracies or out-of-date data. Findings should be reported to the data governance committee with timelines and plans to resolve any discovered issues.
Data quality is not an all-or-nothing initiative, but an ongoing effort supported by a series of tools and best practices that improve the efficacy of your data platform. These data governance practices can improve the accuracy and integrity of data assets and generate returns through automation and issue prevention. Even more impactful, they can increase transparency around data quality and strengthen stakeholders’ trust in the data. As trust in data increases, a more effective and consistent use of data assets and a more data-driven business emerges.
If your data governance initiative does not establish data quality processes to better utilize your data assets, and set up integrity and business rules to create a culture of quality, it is going to fail. Empower your staff to collaborate on the ‘how’ of data governance to make better-informed business decisions. Baker Tilly is here to help you along your data journey, contact us to get going on your 'how.'