
The 'how' of data governance: properly documenting data assets

Data governance is as broad a topic as data itself, and as such, there are far too many of these ‘how’s’ to cover in one article. This article is the first in a series covering the desk-level procedures that make up data governance. For more information on this topic, read part two, Implementing data quality initiatives, and part three, Data security, privacy and compliance.

Having formalized controls around information management will improve data quality, mitigate security risks, better align business and IT, and enable employees to get more out of data analytics. A tremendous amount of effort has been spent justifying the ‘why’ of data governance. As with any large change effort, explaining the ‘why’ is critical for getting buy-in from stakeholders, as well as from the organization at large. The problem is that most books and presentations on the subject stop there. Some resources walk through the process of kicking off a data governance program, including creating a charter, establishing a data governance committee, and appointing data owners and stewards. A full declaration of the data governance mission and vision is only effective if employees know the ‘how’ of data governance.

Don’t stop at the 'why'

The International Data Corporation has forecast that “the Global Datasphere will grow from 33 zettabytes (ZB) in 2018 to 175 ZB by 2025” [1]. That is a massive increase. To compete in an environment with this amount of data growth, organizations need to ensure they are effectively tracking where their data assets live and how they relate to each other. Desk-level procedures provide a clear understanding of data governance in action on a day-to-day basis. These are tools an organization can choose to implement as part of its data governance program, depending on what it is trying to accomplish.

This article dives into data asset documentation and the practical artifacts that help manage ever-expanding data assets, highlighting four of the most impactful tools to implement at your organization: data dictionaries, business glossaries, source to target maps, and data catalogs. Although there is a lot of value to be gained from comprehensive and robust forms of these tools, even basic versions can provide some quick wins on your data governance journey.

Go meta

It’s possible to generate returns in a relatively short amount of time by creating a data dictionary. A data dictionary is a central metadata (data about data) repository containing definitions, relationships to other data, data source information, data formats, and recommended use cases. A technical resource like this can be used by anyone interacting directly with a database, data lake, or primary data source. Research has shown that “80% of a data scientist’s valuable time is spent simply finding, cleaning, and organizing data, leaving only 20% to actually perform analysis” [2]. Having a metadata repository available can reduce the time an analyst spends collecting data and allow more time for analysis. Standardizing data assets in one place makes them easily searchable and provides necessary context and meaning. SQL Server and most other relational database platforms have native data dictionary views built in, accessible at no cost to anyone with SQL knowledge. If there is a relatively small amount of data and the organization wants to build a dictionary with user-generated fields, such as definitions and use cases, this can be done easily in Excel. If your organization is looking for a more robust solution that requires less manual maintenance, consider a third-party vendor tool that can interface directly with your data assets.
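For illustration, here is a minimal sketch of how such a dictionary could be bootstrapped from SQL Server’s built-in metadata views and exported to a spreadsheet that stewards can enrich by hand. The connection details, driver name, and output filename are placeholders, and the pyodbc dependency is an assumption; adapt them to your environment.

```python
import csv

import pyodbc  # assumption: pyodbc and a SQL Server ODBC driver are installed

# Placeholder connection details; substitute your own server and database.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=your-server;DATABASE=your-database;Trusted_Connection=yes;"
)

# INFORMATION_SCHEMA.COLUMNS is SQL Server's native column-level metadata view.
query = """
    SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, DATA_TYPE,
           CHARACTER_MAXIMUM_LENGTH, IS_NULLABLE
    FROM INFORMATION_SCHEMA.COLUMNS
    ORDER BY TABLE_SCHEMA, TABLE_NAME, ORDINAL_POSITION;
"""

cursor = conn.cursor()
with open("data_dictionary.csv", "w", newline="") as f:
    writer = csv.writer(f)
    # The blank columns at the end are for stewards to fill in manually.
    writer.writerow(["Schema", "Table", "Column", "Data type", "Max length",
                     "Nullable", "Definition", "Recommended use"])
    for row in cursor.execute(query):
        writer.writerow(list(row) + ["", ""])

cursor.close()
conn.close()
```

Even a basic export like this gives analysts a single, searchable view of what exists in the database, while the definition and recommended-use columns remain the stewards’ responsibility to maintain.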

Getting down to business

For non-technical users, creating a business glossary is extraordinarily helpful. A business glossary is a central reference of business terms and definitions. It is valuable because it gives business users an active role in data management as the magnitude and complexity of data platforms increase. Enlisting the business to help create the glossary also establishes long-term ownership and formally recognizes data stewards.

While there are third-party vendors that offer business glossaries, it is recommended to start with a simple spreadsheet that tracks the definition, data domain and owner for each term (a minimal starter layout is sketched after the tips below). You can always transition to a more advanced solution when you run into maintenance and scalability issues. Listed below are tips for getting started on the business glossary:

  • Start with a small set of terms (15-20) from a single data domain, then grow the reference one department at a time
  • Take industry-standard definitions as a starting point, then adjust them to the organization’s specifications
  • Include common acronyms in the glossary.
    - This gives new employees a reference and helps distribute the glossary as a valuable resource
  • Make the glossary accessible to everyone at the organization
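As a concrete starting point, the sketch below writes a starter glossary in the simple spreadsheet layout described above, using only Python’s standard csv module. The sample terms, definitions, and owner titles are illustrative assumptions, not recommendations.

```python
import csv

# Illustrative starter rows; replace them with terms from your first data domain.
glossary = [
    {"Term": "Active customer",
     "Definition": "A customer with at least one transaction in the trailing 12 months",
     "Data domain": "Customer", "Data owner": "VP, Customer Operations", "Acronym": ""},
    {"Term": "Annual recurring revenue",
     "Definition": "The annualized value of all active subscription contracts",
     "Data domain": "Finance", "Data owner": "Controller", "Acronym": "ARR"},
]

with open("business_glossary.csv", "w", newline="") as f:
    writer = csv.DictWriter(
        f, fieldnames=["Term", "Definition", "Data domain", "Data owner", "Acronym"]
    )
    writer.writeheader()
    writer.writerows(glossary)
```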

X marks the spot

The single most valuable piece of documentation for empowering the information management team is a Source to Target Map (S2T). The S2T is a blueprint for the data platform: it captures how data gets from the source systems into the end-point data solutions. This document includes data lineage, how to join data sources, conversion and business rules, ETL frequency, and metadata important to the transformation. Having a Source to Target Map helps technical resources get the information they need faster, and it frees up time for the individuals who have been holding that information and relaying it to everyone who calls or emails them.

A Source to Target Map can take many different forms and will vary based on the data platform. It is recommended to have two levels of documentation:

  1. A diagram, created in any graphics application, that shows a high-level view of how the different data sources, ETL tools, and reporting solutions feed into each other. This is helpful to anyone who wants to understand how the data platform works, and it will quickly prove its worth.
  2. An Excel document, a more detailed resource that outlines the transformation path of individual fields along with their metadata, business rules, and data types (a minimal field-level sketch follows this list). For any data engineer working on the ETL platform, or any business analyst looking to improve their understanding of the data, this will be a go-to treasure trove of information.
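As one possible layout for the detailed document in point 2, the sketch below records the transformation path of a single field, from source system to target table, along with its business rule, data type, and load frequency. The column names and the sample mapping are assumptions to be reshaped around your own ETL platform, and the pandas dependency is likewise an assumption.

```python
import pandas as pd  # assumption: pandas is available for the export

# One row per target field; the mapping below is illustrative only.
s2t_rows = [
    {
        "Source system": "CRM",
        "Source table": "dbo.Contacts",
        "Source field": "email_addr",
        "Transformation / business rule": "Lowercase and trim; exclude rows that fail the email format check",
        "Target table": "dw.DimCustomer",
        "Target field": "EmailAddress",
        "Target data type": "VARCHAR(255)",
        "Load frequency": "Nightly",
    },
]

pd.DataFrame(s2t_rows).to_csv("source_to_target_map.csv", index=False)
```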

One to rule them all

The three tools covered so far serve straightforward, focused purposes and deliver specific value. A data catalog is a central directory that can help any business user find data assets and understand how they should be used. Catalogs can offer an impressive amount of functionality to standardize, communicate, and maintain data assets. Many data catalog applications allow for advanced searching, metadata tagging, in-app communication and collaboration with other staff, in-app querying and reporting, direct integration with data sources and visualization software, and data profiling. They can also be leveraged to manage the other tools discussed here: data dictionaries, business glossaries, and source to target maps. A catalog is expansive in what it can offer and how it can be implemented, and it should be the flagship data asset documentation tool for any advanced data governance program.

A multitude of vendors offer data catalog tools and software. A recommended approach is to start small and build out a foundational data catalog. Even something as simple as a spreadsheet that lists business-facing data sources (reports, data marts, files) with descriptions of how to access and use them can provide a lot of value to business users.
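To make that starter catalog tangible, here is a minimal sketch listing a couple of hypothetical business-facing assets with a description, access instructions, owner, and refresh cadence; every asset and column name here is an illustrative assumption.

```python
import csv

# Hypothetical assets; replace with your own reports, data marts, and files.
catalog = [
    ["Monthly sales dashboard", "Report",
     "Summarizes bookings by region and product line",
     "BI portal, Sales Analytics workspace", "Sales Operations", "Nightly"],
    ["Customer data mart", "Data mart",
     "Conformed customer dimension and related transaction facts",
     "SQL Server, dw schema (read access via the analytics role)",
     "Data Engineering", "Nightly"],
]

with open("data_catalog.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Asset name", "Type", "Description", "How to access",
                     "Owner", "Refresh cadence"])
    writer.writerows(catalog)
```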

Get going on the 'how'

If your data governance initiative does not have executive support, a clear framework of roles and a guiding vision, it is likely to fail. But the vision only becomes real when employees have the ‘how’: the desk-level tools covered in this article. These will empower your analysts to leverage the data platform more efficiently and help business users make better decisions. Baker Tilly is here to help you along your data journey; contact us to get going on your 'how.'

References

  1. IDC white paper, “The Digitization of the World from Edge to Core”
  2. Hugo Bowne-Anderson, “What Data Scientists Really Do, According to 35 Data Scientists”