Metadata and tagging are key to discovering the right data in a data lake

Metadata is data about data: the source the data originated from, how the data is structured and formatted, and when it was captured.

Metadata capture and tagging are closely related and support data exploration and governance within the business.

Tagging drives data discovery in a business. A data lake can hold petabytes of data, much of which may not yet feed production systems because its value has not been identified. Tagging combines automated and manual steps, and the approach is unique to each business.

Tags describe the inventory of a data lake, much like the tags attached to social media and blog posts. Consider a data lake in a healthcare business housing data from multiple electronic medical record (EMR) systems. You might have a tagging strategy of EMR > System Name > Patient Encounters (the dataset). This allows individuals throughout the business to search for EMR system data by tag.
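The hierarchical strategy above can be sketched as a small in-memory catalog. This is a minimal illustration, not a real catalog product: the `TagCatalog` class, the system names "SystemA" and "SystemB" and the dataset names are all hypothetical.

```python
from collections import defaultdict


class TagCatalog:
    """Toy in-memory catalog mapping hierarchical tags to dataset names."""

    def __init__(self):
        self._by_prefix = defaultdict(set)

    def tag(self, dataset, path):
        """Register a dataset under a tag path such as ("EMR", "SystemA")."""
        # Index every prefix so a search on "EMR" also finds deeper tags.
        for depth in range(1, len(path) + 1):
            self._by_prefix[tuple(path[:depth])].add(dataset)

    def search(self, *path):
        """Return the sorted dataset names tagged at or below the given path."""
        return sorted(self._by_prefix.get(tuple(path), set()))


catalog = TagCatalog()
# Hypothetical datasets following EMR > System Name > Patient Encounters.
catalog.tag("patient_encounters_a", ("EMR", "SystemA", "Patient Encounters"))
catalog.tag("patient_encounters_b", ("EMR", "SystemB", "Patient Encounters"))
catalog.tag("claims_2023", ("Billing", "Claims"))

print(catalog.search("EMR"))
print(catalog.search("EMR", "SystemB"))
```

Searching on "EMR" alone returns the encounter datasets from both systems, while adding a system name narrows the result to that system.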

When devising a tagging strategy, the following should be considered:

  • When does the data get tagged?
  • Who will own and be responsible for tagging the datasets?
  • What will the naming convention be so that datasets are easily discoverable by other business units?
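One way to make these three questions concrete is to record the answers on every tag and reject tags that break the convention. The sketch below assumes a hypothetical `domain.system.dataset` naming convention and a hypothetical `make_tag` helper; a real strategy would substitute its own rules.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical convention: <domain>.<system>.<dataset>, lowercase with underscores.
NAME_PATTERN = re.compile(r"^[a-z][a-z0-9_]*(\.[a-z][a-z0-9_]*){2}$")


@dataclass(frozen=True)
class TagRecord:
    name: str            # what: the discoverable tag name
    owner: str           # who: team or person responsible for the tag
    tagged_at: datetime  # when: time the tag was applied


def make_tag(name, owner, tagged_at=None):
    """Build a TagRecord, enforcing the naming convention and ownership."""
    if not NAME_PATTERN.match(name):
        raise ValueError(f"tag name {name!r} does not follow domain.system.dataset")
    if not owner:
        raise ValueError("every tag needs an owner")
    return TagRecord(name, owner, tagged_at or datetime.now(timezone.utc))


tag = make_tag("emr.system_a.patient_encounters", owner="clinical-data-team")
print(tag.name, tag.owner)
```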

Metadata maturity

Traditional metadata: This includes information about flat files, tables, connection strings, file store locations, file names, data types and lengths. You can use this metadata in patterned ETL development and management.
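As a rough sketch of patterned ETL, a single generic loader can be driven entirely by a metadata entry that lists the file name, delimiter, column names, types and lengths. The `FILE_METADATA` layout, the `load_rows` function and the sample data are assumptions made for illustration; in practice such metadata often lives in a control table.

```python
import csv
import io

# Hypothetical metadata entry describing one flat file.
FILE_METADATA = {
    "file_name": "patients.csv",
    "delimiter": ",",
    "columns": [
        {"name": "patient_id", "type": int, "max_length": None},
        {"name": "last_name", "type": str, "max_length": 10},
        {"name": "weight_kg", "type": float, "max_length": None},
    ],
}


def load_rows(stream, metadata):
    """Parse delimited text into typed dicts, driven entirely by metadata."""
    reader = csv.DictReader(stream, delimiter=metadata["delimiter"])
    rows = []
    for raw in reader:
        row = {}
        for col in metadata["columns"]:
            # Cast each field to the type declared in the metadata.
            value = col["type"](raw[col["name"]])
            if col["max_length"] is not None and len(value) > col["max_length"]:
                raise ValueError(f"{col['name']} exceeds {col['max_length']} characters")
            row[col["name"]] = value
        rows.append(row)
    return rows


# An in-memory stand-in for the file named in the metadata.
sample = io.StringIO("patient_id,last_name,weight_kg\n1,Nguyen,70.5\n2,Okafor,82\n")
rows = load_rows(sample, FILE_METADATA)
print(rows[0])
```

Adding another flat file then means adding another metadata entry, not writing another loader.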

Modern metadata: This relies on file formats that allow metadata to be stored within the data file itself, which helps solve problems such as evolving schemas.
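Formats such as Apache Parquet and Apache Avro carry their schema inside the file. To show the idea without extra dependencies, the sketch below uses a made-up toy format (a JSON header line followed by one JSON array per row); the format and both function names are hypothetical. Because each file describes its own columns, a reader can combine an older file with a newer one that gained a column.

```python
import io
import json


def write_self_describing(stream, columns, rows):
    """Write a header line holding the schema, then one JSON array per row."""
    stream.write(json.dumps({"columns": columns}) + "\n")
    for row in rows:
        stream.write(json.dumps(row) + "\n")


def read_self_describing(stream, wanted):
    """Yield rows as dicts using the embedded schema; absent columns become None."""
    header = json.loads(stream.readline())
    names = header["columns"]
    for line in stream:
        record = dict(zip(names, json.loads(line)))
        yield {name: record.get(name) for name in wanted}


# An older file without the "email" column and a newer file that has it.
old_file = io.StringIO()
write_self_describing(old_file, ["id", "name"], [[1, "Ana"]])
new_file = io.StringIO()
write_self_describing(new_file, ["id", "name", "email"], [[2, "Ben", "ben@example.com"]])

wanted = ["id", "name", "email"]
combined = []
for f in (old_file, new_file):
    f.seek(0)  # rewind the in-memory file before reading
    combined.extend(read_self_describing(f, wanted))
print(combined)
```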

Advanced metadata: This covers automated processing and predefined aggregations that run when data is loaded into the data lake. If you have worked with Microsoft’s Power BI application, an example would be the Quick Insights feature, which shows simple relationships in the data and quick aggregations of the data.
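A minimal sketch of aggregation on load is a profiling step that runs as each dataset arrives and stores simple statistics alongside it. This is not how Quick Insights works internally; the `profile` function and the sample encounter rows are hypothetical.

```python
from statistics import mean


def profile(rows):
    """Compute simple per-column statistics for a list of dict rows."""
    summary = {"row_count": len(rows), "columns": {}}
    names = rows[0].keys() if rows else []
    for name in names:
        values = [r[name] for r in rows if r[name] is not None]
        stats = {"nulls": len(rows) - len(values), "distinct": len(set(values))}
        numeric = [
            v for v in values
            if isinstance(v, (int, float)) and not isinstance(v, bool)
        ]
        # Only add numeric aggregates when every non-null value is a number.
        if values and len(numeric) == len(values):
            stats.update(min=min(numeric), max=max(numeric), mean=mean(numeric))
        summary["columns"][name] = stats
    return summary


encounters = [
    {"patient_id": 1, "department": "ER", "length_of_stay": 2},
    {"patient_id": 2, "department": "ER", "length_of_stay": 4},
    {"patient_id": 3, "department": None, "length_of_stay": 6},
]
report = profile(encounters)
print(report["columns"]["length_of_stay"])
```

Storing a report like this with the dataset gives people browsing the lake a quick sense of its contents before they query it.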

How we can help

Once a business constructs its data lake, it’s imperative that people can easily search for and access the organized data inside. Metadata capture and tagging support data exploration and governance, which in turn yield actionable insights from data. At Baker Tilly Digital, we help our clients structure their data so it’s understandable, accurate and able to support the growth of their business. We work with you to build well-structured modern cloud platforms that use incoming data efficiently so you can derive new business value from it. To learn more about organizing and accessing the data in your data lake, contact one of our professionals.
