
Cybersecurity management: data classification demystified

Previously, we presented five key components to an effective cybersecurity management program: 1) data classification, 2) security control implementation, 3) regular verification of control performance, 4) breach preparedness and planning, and 5) risk acceptance and risk transfer. This article takes a deeper dive into the data classification process and provides an approach to complete your own data classification.

What is data classification?

Data classification is the process an organization follows to develop an understanding of its information assets, assign a value to those assets, and determine the effort and cost required to properly secure the most critical of those information assets. Data classification is an important first step in establishing a cybersecurity management program, as it allows an organization to make managerial decisions about resource allocation to secure data from unauthorized access. For purposes of this article, we’ll assume the organization has already completed a risk assessment and understands its regulatory and contractual privacy and confidentiality requirements.

Data’s value

Today’s business systems and the nature of connected enterprises generate a tremendous amount of data. While this data has organizational value, it’s important to recognize that not all data has equal value. Accordingly, not all data needs to be protected the same way. Conducting a data classification allows you to determine the different types of data resident within your organization and provides insight into the protection requirements for each type.

Four essential activities

There are four essential activities a successful data classification effort will include:

  1. Identify: Identify the data
  2. Locate: Identify where the data resides
  3. Classify: Categorize and determine which data needs to be protected
  4. Value: Assign a value to the data


Identify

It sounds straightforward: identify the data. However, systematically identifying the data within your organization can be quite challenging. This isn’t an activity you can delegate to your database administrator. It needs to be cross-functional, organized around business processes, and driven by process owners. We typically see this completed as a series of walkthroughs, one per business process, tracing the data flows with process owners to identify the data. You will need to ask several key questions to identify the types of data maintained by your organization:

  • What data does your organization collect from or about customers, suppliers, and trade partners?
  • What data do you generate about these stakeholders through the course of business?
  • What proprietary data do you create in the course of product development, manufacturing, marketing, and sales?
  • What transactional data do you obtain or generate in the course of business?
  • Of all this collected and generated data, what is private and what is confidential?

Identifying and cataloging this data is the first step in the classification process. It will serve as a basis for all other classification activities.
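As an illustration only, the catalog that comes out of these walkthroughs could be recorded in a simple structure like the one below. This is a minimal sketch, not a prescribed format: the `DataItem` fields, the helper functions, and the sample entries are all hypothetical and exist only to show how answers to the questions above might be captured per business process.

```python
from dataclasses import dataclass


@dataclass
class DataItem:
    """One type of data surfaced during a process walkthrough (hypothetical schema)."""
    name: str
    business_process: str
    process_owner: str
    origin: str                 # "collected" or "generated"
    private: bool = False
    confidential: bool = False


def build_catalog(items):
    """Group catalog entries by the business process they were found in."""
    catalog = {}
    for item in items:
        catalog.setdefault(item.business_process, []).append(item)
    return catalog


def sensitive_items(items):
    """Return entries flagged as private or confidential."""
    return [item for item in items if item.private or item.confidential]


# Illustrative entries only; a real catalog comes from the walkthroughs.
items = [
    DataItem("customer contact details", "order to cash", "sales operations",
             "collected", private=True),
    DataItem("product design drawings", "product development", "engineering",
             "generated", confidential=True),
    DataItem("published price list", "order to cash", "sales operations",
             "generated"),
]

catalog = build_catalog(items)
flagged = [item.name for item in sensitive_items(items)]
```

Keeping the process owner on each entry reflects the point that identification is driven by process owners rather than by the database administrator alone.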


Locate

Once the data has been identified, the next step is to identify all the places where it is stored electronically. You will want your database administrator or enterprise architect to support the location identification process. Data loss prevention tools can also be used in this step to scan networks for certain types of data.
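To give a sense of what scanning for a certain type of data can involve, here is a minimal sketch that looks for candidate credit card numbers in a block of text. It is not how any particular data loss prevention product works; the pattern, the function names, and the sample strings are assumptions made for illustration. It combines a simple digit pattern with the Luhn checksum to reduce false positives.

```python
import re

# Hypothetical pattern: 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits):
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:      # double every second digit from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def find_card_numbers(text):
    """Return digit strings in the text that look like card numbers and pass Luhn."""
    hits = []
    for match in CARD_PATTERN.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits


# 4111 1111 1111 1111 is a widely used test number, not a real account.
sample = "Order 1001 paid with 4111 1111 1111 1111 on file"
found = find_card_numbers(sample)
```

A scan like this only covers text it can read, so it complements rather than replaces the input of the database administrator or enterprise architect.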

Today’s business systems are so tightly integrated that data is often interfaced or shared across systems. This means data might reside in multiple systems, not just the system where it was originally created or entered. Be sure to think about reporting systems and data warehouses too, since transactional data is often archived there. It’s also important to consider systems like document imaging and photocopiers, where electronic copies of once-physical data now reside.

One last system to think about when identifying where data is stored is your backup system. Nearly all organizations retain backup copies of data, and it’s not uncommon to retain multiple copies of system data to support record retention requirements. However, it’s nearly as common for organizations to retain these records longer than the business requires. Identifying where data is retained excessively so that it can be properly purged is an important outcome of a data classification process.
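The check for excessive retention can be sketched as a comparison between the age of each backup and the retention period the business actually requires. The record layout, the backup names, and the retention periods below are hypothetical placeholders; real values would come from your backup system and your record retention policy.

```python
from datetime import date


def excess_retention(backups, today):
    """Return (name, days_over) for backups held past their required retention."""
    flagged = []
    for backup in backups:
        age_days = (today - backup["created"]).days
        days_over = age_days - backup["retention_days"]
        if days_over > 0:
            flagged.append((backup["name"], days_over))
    return flagged


# Illustrative inventory only.
backups = [
    {"name": "erp-full-2023-01", "created": date(2023, 1, 1), "retention_days": 90},
    {"name": "erp-full-2023-12", "created": date(2023, 12, 1), "retention_days": 90},
]

candidates_for_purge = excess_retention(backups, today=date(2024, 1, 1))
```

Passing the review date in as a parameter keeps the check repeatable, so the same inventory can be re-evaluated at each review.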


Classify

By now, you’ll have a solid understanding of the types of data your organization maintains and where they are stored. These types will probably fall into major categories, including:

  • Customer data
  • Personal data
  • Protected health information
  • Credit card data
  • Competitive data/trade secrets
  • Publicly available data

At this point, it’s important for management to determine which classifications are relevant to the organization and what to protect. The list above is a good start, but there might be more for your organization. The main objective is to have an index of the different types of data within your organization. Resist the urge to get too granular. If your classification has more than ten types of business data, that’s a good sign you’ve gone too far, and an overly granular classification tends to be less manageable over time. Five to eight classification categories is reasonable.


Value

Assigning a value to the data you’ve classified is a critical step, as it allows you to make informed decisions about how much to spend to protect that data. Multiple factors contribute to the overall value of a data set. Organizations need to consider the penalties associated with a loss or breach of data. By understanding the potential hard and soft costs associated with a breach of a given data set, an organization can set realistic expectations for the cost to protect it. Consider these examples:

  • HIPAA data: What fines can be levied per record disclosed?
  • PCI data: How would an increase in credit card transaction costs affect net revenue? Or worse, what would it cost to lose the ability to accept credit cards?
  • Personal data: What civil penalties might your organization be subject to for inadvertently disclosing personal information? How much would you need to pay to provide credit monitoring services to your customers?
  • Trade secrets: How would a loss of competitive advantage affect revenue?
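One way to turn questions like these into a comparable figure is a rough exposure estimate per data set. The sketch below is a simplification under stated assumptions: the function name, the cost components, and every number are hypothetical placeholders, not actual fines or market prices, and soft costs such as lost competitive advantage are lumped into a single estimate.

```python
def breach_exposure(record_count, fine_per_record, monitoring_per_record, other_costs=0):
    """Rough exposure: per-record costs times record count, plus other estimated costs."""
    return record_count * (fine_per_record + monitoring_per_record) + other_costs


# Placeholder figures for illustration only; substitute your own estimates.
data_sets = {
    "personal data": breach_exposure(10_000, fine_per_record=50,
                                     monitoring_per_record=10, other_costs=25_000),
    "trade secrets": breach_exposure(0, fine_per_record=0,
                                     monitoring_per_record=0, other_costs=400_000),
}

# Rank data sets from highest to lowest estimated exposure.
ranked = sorted(data_sets.items(), key=lambda pair: pair[1], reverse=True)
```

Even a coarse ranking like this helps set expectations for how much protection spend each data set can justify.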

Next steps

Having completed these steps to classify data, an organization can make decisions about how to manage and secure its data. The data classification results should feed into the overall cybersecurity management program. By understanding where the data resides and the organizational value of that data, an organization can implement cost-effective and efficient cybersecurity controls based on the business risk associated with each type of data it maintains.

For more information on this topic, or to learn how Baker Tilly specialists can help, contact our team.
