An early step in the study of animals is their taxonomy. Taxonomy is the arrangement of animals according to their biological groups. The process begins with the all-encompassing grouping of life itself and proceeds through several levels to the most specific definition of species. Modern taxonomy begins with domain and proceeds through kingdom, phylum, class, order, family, genus, and species. The members of each group share common characteristics. For example, the phylum Chordata consists of animals that, at some stage of life, have a notochord (the precursor of the backbone in vertebrates), a dorsal nerve cord, pharyngeal slits in the throat region, and a tail.

Another, less technical-sounding term that is often used synonymously with taxonomy is classification. The use of classifications is not limited to animals and plants; we use them for many reasons. We classify industries as construction, hospitality, information technology, and so forth. We classify foods as vegetables, fruits, meats, dairy, and grains. We even categorize governments as democracies, monarchies, dictatorships, etc. As a result, we quickly understand something in the context of its classification and can respond to that understanding appropriately.

In the world of data and data management, data classification is the practice of categorizing data according to its attributes so that it can be used efficiently and securely.

The index of a book illustrates the power of classification when it comes to quickly locating the data a person is seeking. In the index, the categorization of words or subjects is alphabetical. If you are looking for the portion of the book that discusses walruses, you would turn to the “W” section of the index, find the entry, and go to the page it specifies. Doing so is far more efficient than flipping through the pages of the book, hoping to land on the word “walruses” among the text.

While there is no doubt of the potential power hidden within the digits housed in our databases, not all data is equal in value. Some data is valuable only within the operations of a business; for example, anyone who has shopped at Ikea understands that location codes help you find the table that you just purchased within their warehouse. Other data has a broader application, such as a person’s Social Security number, which was originally intended for use only by the government but is now used to identify people across many industries. Data classification quickly informs users and administrators of the variety of data elements contained within their data systems.

Some data elements are subject to laws and regulations that dictate how the data is used, shared, secured, and stored. While there may not be any restrictions or requirements associated with the warehouse location codes of the seasonally popular leg lamp, the credit card number of the person who just purchased one will have precise guidelines for its management. Classifying these data elements helps the data administrator determine which security and compliance policies to apply to which data elements.
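To make the link between classification and policy concrete, here is a minimal Python sketch. Everything in it is hypothetical: the column names, the classification labels, and the policy settings are illustrative assumptions, not the requirements of any actual regulation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HandlingPolicy:
    """Illustrative handling rules attached to a classification label."""
    encrypt_at_rest: bool
    mask_in_reports: bool


# Hypothetical labels and policies; real values come from your own
# compliance requirements, not from this sketch.
POLICIES = {
    "payment_card": HandlingPolicy(encrypt_at_rest=True, mask_in_reports=True),
    "operational": HandlingPolicy(encrypt_at_rest=False, mask_in_reports=False),
}

# Hypothetical catalog mapping each column to its classification label.
COLUMN_CLASSIFICATION = {
    "orders.credit_card_number": "payment_card",
    "inventory.warehouse_location_code": "operational",
}


def policy_for(column: str) -> HandlingPolicy:
    """Look up the handling policy for a column via its classification."""
    label = COLUMN_CLASSIFICATION.get(column)
    if label is None:
        # An unclassified column has no policy; surface that explicitly.
        raise KeyError(f"{column} has not been classified yet")
    return POLICIES[label]
```

Once each column carries a label, the administrator manages a handful of policies rather than making a separate decision for every column, and an unclassified column fails loudly instead of silently receiving no protection.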

Classifying data can be a labor-intensive process. While it is easier to perform in the design phase of creating a database, many do not have that luxury. There are several solutions available that deliver an automated scan of the database and leverage an algorithm to determine the most likely classification of the data elements. Microsoft has included such a solution within the database engine of the latest version of SQL Server. However, regardless of how advanced a solution claims to be, a manual review and adjustment may be necessary to ensure complete accuracy in the classifications.
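As a rough idea of what an automated scan might do, the Python sketch below suggests a classification for each column based on patterns in its name. This is a simplified assumption for illustration only; it is not how SQL Server or any particular product works, and the rules, labels, and column names are all hypothetical.

```python
import re

# Hypothetical rules: (pattern matched against the column name,
# information type, sensitivity label). The first match wins.
RULES = [
    (re.compile(r"ssn|social_security", re.I), "National ID", "Highly Confidential"),
    (re.compile(r"card_?(number|num|no)|credit_?card", re.I), "Credit Card", "Highly Confidential"),
    (re.compile(r"email", re.I), "Contact Info", "Confidential"),
]


def suggest_classifications(columns):
    """Return a {column: (information type, label)} suggestion for each column."""
    suggestions = {}
    for column in columns:
        for pattern, info_type, label in RULES:
            if pattern.search(column):
                suggestions[column] = (info_type, label)
                break
        else:
            # No rule matched: flag the column for a human to review.
            suggestions[column] = ("Unclassified", "Needs Review")
    return suggestions
```

Name-based matching misses a poorly named column and can mislabel an innocently named one, which is one reason the manual review step remains valuable.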

Despite the potential upfront labor cost of working through the data classification process, it is far less costly than not having a full understanding of your data catalog and misapplying data management policies and practices. Additionally, an unclassified database may contain untapped insights that are eager to boost your business, if only they could differentiate themselves from the sea of data elements within your data ecosystem.