
Why Taxonomy is Critical to Master Data Management (MDM)

Organizations are paying more and more attention to Master Data Management (MDM). MDM comprises a set of processes and tools that consistently define and manage both the transactional and non-transactional data entities of an organization related to a particular domain, such as purchases, product information, or customer data. It connects an organization’s critical data, which is generally scattered across numerous systems throughout the enterprise, and reconciles inconsistencies, duplication, and missing data. MDM initiatives are a critical part of data transformations.

A master data taxonomy is the structure that feeds and organizes this master data. For example, a product hierarchy and product categories might be defined in an Enterprise Resource Planning (ERP) system, which then feeds other downstream applications such as ecommerce sites that need product data. Customer data may be scattered across many different systems that do not all use the same terminology or metadata. MDM integrates and normalizes this data.


According to a study by Aberdeen, companies using MDM are more than twice as likely to be satisfied with data quality and speed of delivery as those not using master data management. The report notes that with well-organized data, users can find the information they need more quickly, which in turn leads to faster decision making and operational efficiency.

Master data depends on reference data from a taxonomy. For example, the master data for a customer record is stored in a “golden record” which could be in a customer relationship management system (CRM) or an ERP. An enterprise taxonomy would provide the attribute values such as industry, customer type, role, interests and other descriptors that would enable classification of customers so that the correct content can be served up as they use a web site.
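To make that dependency concrete, here is a minimal sketch in Python. It assumes the taxonomy is available as a simple dictionary of controlled vocabularies; the attribute names, terms, and the sample golden record are all hypothetical and not taken from any particular CRM or ERP. The check simply reports any attribute whose value is not a valid taxonomy term.

```python
# Hypothetical controlled vocabularies supplied by the enterprise taxonomy.
TAXONOMY = {
    "industry": {"Manufacturing", "Retail", "Healthcare"},
    "customer_type": {"Prospect", "Customer", "Partner"},
    "role": {"Engineer", "Buyer", "Executive"},
}


def invalid_attributes(record: dict) -> dict:
    """Return the attributes whose values are not valid taxonomy terms.

    Attributes the taxonomy does not govern (e.g. customer_id) are ignored.
    """
    return {
        attr: value
        for attr, value in record.items()
        if attr in TAXONOMY and value not in TAXONOMY[attr]
    }


# Hypothetical golden record for one customer.
golden_record = {
    "customer_id": "C-1001",
    "industry": "Retail",
    "customer_type": "Client",  # not a term in the taxonomy above
    "role": "Buyer",
}

print(invalid_attributes(golden_record))
```

In this sketch "Client" is flagged because the assumed taxonomy uses "Customer" as the term; a real MDM tool would typically apply such checks through its own validation rules rather than ad hoc code.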

Content classification and customer classification are important components of personalization initiatives, and they depend on a consistent taxonomy across all customer touchpoints and the multiple systems that make up the customer experience. Content management systems require the same taxonomies and master data that a CRM requires in order to surface content in context for the user. MDM is the method for providing this overarching structure that supports multiple systems.

Content, customer, and product classification are especially important for the digital transformations that every organization is now planning or executing. The increasing use of artificial intelligence (AI) also means that a source of truth as reference and master data driven by taxonomy is even more important. AI applications do not know what is important to the organization. Taxonomy and master data tell the AI the names of products, services, customer types, content, knowledge and more.

MDM promises not just greater control over consistent reference data, but an ability to manage the relations between data entities in order to generate more effective business knowledge. From this perspective, MDM requires an understanding and agreement about the meaning of terminology. Hence, the natural role of taxonomy. Taxonomy is about "semantic architecture." It is about naming things and making decisions about how to map different concepts and terms to a consistent structure. Data governance is required to support these decisions and to maintain an enterprise taxonomy with consistent data standards.

MDM challenges and the argument for data taxonomy

Ambiguity. The same term can have different meanings. Taxonomy provides a hierarchy that helps remove ambiguity. It includes mechanisms for understanding context and making meaning precise.

Consistency. Obtaining complete agreement on what terms to use can be difficult. Also, people often use terms inconsistently. Sometimes the terms used in legacy applications differ from those used in newer systems. For various reasons the data sometimes can't be re-tagged to provide consistent metadata. A thesaurus can map terms together to account for these inconsistencies, but the mapping needs a set of structures to create these relationships and support data harmony.

Connections. Taxonomies can also represent related concepts (technically also part of a thesaurus) that can be used to connect processes, business logic, or dynamic/related content to support specific tasks.
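The three ideas above can be sketched as a tiny thesaurus structure. This is an illustration under stated assumptions, not a reference to any specific taxonomy tool: the Concept class, its field names, and the sample terms are hypothetical. Each concept has a preferred label, a broader term that supplies context, synonyms that absorb inconsistent legacy terms, and related concepts that connect it to other content.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Concept:
    preferred: str                       # the agreed-upon label
    broader: Optional[str] = None        # parent term; gives the label its context
    synonyms: frozenset = frozenset()    # variant or legacy terms
    related: frozenset = frozenset()     # associated concepts


# Hypothetical sample concepts.
CONCEPTS = [
    Concept("Customer", broader="Party",
            synonyms=frozenset({"client", "account"})),
    Concept("Mechanical Pencil", broader="Writing Instruments",
            synonyms=frozenset({"propelling pencil"}),
            related=frozenset({"Pencil Lead"})),
    Concept("Pencil Lead", broader="Writing Instruments"),
]

# Index every preferred label and synonym, case-insensitively.
LOOKUP = {}
for concept in CONCEPTS:
    for label in {concept.preferred, *concept.synonyms}:
        LOOKUP[label.lower()] = concept


def resolve(term: str) -> Optional[Concept]:
    """Map a legacy or variant term to its concept, or None if unknown."""
    return LOOKUP.get(term.strip().lower())


match = resolve("Client")
print(match.preferred, "<", match.broader)
```

With this structure, data tagged "client" in a legacy system does not need to be re-tagged; it resolves to the same concept as "Customer" at lookup time.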

An MDM strategy defines the process for data cleansing, harmonizing the attributes, and ensuring that all required information is present.

However, MDM programs also need to leverage taxonomy, and taxonomy should make use of MDM initiatives. The two methodologies are symbiotic.


  • Although taxonomy is typically applied to unstructured content, it is increasingly supporting both structured and transactional content - a data taxonomy.
  • Similarly, master data plays an essential role in making unstructured information consistent, findable, and valuable.


The following provides a brief example of key concepts and the role of taxonomy. Note that the transactional data is on the left, the non-transactional persistent reference data on the right.

[Figure: transactional vs. master data]

Let's look at the product master.



Consider two different manufacturers that both offer mechanical pencils. In our product master, they are called the same thing. However, the original product manufacturers do not necessarily use the same terms to describe their products. The original bills of lading might have used abbreviations that are not easily understood, for example. Or the attributes may not be consistently described or reflected in the metadata. In one case the metadata label may be “customer” and in another, it may be “client.”

One manufacturer classifies its product as Stationery and the other calls it Home Office. Further, one abbreviates the Color attribute as Bl and the other uses Blk. With these inconsistencies, it is impossible to deliver an excellent user experience wherever this data needs to be displayed.

Bringing it all together with taxonomy and master data management

MDM fixes these inconsistencies by improving data quality. Although each supplier has a way of organizing and describing its products that may or may not be aligned and consistent with the others, the retailer needs to drive a consistent user interface and experience to achieve the best business outcomes. The system needs to have the following characteristics:

  • A centralized repository where "the source of truth" exists
  • Governance processes for fixing inconsistencies or providing feedback to suppliers
  • Rules for automating remediation of predictable inconsistencies
  • Tools for cleansing and normalizing the data (running scripts and converting the data)
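As a rough sketch of how the last three characteristics might fit together, the Python below applies mapping rules to the mechanical pencil example and sets aside anything it cannot resolve for governance review. The rule table, the preferred terms ("Office Supplies", "Black"), the SKUs, and the third record are all assumptions made for illustration; in particular, it assumes both suppliers mean black by "Bl" and "Blk", which a real program would need to confirm with each supplier.

```python
# Hypothetical remediation rules: lowercase supplier value -> preferred term.
RULES = {
    "category": {
        "stationery": "Office Supplies",
        "home office": "Office Supplies",
        "office supplies": "Office Supplies",
    },
    "color": {
        "bl": "Black",   # assumption: this supplier uses "Bl" for black
        "blk": "Black",
        "black": "Black",
    },
}


def normalize(record: dict):
    """Apply the rules to one supplier record.

    Returns (clean_record, issues). Values the rules cannot resolve are
    left unchanged and listed in issues for governance review.
    """
    clean, issues = dict(record), []
    for attr, mapping in RULES.items():
        raw = record.get(attr)
        if raw is None:
            issues.append((attr, "missing"))
            continue
        key = raw.strip().lower()
        if key in mapping:
            clean[attr] = mapping[key]
        else:
            issues.append((attr, f"unmapped value {raw!r}"))
    return clean, issues


# Hypothetical supplier feeds for the same kind of product.
feeds = [
    {"sku": "A-100", "name": "Mechanical Pencil", "category": "Stationery", "color": "Bl"},
    {"sku": "B-200", "name": "Mechanical Pencil", "category": "Home Office", "color": "Blk"},
    {"sku": "B-201", "name": "Mechanical Pencil", "category": "Home Office", "color": "Grn"},
]

for rec in feeds:
    clean, issues = normalize(rec)
    print(clean["sku"], clean["category"], clean["color"], issues)
```

The first two records come out with identical category and color values, while the unmapped "Grn" is flagged rather than guessed at, which is where a governance process or supplier feedback loop would take over.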

The role of a data taxonomy is even more important in multi-domain MDM, which is the direction in which the industry is heading. According to Gartner, 58% of the reference customers in its 2018 Magic Quadrant Report on Master Data Management Solutions are facing the requirement for multi-domain MDM.

Whereas in the past, most MDM systems were focused on a single area such as product data or customer data, more organizations now want to bring data together from multiple domains, to allow for a broader range of business use cases and greater use of analytics.

In order to conduct analyses across domains and develop effective governance programs, organizations need to set up consistent taxonomies and standard metadata, especially on their critical data. The data models will need to reflect a consistent taxonomy. Ultimately, the relationships among different taxonomies should be captured and documented through an ontology, but having an MDM with appropriate taxonomies is a good foundational step to take.

Nothing about this is easy (or sexy) but it needs to be done if your initiatives are going to make headway. Our team of information science experts can help. Give us a shout if you'd like to talk.


Seth Earley
Seth Earley is the Founder & CEO of Earley Information Science and the author of the award-winning book The AI-Powered Enterprise: Harness the Power of Ontologies to Make Your Business Smarter, Faster, and More Profitable. He is an expert with 20+ years of experience in Knowledge Strategy, Data and Information Architecture, Search-based Applications, and Information Findability solutions. He has worked with a diverse roster of Fortune 1000 companies, helping them achieve higher levels of operating performance.
