
Taxonomy in Information Archaeology

Clink, clink went two halves of a Japanese rifle shell case on my researcher's desk at the National Archives and Records Administration facility in College Park, Maryland. They fell from the envelope attached to a memorandum in the folder I took from the large archival documents box belonging to RG-319, Office of Assistant Secretary, Army Staff Operations. The memorandum discussed problems associated with placing Imperial Japanese Army rifles under U.S. Army control back into service as part of the mobilization effort in Japan in response to rising tensions on the Korean Peninsula between 1948 and 1950. It was a good plan except that: 1) the parts of the Japanese rifles were hand-crafted by each soldier during final assembly at time of issue and were unique, so the guns lacked interchangeable replacement parts; 2) the shells were designed for the gun bore; and 3) the U.S. military had no practical means to mass-produce shells for these archaic weapons.

This story illustrates a number of important points. A theory requires supporting knowledge to establish its actual goodness. Knowledge is a work product that moves through an organization. The repository for a work product artifact can be in an unusual place. Navigating to that place requires both an external structure and a diligent, informed seeker. Once accessed, retrieval results may include both target and unanticipated, serendipitous materials.

The business of taxonomy or metadata projects is the successful conversion of data-stuffs into reusable intellectual assets that serve organizational strategic objectives, using knowledge-oriented tactics and tools. In converting data into assets that can be developed or repurposed, an organization creates wealth that enables workers to find goodness in their ideas, decisions, and key relationships, expanding the effectiveness of their endeavors both directly and through diffusion.

Taxonomies are semantic models for governing, interpreting, and maintaining data, and they enable solutions for search and resource navigation. As such, taxonomies drive the quality (accuracy, consistency, and saliency) of serendipitous knowledge search and retrieval. They also drive efficiency: solving problems that demand high-value information becomes easier than with baseline solutions, and the improvement can be measured in terms of real results.

Information seekers know their needs and wants. Product branding initiatives, professional and shop jargon, and the location of information repositories may make the experience of filling those needs seem similar to dealing with old Japanese rifles. However, taxonomy-driven navigation and search provide standardized, all-purpose, reusable semantic bullets: users get meaningful results.

Effective taxonomy work requires learning the vocabulary common to an endeavor and the conceptual relationships among its terms. This requires more than term capture; it requires mining dialogues and documents for the informed points of view of creators and seekers and for the organizing structures in their speech. Relationally structuring the language of an endeavor enables smart use of terms and a pragmatic level of search sensitivity to document nuance and to variations among information seekers.

Achieving a natural feeling in a highly structured vocabulary requires capturing the actual language of an endeavor and creating bridges across dialects, pidgins, and creoles. Language collection requires both active listening and the use of techniques such as site visits, interviews, and source/text analysis. Giving semantic form to language data requires a level of tooling. Tooling may involve standard approaches to data solicitation, such as card sorting and mental modeling activities, or may require thoughtful analysis of user-task behavior or logical concept-relationships. The goal of data collection and modeling is to discover the semantic categories the words represent, to clarify their logical relationships, and to organize the universe of discourse structurally into an integrated taxonomy or ontology.
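The idea of bridging dialects can be made concrete with a small sketch. The following is only an illustration under stated assumptions: the `Concept` class, the `build_lookup` and `normalize` helpers, and the sample terms are all hypothetical and are not drawn from any particular taxonomy tool. It shows one simple way to let several variant labels resolve to a single preferred term.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Concept:
    """One concept in a controlled vocabulary (hypothetical structure)."""
    preferred: str                                   # label the taxonomy standardizes on
    variants: Set[str] = field(default_factory=set)  # jargon, brand names, regional terms


def build_lookup(concepts: List[Concept]) -> Dict[str, str]:
    """Map every known label, lowercased, to its preferred term."""
    lookup: Dict[str, str] = {}
    for concept in concepts:
        for label in {concept.preferred, *concept.variants}:
            lookup[label.lower()] = concept.preferred
    return lookup


def normalize(term: str, lookup: Dict[str, str]) -> Optional[str]:
    """Return the preferred term for a seeker's word, or None if it is unknown."""
    return lookup.get(term.strip().lower())


# Illustrative data only, loosely based on the rifle story above.
concepts = [
    Concept("cartridge case", {"shell case", "shell casing", "brass"}),
    Concept("rifle", {"long gun", "service rifle"}),
]
lookup = build_lookup(concepts)
```

With this in place, `normalize("Shell Casing", lookup)` and `normalize("brass", lookup)` both resolve to the same preferred term, while a word that was never collected returns `None` and signals a gap in the vocabulary.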

One tendency is to think of linguistic taxonomies as biological taxonomies. However, this is not the case. Linguistic taxonomies are schemes for the flexible representation of concepts. Respect for the integrity and independence of major categories is preserved by creating a small number of "facets," such as roles, expertise, or product category distinctions. The facets provide a crystalline structure for the universe of discourse. Internally, each facet is structured hierarchically, with contextual membership criteria stressing similarities and functions. Semantic facets are akin to cultural totems, organizing social and cultural relationships about overarching themes and representing deep structures in perspectives and patterns of thought. Investigating a variety of environments and perspectives is essential in taxonomy work for this exact reason. Semantic facets are not Linnaean classes. A Linnaean approach focuses on internal characteristics (such as anatomy, clade, or morphology), while the linguistic approach is ecological in its focus: words mean different things in different contexts. Linguistically, the Linnaean approach yields etymological unabridged dictionaries.
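A minimal sketch may help show how a small set of independent facets, each hierarchical on the inside, can describe one item from several angles at once. Everything here is an assumption made for illustration: the facet names, the terms, and the `ancestors` and `matches` helpers are hypothetical, and a production taxonomy would normally live in a dedicated tool, not in a dictionary.

```python
from typing import Dict, List, Optional

# Each facet is an independent hierarchy, stored as term -> broader term.
# A value of None marks a top-level term. Names are hypothetical examples.
FACETS: Dict[str, Dict[str, Optional[str]]] = {
    "role": {
        "archivist": "staff",
        "researcher": "visitor",
        "staff": None,
        "visitor": None,
    },
    "record type": {
        "memorandum": "correspondence",
        "letter": "correspondence",
        "correspondence": None,
    },
}


def ancestors(facet: str, term: str) -> List[str]:
    """Return the term followed by its broader terms, up to the top of the facet."""
    parents = FACETS[facet]
    if term not in parents:
        raise KeyError(f"{term!r} is not a term in facet {facet!r}")
    chain: List[str] = []
    current: Optional[str] = term
    while current is not None:
        chain.append(current)
        current = parents[current]
    return chain


def matches(tags: Dict[str, str], facet: str, query: str) -> bool:
    """True if the item's tag in this facet is the query term or a narrower term."""
    tag = tags.get(facet)
    return tag is not None and query in ancestors(facet, tag)


# One item described independently in each facet.
document = {"role": "researcher", "record type": "memorandum"}
```

Because each facet is queried on its own, a search for "correspondence" in the "record type" facet finds the memorandum without requiring any assumption about who uses it, which is the independence of major categories described above.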

Environmental testing is required to confirm the validity of a taxonomy's facet system. If the facet system presents a complete, high-level framework for an endeavor, it has natural validity. Internal consistency within facets is a goal of first importance, but it may not be essential. Mathematically, the pairing of completeness and consistency is a fundamental dilemma; pragmatically, speakers naturally speak. Linguistic taxonomies naturally enable speaker effectiveness.
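One rough way to picture such a test, offered only as a sketch: compare terms observed in the working environment against the facet system, and flag terms that more than one facet claims. The `coverage` and `overlaps` functions, the flat term sets, and the sample data are hypothetical simplifications, and real validation would also involve user testing.

```python
from typing import Dict, Iterable, List, Set


def coverage(observed_terms: Iterable[str], facets: Dict[str, Set[str]]) -> float:
    """Share of distinct observed terms that belong to at least one facet."""
    known: Set[str] = set().union(*facets.values())
    observed = {term.strip().lower() for term in observed_terms}
    if not observed:
        return 0.0
    return len(observed & known) / len(observed)


def overlaps(facets: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Terms claimed by more than one facet, a possible consistency problem."""
    seen: Dict[str, List[str]] = {}
    for facet, terms in facets.items():
        for term in terms:
            seen.setdefault(term, []).append(facet)
    return {term: names for term, names in seen.items() if len(names) > 1}


# Hypothetical facet system and a hypothetical sample of language from the field.
facets = {
    "role": {"researcher", "archivist"},
    "record type": {"memorandum", "letter"},
}
observed = ["Memorandum", "researcher", "envelope", "letter"]

print(coverage(observed, facets))  # share of field terms the facets account for
print(overlaps(facets))            # terms that sit in more than one facet
```

A low coverage score suggests the framework is incomplete for the endeavor, while overlapping terms point to places where facet boundaries may need a decision. Neither number replaces judgment; they only direct attention.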

The apparent budget utility of reusing archaic, idiosyncratic rifles to try to defend Japan against aggression from North Korea was unnatural to the U.S. Military Establishment (its then official name), and presented a complex array of challenges. In the end, the documents were buried in hundreds of boxes containing thousands of pages with mixed levels of preservation, some organized by the many individual offices' filing systems, others by a military decimal number system developed many years before, awaiting a persistent researcher. The decision makers selected another, very different solution, but that is another story.

Earley Information Science Team
