Expert Insights | Earley Information Science

Taxonomy in Information Archaeology

Written by Earley Information Science Team | Jul 8, 2010 4:00:00 AM

Clink, clink went two halves of a Japanese rifle shell case on my researcher's desk at the National Archives and Records Administration facility in College Park, Maryland. They fell from the envelope attached to a memorandum in the folder I took from the large archival documents box belonging to RG-319, Office of Assistant Secretary, Army Staff Operations. The memorandum discussed problems associated with placing Imperial Japanese Army rifles under U.S. Army control back into service as part of the mobilization effort in Japan in response to rising tensions on the Korean Peninsula between 1948 and 1950. It was a good plan except that: 1) the parts of the Japanese rifles were hand-crafted by each soldier during final assembly at time of issue and were therefore unique, so the guns lacked interchangeable replacement parts; 2) the shells were designed for the gun bore; and 3) the U.S. military had no practical means to mass-produce shells for these archaic weapons.

This story illustrates a number of important points. A theory requires supporting knowledge to establish its actual goodness. Knowledge is a work product that moves through an organization. The repository for a work product artifact can be in an unusual place. Navigating to that place requires both an external structure and a diligent, informed seeker. Once accessed, retrieval results may include both target and unanticipated, serendipitous materials.

The business of taxonomy or metadata projects is the successful conversion of data-stuffs into reusable intellectual assets that serve organizational strategic objectives, using knowledge-oriented tactics and tools. In converting data into assets that can be developed or repurposed, an organization creates wealth that enables workers to find goodness in their ideas, decisions, and key relationships, expanding the effectiveness of their endeavors both directly and through diffusion.

Taxonomies are semantic models for governing, interpreting, and maintaining data, and they enable solutions for search and resource navigation. As such, taxonomies drive the quality (accuracy, consistency, and saliency) of knowledge search and retrieval, including serendipitous finds. They also drive efficiency: solving problems that demand high-value information becomes easier than with baseline solutions, and the improvement can be measured in terms of real results.

Information seekers know their needs and wants. Product branding initiatives, professional and shop jargon, and the location of information repositories may make the experience of filling those needs seem similar to dealing with old Japanese rifles. However, taxonomy-driven navigation and search provide standardized, all-purpose, reusable semantic bullets: users get meaningful results.

Effective taxonomy work requires learning the vocabulary common to an endeavor and the conceptual relationships among its terms. This requires more than term capture; it requires mining dialogues and documents for the informed points of view of creators and seekers and for the organizing structures in their speech. Relationally structuring the language of an endeavor enables smart use of terms and a pragmatic level of search sensitivity to document nuance and to variations among information seekers.
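As a rough illustration of what sensitivity to seeker variations can mean in practice, the sketch below maps the variant words a seeker might type onto a single preferred term. The vocabulary, the preferred terms, and the function names are all hypothetical, chosen only to echo the rifle story; a real vocabulary would come from the interviews and text analysis described here.

```python
# Hypothetical vocabulary: each preferred term lists the variant forms
# (jargon, abbreviations, everyday words) that seekers might actually type.
VARIANTS = {
    "rifle cartridge": ["shell", "round", "ammo"],
    "small arms": ["rifles", "guns", "firearms"],
}


def build_lookup(variants):
    """Map every variant, and each preferred term itself, to the preferred term."""
    lookup = {}
    for preferred, alternates in variants.items():
        lookup[preferred.lower()] = preferred
        for alt in alternates:
            lookup[alt.lower()] = preferred
    return lookup


def normalize(query, lookup):
    """Return the preferred term for a query, or None if the term is unknown."""
    return lookup.get(query.strip().lower())


LOOKUP = build_lookup(VARIANTS)
```

With this in place, a search for "ammo" and a search for "shell" both resolve to the same concept, so seekers using different dialects of the endeavor reach the same material.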

Achieving a natural feeling in a highly structured vocabulary requires capturing the actual language of an endeavor and creating bridges across dialects, pidgins, and creoles. Language collection requires both active listening and the use of techniques such as site visits, interviews, and source/text analysis. Giving semantic form to language data requires a level of tooling. Tooling may involve standard approaches to data solicitation, such as card sorting and mental modeling activities, or may require thoughtful analysis of user-task behavior or logical concept-relationships. The goal of data collection and modeling is to discover the semantic categories the words represent, to clarify their logical relationships, and to organize the universe of discourse structurally into an integrated taxonomy or ontology.

One tendency is to think of linguistic taxonomies as being biological taxonomies. However, this is not the case. Linguistic taxonomies are schemes for the flexible representation of concepts. Respect for the integrity and independence of major categories is preserved by creating a small number of "facets," such as roles, expertise, or product category distinctions. The facets provide a crystalline structure for the universe of discourse. Internally, each facet is hierarchical, with contextual membership criteria stressing similarities and functions. Semantic facets are akin to cultural totems, organizing social and cultural relationships around overarching themes and representing deep structures in perspectives and patterns of thought. For this reason, investigating a variety of environments and perspectives is essential in taxonomy work. Semantic facets are not Linnaean classes. A Linnaean approach focuses on internal characteristics (such as anatomy, clade, or morphology), while the linguistic approach is ecological in its focus: words mean different things in different contexts. Linguistically, the Linnaean approach yields etymological unabridged dictionaries.
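One way to picture a small set of independent facets, each hierarchical inside, is the sketch below. The facet names, the terms, and the helper functions are hypothetical illustrations and not a prescribed model; the sketch assumes each term has at most one broader term within its facet.

```python
# Hypothetical facets: each maps a term to its broader term
# (None marks the top of that facet's hierarchy).
FACETS = {
    "role": {"staff": None, "researcher": "staff", "archivist": "staff"},
    "product": {"equipment": None, "small arms": "equipment", "rifle": "small arms"},
}


def ancestors(facet, term):
    """Return the term followed by all of its broader terms within one facet."""
    parents = FACETS[facet]
    chain = []
    while term is not None:
        chain.append(term)
        term = parents[term]
    return chain


def matches(doc_tags, facet, term):
    """True if the document is tagged with the term or a narrower term in that facet."""
    tagged = doc_tags.get(facet)
    return tagged is not None and term in ancestors(facet, tagged)


# A document carries one tag per facet; the facets stay independent of each other.
memo = {"role": "researcher", "product": "rifle"}
```

Here a memorandum tagged "rifle" is also found by a seeker browsing "small arms" or "equipment", while its "role" tag is evaluated separately, which keeps the major categories independent.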

Environmental testing is required to confirm the validity of a taxonomy's facet system. If the facet system presents a complete, high-level framework for an endeavor, it has natural validity. Internal consistency within each facet is a goal of first importance, but it may not be essential. Mathematically, the pairing of completeness and consistency is a fundamental dilemma; pragmatically, speakers naturally speak. Linguistic taxonomies naturally enable speaker effectiveness.

The apparent budget utility of reusing archaic, idiosyncratic rifles to try to defend Japan against aggression from North Korea was unnatural to the U.S. Military Establishment (its then official name), and presented a complex array of challenges. In the end, the documents were buried in hundreds of boxes containing thousands of pages with mixed levels of preservation, some organized by the many individual offices' filing systems, others by a military decimal number system developed many years before, awaiting a persistent researcher. The decision makers selected another, very different solution, but that is another story.