Earley & Associates

Search and Taxonomy - Leveraging Metadata to Return Content in Context (2)

Building a Taxonomy and Tagging Content

Building a taxonomy and tagging content can both be laborious tasks, but the benefits and ROI, as we will see, more than justify the effort. Thankfully, both of these processes can be made significantly easier with the help of specialized software applications such as Wordmap, Nstein, and Teragram.

Choosing Terms

At its core, a taxonomy is a selection of controlled terms that represent an organization’s content or subject domain in a hierarchical structure. When building a taxonomy, choosing the terms to represent the content is an important first step. Earlier we mentioned that one of the benefits of a taxonomy is that it can allow access to content from different user perspectives. This is done by ensuring that variant terms and synonyms are captured and included in your taxonomy, for example:

(Figure 1: An example of variant and multi-language terms)

We explain later how these terms are leveraged by search in the information retrieval stage.
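As a rough illustration of what capturing variants can look like in practice, the sketch below stores each controlled term together with its variants and builds a lookup from any label to the controlled term. The `Term` class, the `build_variant_index` helper, and the sample labels are assumptions made for this example; they are not the contents of Figure 1 or the data model of any particular taxonomy tool.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set


@dataclass
class Term:
    """A controlled (preferred) term plus the variant labels that map to it."""
    preferred: str
    variants: Set[str] = field(default_factory=set)


def build_variant_index(terms: Iterable[Term]) -> Dict[str, str]:
    """Map every label (preferred or variant, lower-cased) to its preferred term."""
    index: Dict[str, str] = {}
    for term in terms:
        index[term.preferred.lower()] = term.preferred
        for variant in term.variants:
            index[variant.lower()] = term.preferred
    return index


# Hypothetical sample data, including one multi-language variant.
terms = [
    Term("Laptops", {"notebooks", "notebook computers", "ordinateurs portables"}),
    Term("Desktops", {"desktop computers", "desktop PCs"}),
]

variant_index = build_variant_index(terms)
print(variant_index["notebooks"])  # Laptops
```

Keeping one preferred label per concept means content only ever needs to be tagged with that label, while users remain free to search with any of the variants.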

Hierarchical Relationships & See-also Relationships

The next step in building a taxonomy is to define the relationships between the terms you have chosen to represent your content. The basic organizing principle of a taxonomy is the hierarchical parent-child relationship. This relationship always moves from broader to narrower. In hierarchical relationships, child terms always inherit the characteristics of their parent terms.

(Figure 2: Hierarchical relationship)

Hierarchical relationships are natural ordering principles that mirror the way we naturally think about content. We know that a laptop is a type of computer and that a desktop is a type of computer; conversely, we know that a computer can be either a laptop or a desktop.
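The parent-child relationship can be sketched as a simple mapping from each term to its broader term. The code below is only an illustration: the `BROADER` mapping, the function names, and the "Electronics" root are assumptions added for the example.

```python
from typing import Dict, List

# child term -> parent (broader) term; hypothetical sample hierarchy
BROADER: Dict[str, str] = {
    "Laptops": "Computers",
    "Desktops": "Computers",
    "Computers": "Electronics",
}


def ancestors(term: str) -> List[str]:
    """Return the broader terms of `term`, from nearest parent up to the root."""
    chain: List[str] = []
    while term in BROADER:
        term = BROADER[term]
        chain.append(term)
    return chain


def narrower(term: str) -> List[str]:
    """Return the direct child terms of `term`, sorted alphabetically."""
    return sorted(child for child, parent in BROADER.items() if parent == term)


def is_a(term: str, candidate: str) -> bool:
    """True if `candidate` is a broader term of `term` at any level."""
    return candidate in ancestors(term)


print(ancestors("Laptops"))          # ['Computers', 'Electronics']
print(narrower("Computers"))         # ['Desktops', 'Laptops']
print(is_a("Laptops", "Computers"))  # True
```

Because the relationship only ever moves from broader to narrower, `is_a("Computers", "Laptops")` is false: a laptop is a computer, but a computer is not necessarily a laptop.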
 
An associative (or “see-also”) relationship defines a non-hierarchical relationship. This type of relationship can be used to relate terms that are conceptually linked within your organization but are not hierarchically related.

(Figure 3: Associative relationship)

Associative relationships can be customized to describe exactly how terms relate to each other. For example, the relationship between “Computers” and “Computer Monitors” can be “used with”.
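One way to picture a labeled associative relationship is as a pair of links, one in each direction, that each carry the name of the relationship. The `AssociativeLinks` class below is a hypothetical sketch written for this article, not the interface of any taxonomy product.

```python
from collections import defaultdict
from typing import DefaultDict, List, Optional, Tuple


class AssociativeLinks:
    """Stores labeled, non-hierarchical ("see-also") links between terms."""

    def __init__(self) -> None:
        self._links: DefaultDict[str, List[Tuple[str, str]]] = defaultdict(list)

    def relate(self, source: str, label: str, target: str,
               inverse_label: Optional[str] = None) -> None:
        """Link source to target, and target back to source.

        The reverse link reuses `label` unless `inverse_label` is given.
        """
        self._links[source].append((label, target))
        self._links[target].append((inverse_label or label, source))

    def related(self, term: str) -> List[Tuple[str, str]]:
        """Return (label, other_term) pairs for `term`."""
        return list(self._links.get(term, []))


links = AssociativeLinks()
links.relate("Computers", "used with", "Computer Monitors")

print(links.related("Computers"))          # [('used with', 'Computer Monitors')]
print(links.related("Computer Monitors"))  # [('used with', 'Computers')]
```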

Tagging

The next step in integrating taxonomy with search is to tag content with terms from the taxonomy. This process applies an extra layer of controlled metadata to an organization’s content that can then be leveraged by a search application. In organizations with large amounts of content, tagging is often done with the help of automatic indexing software. It is also important to note that not every piece of content needs to be tagged. Some content benefits from extensive tagging more than other content, so an organization must determine which content is of highest value to its users and essential to daily workflow.
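To make the idea of tagging concrete, here is a deliberately naive sketch that suggests taxonomy tags for a document by looking for known variant labels in its text. Real automatic indexing software is far more sophisticated; the `VOCABULARY` data and the `suggest_tags` function are assumptions made for this example only.

```python
import re
from typing import Dict, List

# preferred term -> labels that should trigger it; hypothetical sample vocabulary
VOCABULARY: Dict[str, List[str]] = {
    "Laptops": ["laptop", "laptops", "notebook", "notebooks"],
    "Desktops": ["desktop", "desktops"],
    "Computer Monitors": ["monitor", "monitors"],
}


def suggest_tags(text: str) -> List[str]:
    """Return the preferred terms whose labels appear as whole words in `text`."""
    tags = set()
    for preferred, labels in VOCABULARY.items():
        pattern = r"\b(?:" + "|".join(re.escape(label) for label in labels) + r")\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            tags.add(preferred)
    return sorted(tags)


doc = "This 15-inch notebook ships with an optional external monitor."
print(suggest_tags(doc))  # ['Computer Monitors', 'Laptops']
```

Note that the document is tagged with the controlled term “Laptops” even though its text only says “notebook”. A simple matcher like this cannot tell a notebook computer from a paper notebook, which is one reason suggested tags are often reviewed by a person for high-value content.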

Leveraging Taxonomy & Metadata for Search

Once content has been tagged, a search application can leverage the taxonomy categories to improve both the recall and the precision of a search.

Improved Recall: Multiple User Access Points

Recall can be defined as the ability of a search engine to retrieve all content associated with a query. Earlier we mentioned the importance of including variant terms and synonyms when choosing the controlled vocabulary that will populate your taxonomy. These terms can be leveraged by a search application, allowing users to search for content with a wider vocabulary and thereby improving the recall of search. Consider the following example:

Scenario A (Full text search)

An employee working for an electronics company wants to find information on all of the laptop computers that the company sells. The employee enters the query term “notebooks” into the search engine. The search engine runs the query against a database of product information documents and returns hits that contain the word “notebooks”. The biggest problem with this scenario is that the employee may not be aware that many of the product information sheets use the term “laptops” instead of “notebooks”. A full text search does not know that laptops and notebooks represent the same product, so the employee is presented with an incomplete set of results and recall is decreased.

Scenario B (Search with Taxonomy)

The employee enters the same query, but this time the query is also matched against the taxonomy. In our taxonomy the controlled term is “laptops”, and “notebooks” is included as a synonym for that term. The search engine recognizes “notebooks” as a synonym and returns all the content tagged with the controlled term “laptops”, regardless of which word each document actually uses. This illustrates how search with a taxonomy can increase recall of important information.
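The difference between the two scenarios can be sketched in a few lines. The documents, tags, and function names below are invented for illustration, and recall is computed as the share of relevant documents that were actually retrieved.

```python
from typing import Dict, Set

# Hypothetical product sheets, each already tagged with a controlled term.
DOCUMENTS: Dict[str, dict] = {
    "sheet-1": {"text": "Our notebooks ship with a three-year warranty.", "tags": {"Laptops"}},
    "sheet-2": {"text": "This laptop weighs under two kilograms.", "tags": {"Laptops"}},
    "sheet-3": {"text": "Laptops in this range include a docking station.", "tags": {"Laptops"}},
    "sheet-4": {"text": "A desktop tower with four drive bays.", "tags": {"Desktops"}},
}

# variant or preferred label (lower-cased) -> controlled term
SYNONYMS: Dict[str, str] = {"laptops": "Laptops", "notebooks": "Laptops", "desktops": "Desktops"}


def full_text_search(query: str) -> Set[str]:
    """Scenario A: return documents whose text contains the query string."""
    q = query.lower()
    return {doc_id for doc_id, doc in DOCUMENTS.items() if q in doc["text"].lower()}


def taxonomy_search(query: str) -> Set[str]:
    """Scenario B: resolve the query to a controlled term, then match on tags."""
    preferred = SYNONYMS.get(query.lower())
    if preferred is None:
        return full_text_search(query)  # fall back when the taxonomy has no match
    return {doc_id for doc_id, doc in DOCUMENTS.items() if preferred in doc["tags"]}


def recall(retrieved: Set[str], relevant: Set[str]) -> float:
    """Fraction of the relevant documents that were retrieved."""
    return len(retrieved & relevant) / len(relevant)


relevant = {"sheet-1", "sheet-2", "sheet-3"}  # every laptop product sheet
scenario_a = full_text_search("notebooks")
scenario_b = taxonomy_search("notebooks")

print(sorted(scenario_a), round(recall(scenario_a, relevant), 2))  # ['sheet-1'] 0.33
print(sorted(scenario_b), round(recall(scenario_b, relevant), 2))  # ['sheet-1', 'sheet-2', 'sheet-3'] 1.0
```

In this toy collection the full text search finds only the one sheet that happens to say “notebooks”, while the taxonomy-backed search finds every sheet tagged “Laptops”.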

Improved Precision: Leveraging Taxonomy Relationships 
 
Precision can be defined as the ability of a search engine to retrieve the most relevant information associated with a query. Hierarchical and associative relationships are powerful mechanisms for search to leverage in this regard.  Consider the following example:

An employee is looking specifically for information on desktop computers, but enters the search term “computers”. A full text search will retrieve all documents containing that query term and present the user with a long, unmanageable list. Search integrated with a taxonomy can display the relationships present in the taxonomy as well as the documents tagged with the taxonomy term. This allows the search interface to present the user with two options: (1) narrowing the search based on the hierarchical relationships, by looking at laptops or desktops, or (2) viewing associated content by looking at related taxonomy terms such as monitors.

Surfacing the taxonomy in this user-friendly way has the added benefit of familiarizing users with how the content they are looking for is understood and classified by the organization.

Taxonomy term relations can also help to solve the problems created by ambiguous search terms. Consider the problem of determining whether the query term “notebook” refers to a paper product or a computer. As we discussed earlier, a full text search engine can’t tell the difference. This problem can be solved by surfacing the taxonomy categories related to the query in the search interface.

The scenario described above is often also referred to as guided navigation.
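A minimal sketch of guided navigation might look like the following. Given a query, it either offers the taxonomy categories the query could belong to (when the term is ambiguous) or returns the matched term along with its narrower and related terms. The data, the “Paper Products” category name, and the `guide` function are all hypothetical.

```python
from typing import Dict, List

# Hypothetical taxonomy fragments.
NARROWER: Dict[str, List[str]] = {"Computers": ["Desktops", "Laptops"]}
RELATED: Dict[str, List[str]] = {"Computers": ["Computer Monitors"]}

# query label (lower-cased) -> taxonomy categories it could refer to
CATEGORIES_BY_LABEL: Dict[str, List[str]] = {
    "computers": ["Computers"],
    "notebook": ["Laptops", "Paper Products"],
}


def guide(query: str) -> dict:
    """Return navigation options a search interface could show next to the results."""
    candidates = CATEGORIES_BY_LABEL.get(query.lower(), [])
    if len(candidates) > 1:
        # Ambiguous term: let the user pick the category they meant.
        return {"did_you_mean": candidates}
    if len(candidates) == 1:
        term = candidates[0]
        return {
            "term": term,
            "narrow_to": NARROWER.get(term, []),  # hierarchical refinement
            "see_also": RELATED.get(term, []),    # associative links
        }
    return {}  # no taxonomy match: nothing to suggest


print(guide("computers"))
print(guide("notebook"))
```

For “computers” the interface can offer “Desktops” and “Laptops” as refinements and “Computer Monitors” as related content; for “notebook” it can ask whether the user meant the computer or the paper product.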

Content in Context

The previous examples illustrate how integrating search with a taxonomy can present an employee or customer with content in context. Search as an application cannot simply be added on top of an existing information environment and be expected to perform miracles. Integrating search with a well-constructed taxonomy and properly tagged content can make content significantly more findable. Users will be able to look for content using their own terminology; they will be able to narrow and broaden their searches according to their information needs, and they can be guided to clarify these needs using the taxonomy. As the taxonomy is surfaced to users in search results, users will also begin to understand how information is organized in the information environment. This has the powerful effect of educating and empowering the information seeker. The culmination of all of these benefits is that information is not only more findable, but also presented in a way that maintains its organizational context, truly offering the user content in context.


For more information on how Earley & Associates can help you make the business case for your enterprise taxonomy, contact us.


© 2008 Earley & Associates, Inc.

