Cost: $50.00
Seth began by going over some of the basics of taxonomies and metadata (including definitions) and stressing the importance of metadata in creating a common language between multiple applications. He also brought up the issue of information vs. semantic architecture:
- Information architecture describes the ways in which systems capture, manage, organize and present information
- Semantic architecture is about meaning and nuance
Structural metadata deals more with information architecture, while semantic or taxonomic metadata deals with semantic relationships to disambiguate meaning.
Seth then covered validation lists, the sets of accepted terms for a metadata field. These lists can be used to create or inform taxonomies, or can themselves be generated from taxonomies.
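As a rough illustration of how a validation list constrains a metadata field, here is a minimal Python sketch. The field ("document type"), the accepted terms, and the function names are assumptions made up for this example; they are not from Seth's presentation.

```python
# Hypothetical validation list for a "document type" metadata field.
DOCUMENT_TYPES = {"policy", "procedure", "report", "specification"}


def is_accepted(value, accepted_terms):
    """Return True if the value, trimmed and lowercased, is an accepted term."""
    return value.strip().lower() in accepted_terms


def rejected_values(values, accepted_terms):
    """Return the submitted values that are not in the validation list."""
    return [v for v in values if not is_accepted(v, accepted_terms)]


submitted = ["Policy", " report ", "memo"]
print(rejected_values(submitted, DOCUMENT_TYPES))
```

Rejected values such as "memo" are one way a validation list can inform a taxonomy: recurring rejects are candidates for new terms.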
Finally, he presented a template (from Marcia Morante at KCurve) for rigorous metadata planning and tracking. It includes fields such as "Metadata element name", "Description", and "Controlled vocabulary source"; the last of these is useful for tracking which taxonomy or source vocabulary populates the list of terms used in that field.
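The three fields named above could be tracked in a simple record structure. The sketch below is only an assumption about how such a template might be represented in Python; the class name, the sample rows, and the helper function are hypothetical and do not reproduce the actual KCurve template.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataElement:
    """One row of a metadata planning template (fields as named in the talk)."""
    name: str
    description: str
    controlled_vocabulary_source: str


# Hypothetical plan entries, for illustration only.
plan = [
    MetadataElement("Subject", "Main topic of the document", "Corporate taxonomy"),
    MetadataElement("Region", "Geographic area covered", "ISO 3166 country codes"),
    MetadataElement("Product", "Product the document relates to", "Corporate taxonomy"),
]


def sources_in_use(elements):
    """List each controlled vocabulary source once, in alphabetical order."""
    return sorted({e.controlled_vocabulary_source for e in elements})


print(sources_in_use(plan))
```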
Consumer Taxonomies within the Enterprise Metadata Environment
Presented by R. Todd Stephens
Todd's presentation dealt with the question of integrating consumer taxonomies (or data from consumer-based information) with taxonomies built by producers or brokers in the enterprise environment.
First, Todd presented his view of enterprise metadata architecture, including layers of structural metadata (between the assets and repositories), integration metadata and semantic metadata. He then explained the use of an "asset portal" where one can select different views of data and create collections based on the selected view – or metadata element. For example, selecting "web services" as the asset type would create a collection of web services.
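The "select a view, get a collection" idea can be sketched as grouping assets by the value of a chosen metadata element. Everything in this Python example (the asset records, the element names, and the function) is an assumption for illustration, not a description of Todd's actual portal.

```python
from collections import defaultdict

# Hypothetical asset records; each key is a metadata element.
assets = [
    {"name": "billing-api", "asset_type": "web services", "owner": "finance"},
    {"name": "customer-db", "asset_type": "databases", "owner": "sales"},
    {"name": "orders-api", "asset_type": "web services", "owner": "sales"},
]


def build_collections(records, element):
    """Group asset names by the value of the selected metadata element."""
    collections = defaultdict(list)
    for record in records:
        collections[record.get(element, "unclassified")].append(record["name"])
    return dict(collections)


by_type = build_collections(assets, "asset_type")
print(by_type["web services"])
```

Switching the view is then just a matter of passing a different element, for example `build_collections(assets, "owner")`.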
Todd then moved on to his main focus – the consumer-based taxonomy. The three roles within metadata, according to Todd, are:
- Asset producers (who provide raw metadata and documentation)
- Asset brokers (who produce products, such as repositories and classifications, and services, such as impact analysis and integration)
- Asset consumers (who use the assets and are concerned with usability and value)
Taxonomies are mainly built by producers and brokers, he asserted, and then confirmed by user studies. Todd argued that taxonomies should be further driven and informed by consumer-based information, such as user metrics and search/path analysis. Using this information, the taxonomy can be improved to better reflect consumer search behaviour. Todd gave the example of how user taxonomies affected classification on his blog site. Using web trends and other user metrics, he found that his classification scheme was too complicated and was limiting the reach of his blog. By changing his taxonomy to reflect consumers' actual search patterns, he vastly improved his visibility and popularity.
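One simple form of the search analysis described above is to count what consumers actually search for and flag frequent queries that the taxonomy does not cover. The Python sketch below assumes a plain list of query strings and a flat list of taxonomy terms; the sample data, the threshold, and the function name are hypothetical and are not taken from Todd's talk.

```python
from collections import Counter


def uncovered_queries(search_log, taxonomy_terms, min_count=2):
    """Return (query, count) pairs for frequent queries missing from the taxonomy."""
    counts = Counter(q.strip().lower() for q in search_log)
    known = {t.strip().lower() for t in taxonomy_terms}
    return [
        (query, n)
        for query, n in counts.most_common()
        if n >= min_count and query not in known
    ]


# Hypothetical search log and taxonomy.
log = ["metadata", "tagging", "Tagging", "folksonomy", "tagging", "folksonomy", "metadata"]
taxonomy = ["Metadata", "Taxonomy"]
print(uncovered_queries(log, taxonomy))
```

Queries that show up often but have no matching term are candidates for new or renamed categories.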
Todd then briefly discussed folksonomies, adding that while they work well for large populations that tag constantly, their use in corporate situations is still an area that requires research.
His summary:
- Consumer-based taxonomies are an excellent solution to the old problem of classifying large collections of assets. Integrating consumer-based technologies is a straightforward process, which indicates that the base cost is fairly low and the benefits have the potential to be enormous.
- Continue with the Producer and Broker Taxonomies
- Integrate the Usage-based Consumer Classifications
- Research the Folksonomies Area and other emerging classification technologies
Metadata Standards, Taxonomies, and Information Quality
Presented by Danette McGilvray, of Granite Falls Consulting Inc. and former data quality expert at Hewlett-Packard
Danette started off with an interesting simile, comparing disorganized repositories to an iPod Shuffle, where you can only access information randomly and cannot search for a specific item. Metadata, of course, is the way to "avoid the shuffle".
She presented a model with three types of data:
- Master data: static data records
- Transactional data: dynamic data records
- Metadata: data about the data
Metadata includes what is known as "reference data", which is setup data such as value sets, profile parameters, and codes and descriptions. (Reference data: "Any kind of data that is used solely to categorize other data found in a database, or solely for relating data in a database to information beyond the boundaries of the enterprise" – Malcolm Chisholm).
Danette then addressed the idea of information quality and its importance in metadata. All metadata must be managed effectively, and this includes ensuring information integrity and managing all stages of the information lifecycle.
She proposed an information integrity framework – a logical structure to help understand the complex information environment in an enterprise. This framework includes business goals, lifecycle stages, and key components (data, people, technology, processes). Other factors that affect quality include responsibility, requirements, structure and meaning, communication, and change.
Responsibility, or governance and stewardship, was the focus of the rest of her presentation. Essentially, she spoke about determining which actors need to be involved in the creation of metadata standards (such as data stewards and process owners) and how to motivate them to collaborate, as well as good work practices for setting and maintaining these standards. Danette also provided helpful templates for selecting the individuals who will participate in metadata initiatives and for visualizing the information integrity framework.


