The Faceted Fallacy
... If a tree falls in the forest and no one is there to hear, does it make a sound?
Yes, I know it’s a silly old question with no real definitive answers, but it makes our brains think creatively about ambiguous problems, which is fun. A thread in the Taxonomy Community of Practice group really got me thinking this way in relation to taxonomy.
To summarize the thread, the question raised was: what are the most commonly used facets in an enterprise taxonomy? In response, one member posted a “definitive” list of primary facets that could be used as an exhaustive skeleton for the “enterprise”. From here the conversation split in multiple ways:
- Some people balked at the idea of a definitive set of facets, decrying its rigidity and delving into the classification problems these facets may generate, e.g.:
  - problems of post-coordinate application
  - ordering complications when creating compound values
- Some took the discussion in a more applied direction, discussing how best to construct and apply faceted taxonomy for BI, findability, content management, etc.
- Others mentioned the notion that it is the taxonomy customer’s needs that are the most critical for determining what facets are “required”.
This got me thinking about a very sacrilegious question (sacrilegious at least if you are a taxonomist): does the notion of primary facets and faceted classification really reflect the way the average person thinks about content?
In one of our current projects we have been working on developing a global taxonomy for a DAM/MRM program. This taxonomy is a big challenge as it needs to account for a huge array of content, in a way that supports multiple work processes, e.g. planning, approval, storage, search and retrieval. Each of these processes is carried out by different functions within the organization and hence by people with different relationships to the content.
Helping to build this taxonomy, and subsequently testing it and showing it to people, has really driven me to start thinking more about the notion of people’s relationships to content. It’s not something we talk about often when we talk about taxonomy. We talk about functions, roles, and purposes, e.g. “our taxonomy supports faceted navigation”, or “our taxonomy makes content management more powerful”, or “our super pure facets drive the most comprehensive and dynamic BI out there”. These are all good things, to be sure. However, I often get the feeling that they may mean more to the builders than to the users.
To give an example, as taxonomists we often talk about how navigation is not taxonomy and vice versa, but I am not sure that line actually exists in non-taxonomists’ brains. As soon as you ask someone to use a hierarchy, only the most schooled taxonomist can maintain this separation. Most people start to make associations based on their relationships with content when determining what should come at the next lower levels of navigation, and often it’s different from what a taxonomist might expect. Invariably, however, the notion of pure facets is thrown out the window.
It takes a certain type of mind and a certain type of training to think about content as being constructed of related but independently pure and mutually exclusive values. Like it or not, we do have relationships with content. Parts of those relationships are driven by the type of work we do, but an equally big part is who we are and how we make sense of our worlds. That is why it may seem completely awkward for someone to think about the intended audience of a piece of content as being separate from what the content actually is, or why someone may feel that the content type facet is the best place to put the subject values. Why not?
Of course, that question is rhetorical if you understand how faceted classification and navigation work. The irony of this line of thought is that a well-built faceted taxonomy really does still offer the most flexibility for finding and navigating to content for the widest range of people... with one caveat: you need to either think, or learn to think, in a faceted manner to make it work for you.
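To make that flexibility a little more concrete, here is a minimal sketch of faceted refinement in Python. Everything in it is an assumption made for illustration: the sample assets, the facet names (`content_type`, `subject`, `audience`), and the helper functions `refine` and `facet_counts` are all hypothetical and do not come from any real project or product.

```python
from collections import Counter

# Hypothetical sample records; facet names and values are invented for illustration.
ASSETS = [
    {"id": 1, "content_type": "video", "subject": "product launch", "audience": "customers"},
    {"id": 2, "content_type": "image", "subject": "product launch", "audience": "press"},
    {"id": 3, "content_type": "video", "subject": "training", "audience": "employees"},
    {"id": 4, "content_type": "document", "subject": "training", "audience": "employees"},
]


def refine(assets, **selections):
    """Keep only the assets whose facet values match every selection."""
    return [
        asset for asset in assets
        if all(asset.get(facet) == value for facet, value in selections.items())
    ]


def facet_counts(assets, facet):
    """Count the values of one facet across the given assets, as a sidebar might."""
    return Counter(asset[facet] for asset in assets)


# Two people start from different facets and still arrive at the same asset.
by_type_first = refine(refine(ASSETS, content_type="video"), subject="training")
by_subject_first = refine(refine(ASSETS, subject="training"), content_type="video")
```

Because each facet is independent, the order of selection does not matter, which is where the flexibility comes from. It also shows the caveat: this only works if "training" was filed under `subject` and "video" under `content_type` in the first place, which is exactly the separation many people do not make naturally.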
Technology Has Changed the Methodology But Not the Underlying Paradigm
Theresa Regli at CMS Watch recently posted a follow-up to her prediction regarding the “Death of Taxonomies”. The central thesis of the post is pretty straightforward: with large amounts of text-based content, the process of deriving facets and populating them with values can be done automatically with entity extraction and semantic/linguistic analysis engines. Not only can machines now do this work as well as people, but automating it also eliminates the need for time-consuming and costly tagging projects. As Theresa explains:
“In an approach similar to Endeca’s, entity extraction and semantic analysis create multi-faceted categorizations by people, country, city, language, companies, and other topics.”
So what has really changed? Organizing values into facets like people, language, organizations, etc., whether done by a machine or a person, sounds pretty much like the exact same taxonomy work that has always been done. What we are really saying is that machines can help taxonomists do their work.
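A deliberately simplified sketch may help show why the underlying work looks the same. It is not how Endeca or any real entity extraction engine works; it stands in a hand-built lookup table (a hypothetical `GAZETTEER`) for the linguistic analysis, and the function name `extract_facets` is likewise invented for illustration.

```python
import re
from collections import defaultdict

# Hypothetical lookup of known terms and the facet each belongs to.
# A real engine would derive these with linguistic and statistical analysis.
GAZETTEER = {
    "theresa regli": "people",
    "endeca": "companies",
    "france": "country",
    "paris": "city",
}


def extract_facets(text, gazetteer=GAZETTEER):
    """Return {facet: sorted list of known terms found in the text}."""
    lowered = text.lower()
    found = defaultdict(set)
    for term, facet in gazetteer.items():
        # Whole-word match so that "paris" does not match inside "comparison".
        if re.search(r"\b" + re.escape(term) + r"\b", lowered):
            found[facet].add(term)
    return {facet: sorted(terms) for facet, terms in found.items()}


tags = extract_facets("Theresa Regli compared Endeca deployments in Paris.")
# tags maps "people", "companies", and "city" to the terms found for each.
```

Even in this toy version, someone still had to decide that "people", "companies", "country", and "city" are the facets, and which values belong to each. The machine fills the facets; it does not remove the modelling decision.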
As a taxonomist I think this advancement in entity extraction is pretty exciting, and I liken it to our very own industrial revolution. Pretty soon all taxonomists will hopefully have access to their very own horseless carriage.
(turn of the century taxonomists)
All jokes aside, though, I honestly believe the rapidly improving entity extraction and semantic analysis technology will become part of our toolkit in the coming years, and it will allow us to change the focus and direction of our energies to where they need to go, because as Theresa notes:
"Taking taxonomies beyond what technology can achieve on its own is the metadata architect’s challenge for the next decade"
But...
There is Still A Gap And We Need a Bridge
So on to the final point...
“It’s only by studying the “how” of technology just as much as the “what” of the content that we’ll get to the next stage of content management, search, and information access.”
I really like Theresa's sentence but think it is incomplete, so I will finish it with what I think is missing:
“It’s only by studying the “how” of technology, the “what” of the content, and the idiosyncrasies of the human brain when it comes to organizing and conceptualizing content, that we’ll get to the next stage of content management, search, and information access.”
For a while now, I think, we as taxonomists, analysts, and information specialists have been trying to solve the information management problem without branching out beyond the study of how technology can manipulate content and how content can be structured. And since we are making fun sweeping statements, here is mine:
"The next big breakthroughs will need to be fed by a better understanding of what is going on in our brains when we search for, and interact with, content."
So bringing it back to my original question.
If a tree falls in the forest and no one is there to hear it, does it make a sound? Well, here is an answer to that question.
“Sound is a subjective interaction with matter. All that sound is, is vibrations through a medium, without humans to perceive it, those vibrations that we call sound, when the tree fell, would make vibrations, but "sound" as we know it, couldn't exist, since no conscious being was there to interpret those vibrations.”
Let’s look at that answer again, and replace a few words...
“Taxonomy is a subjective interaction with content. All that taxonomy is, is a structure of related terms, without humans to perceive it, those terms that we call taxonomy, when applied to content, would apply classification, but "meaning" as we know it, couldn't exist, since no conscious being was there to interpret the application of those terms.”
Even with the best entity extraction technology and the perfect list of definitive facets, we still need to understand more about ourselves, our brains, our work, and our relationships to content. The great unknown is not how many essential facets there are (we have a pretty good idea of that), or whether computers can build them for us (they can, and they do pretty well with text-based content and to some extent images, video, and audio, and will only get better), but how faceted classification and navigation work in our brains.
The great unknown is actually understanding why you put your subjects in my content types, and more importantly how that can affect the technology we build to help us organize and find it all.