Blogs

Academy of Motion Picture Arts and Sciences Metadata Symposium

I just finished moderating the Digital Motion Picture Metadata Symposium at AMPAS.  The day covered all aspects of metadata from pre-production through production, post, distribution and archiving.

We had presenters from Pixar, Sony Pictures, the Academy of Motion Picture Arts and Sciences, Marvel Studios, Warner Brothers, CNRI, Gracenote and the Library of Congress.

We saw examples from productions including The Incredibles, WALL-E, The Curious Case of Benjamin Button, Syriana, Ocean's Eleven and others.

The day was packed with presentations that addressed all aspects of the metadata lifecycle for Digital Motion Pictures.

Digital Motion Picture Metadata Lifecycle

Podcast and session summaries coming soon.

Podcast on Folksonomy & Taxonomy in the Enterprise

I had the great pleasure of doing a podcast a few weeks ago with Paul Miller, podcaster for Nodalities (magazine & blog), on hybrid approaches to folksonomy and taxonomy and their role in the enterprise.

We discussed the now-tired debate of folksonomy vs. taxonomy, and focused on the strengths and applications of each approach. I covered how organizations are leveraging social tagging and what some of the pitfalls are in the enterprise context.

I also talk about a few of the hybrid approaches to taxonomy & folksonomy:

  • Co-existence
  • Tag-influenced taxonomy
  • Taxonomy-influenced tagging
  • Tag hierarchies

Forest for the Trees: How Taxonomy Design Is Like Systems Engineering

Thanks to my wife, I've been learning a little bit about systems engineering, a form of engineering that addresses the complex interactions of multiple systems. As she says, you need to consider systems engineering when the interrelationships between systems are as complicated as the systems themselves. For example, to reduce automotive traffic you need to research social behavior, road design, business, and the environment. To study ergonomics you need to study the human body, computer design, application design, and user efficiency needs. And don't get me started on the U.S. healthcare system.

The very first step of systems engineering is to understand the full scope. Ground transportation isn't about cars and trains, for example, but about the entire surface of the earth: population clusters, topography, climate, and distances. Taxonomy starts this way too, with facets like people, documentation types, product lines, and access levels.

Can't it just be like Google?

I often get frustrated by those who think Google is the greatest search engine that ever parsed. Don't get me wrong - I like Google, I use Google, I employ it as a verb. But if I hit the search button and get wonky results, I recognize that they are wonky and am not afraid to blame Google. (Full disclosure: I have a library science background which I'd like to think has made me into a pretty good searcher, so I will usually try a few different queries before I point the finger at the machine.)

On most of our consulting engagements, at least one person will say "I want our search to be more like Google." I have a few problems with this kind of statement. Partly it's that most folks aren't terribly critical when it comes to evaluating the relevance of Google results. It's what we know, it's what we're used to. We don't mind that Wikipedia is almost always the first result on any query - many might find that a "feature".  We're generally happy to take whatever shows up in those top 10 results and roll with it regardless of what it is, mostly because we can't know everything that is out there so we trust Google to filter it for us. We satisfice (yes, it's Wikipedia, joke intended.)

Collaboration, Groove and SharePoint - History Repeating Itself?

I just read that Groove is being renamed as SharePoint Workspace 2010.  For those of you who are not familiar with Groove or its history, I'll take you back to the early 80's. 

Ray Ozzie is the visionary behind Groove and currently the Chief Software Architect at Microsoft (a role he took over from Bill Gates).  At the University of Illinois (as many know, home to the NCSA, which created Mosaic, the early graphical web browser on which Internet Explorer was based), Ozzie worked on early iterations of some of today's knowledge management, collaboration and social media applications (discussion forums, message boards, e-learning, e-mail, chat rooms, instant messaging, remote screen sharing, and multi-player games).

ZigTag Finally Launches Semantic Bookmarking

So, I seem to have not been on the right RSS feed, because I totally missed the memo that ZigTag finally launched at the end of 2008.  I had signed up for the restricted Beta some time ago (there were 500 or so participants), and was awaiting the live version anxiously. ZigTag is a tagging/bookmarking tool that uses "defined" tags, whereby users choose from a controlled set of tags (through auto-complete) with semantic distinctions managed in a knowledge base.

For example, if you start typing in "Ital...", it will start populating a drop-down of choices asking whether you mean Ital (Rastafarian food), Italy (the country), Italian (Culture of Italy), etc.  If one word has multiple meanings (homonyms), they use parenthetical qualifiers to distinguish them. Hovering over a term also brings up definitions (brought in from Wikipedia).
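
To make the idea concrete, here is a rough sketch of how auto-complete over a set of "defined" tags might work. This is not ZigTag's actual implementation or API; the `DefinedTag` class, the `KNOWLEDGE_BASE` list and the `suggest` function are all hypothetical names, and the "Iceland" entry is invented just to show a non-match. The other three entries come from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DefinedTag:
    """One entry in a hypothetical tag knowledge base."""
    label: str      # the word the user is typing, e.g. "Ital"
    qualifier: str  # parenthetical qualifier that pins down the meaning

    def display(self) -> str:
        return f"{self.label} ({self.qualifier})"


# Hypothetical controlled set of tags (assumption, not ZigTag data).
KNOWLEDGE_BASE = [
    DefinedTag("Italian", "Culture of Italy"),
    DefinedTag("Italy", "the country"),
    DefinedTag("Ital", "Rastafarian food"),
    DefinedTag("Iceland", "the country"),  # invented entry, should not match "Ital"
]


def suggest(prefix, tags=KNOWLEDGE_BASE):
    """Return display strings for every defined tag whose label starts with prefix."""
    needle = prefix.strip().casefold()
    if not needle:
        return []
    matches = [t for t in tags if t.label.casefold().startswith(needle)]
    # Shortest labels first, so the closest match to what was typed leads the list.
    matches.sort(key=lambda t: (len(t.label), t.label))
    return [t.display() for t in matches]


suggestions = suggest("Ital")
```

The point of the design is that the user never saves a bare string: every choice in the drop-down is already tied to one meaning, which is what keeps the tag set controlled.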

I think this tool is a great example of a hybrid between taxonomy and folksonomy... or even between ontology and folksonomy. We are able to eliminate many of the pitfalls of social tagging, such as:

What A Cute Bunny: Taxonomy as Liberator

I spent this past week testing a taxonomy as part of a digital asset management project we are currently working on. One of the test scenarios involved giving art taggers a series of images and asking them to code them using the taxonomy we had developed.

Taggers see taxonomy as a blessing and a curse. On the one hand, controlled vocabularies are a tagger's dream: a nice list of consistent terms that alleviates the problems of free-tagging (e.g. five variations on the same term, plural vs. singular, spelling mistakes, etc.). However, these same vocabularies quickly become a tagger's nightmare when taggers perceive the values to overlap or be ambiguous, especially if they are used to only being able to select one value from the list.
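
As a small illustration of the "blessing" side, here is a sketch of how free tags might be folded into a controlled vocabulary. It is not taken from the project described here; the vocabulary, the variant lists and the function names (`build_lookup`, `normalize`, `code_image`) are all assumptions made up for the example.

```python
# Hypothetical controlled vocabulary: preferred term -> known free-tag variants
# (plurals, synonyms, common misspellings). Invented for illustration only.
CONTROLLED_VOCABULARY = {
    "rabbit": {"rabbits", "bunny", "bunnies", "rabit"},
    "carrot": {"carrots", "carot"},
}


def build_lookup(vocabulary):
    """Map every preferred term and variant (case-insensitively) to its preferred term."""
    lookup = {}
    for preferred, variants in vocabulary.items():
        lookup[preferred.casefold()] = preferred
        for variant in variants:
            lookup[variant.casefold()] = preferred
    return lookup


LOOKUP = build_lookup(CONTROLLED_VOCABULARY)


def normalize(free_tag):
    """Return the preferred term for a free tag, or None if it is not in the vocabulary."""
    return LOOKUP.get(free_tag.strip().casefold())


def code_image(free_tags):
    """Split free tags into (sorted unique controlled terms, tags left unmatched)."""
    matched = set()
    unmatched = []
    for tag in free_tags:
        term = normalize(tag)
        if term is None:
            unmatched.append(tag)
        else:
            matched.add(term)
    return sorted(matched), unmatched


terms, leftovers = code_image(["Bunny", "rabbits", "rabit", "Carrots", "fluffy"])
```

Note that `code_image` returns a list of terms rather than a single value, so an image can carry more than one controlled term. The unmatched tags are kept rather than thrown away, since they are exactly the feedback a taxonomy test is meant to surface.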

OASIS Approves UIMA - the first standard for accessing Unstructured Information

Early last month, OASIS announced the approval of the Unstructured Information Management Architecture Version 1.0.  This standard creates an open method for accessing unstructured information - that is, any information that is created by and for people, and is not inherently machine-readable (i.e., not data).  UIMA can potentially become very important since it provides a standard mechanism to exchange metadata for all types of unstructured content - documents, web pages, email, voice, images and video.

As we all have heard repeated in the marketing messages of every content-related software company, over 80% of the data we run our businesses on is unstructured.  In our business we help our clients tame their mountains of content by classifying it.  Often we rely on technologies like auto-classification, entity extraction, and other analytics to tag content with metadata.  Metadata helps us bring structure - and in turn semantics or meaning - to unstructured content. 

Of course, each of these systems has its own API and its own methods of expressing the metadata it produces or consumes.  This is where UIMA comes in.  In the introduction to the UIMA standard, the team at OASIS describes a typical workflow in which various analytics packages may need to interact:

Electronic Medical Records: New Turf for Taxonomists?

Electronic Medical Records (EMR) have been receiving a good deal of attention of late. And it is no wonder. Amongst the challenges present in healthcare, both in the U.S.A. and globally, the fact that medical records largely consist of paper files certainly gives us pause. But what, exactly, are the goals of the much talked about EMR initiatives? And, are the approaches being discussed likely to meet those goals? Further, why am I writing about this issue on a blog that is about taxonomies, content management, and so on? Let us look at this a bit more carefully, as I think the connection to taxonomies and the like will become quite clear.

MOSS 2007 Requirements Gathering: Fast and Focused

Since Microsoft Office SharePoint Server is a mature platform for collaboration, content management and portals, companies can implement the package without much planning or even requirements gathering. Too often, the IT department is assigned the task of technically implementing SharePoint, with little context for its use or its potential value to the organization. The individuals in Business Units or Departments, who will use the system, are kept in the dark about the plans and the functionality of SharePoint. Once IT is satisfied that MOSS is technically stable, it rolls the package out to users with little training or follow-up. This approach rarely succeeds.