Software and technology

Organizing the Unknown – Applying Taxonomy to Discovery through Sophia Search

We all know that taxonomy-based solutions are core and integral to creating business opportunities and solving business problems in our worlds of information.  Yet, there are some areas that taxonomy cannot model, and so seemingly can be of no help whatsoever to some unlucky owners of certain business “problems”.  For instance, organizing what we don’t know.  That seems straightforward and uncontroversial.  Surely we cannot organize what we don’t know?

And yet … undiscovered and/or so-far-unknown relevance holds value for the first discoverers.  There are whole areas of business problems and opportunities hidden in discovery for litigation or in ultra-early stage science and research, to call out just two.  So, logically, if topics and their unknown connections could be discovered on the emergent edge or in massive heterogeneous corpora or joined sets of documents – then taxonomy-based organization could be brought to bear, with all the attendant benefits we know so well.

I had lunch recently with Jeff Bierach, VP of Sales for Sophia Search (http://www.sophiasearch.com/home), an early-stage start-up with buzz and promising solutions.  Sophia Search – the “Search” part of the name is really a misnomer – does automatic clustering of documents into integral clusters that are relatively free of “noise” (bad cluster inclusion), have clean distinctions between clusters and make relevant sense.  And it seems to do this very well indeed.

How Xobni Solved Some of My Pet Peeves With Email

I don't usually rave about specific technologies, but Xobni is one of those tools that I can no longer work without.  When I teach courses on information access, I tell my students to build functionality that solves critical problems and that becomes a “must have” tool.  Build the things that users will scream about (or at least complain loudly about) if you take them away.

Xobni fits that requirement.  I had to remove Xobni once due to a problem with Outlook and kept missing its ability to find contact information, the contents of email messages, conversations that I had with prospects and colleagues, and, my favorite, the ability to locate messages when you don’t know the format of a person’s email address.

Is SharePoint 2010 the "One"?

I recently pulled out my yellowed copy of Michael Dertouzos’ 1995 What Will Be: How the New World of Information Will Change Our Lives.  What I found interesting is how some of those predictions were spot on and some oddly naïve about just how much humans can change.

In “What Will Be” the term used to describe how people get their jobs done by leveraging various tools for managing documents and information was “Groupwork”.    Today, we simply use content management applications to get our jobs done.    See my recent blog, “This internet thing? It's gonna be BIG!” for more discussion on what will be, what is, and what is to come.

As I looked back over the last 15 years, I thought about the progress made in content management platforms, and the hype that accompanied each one.  “Now we will have an end to information chaos! We can control what goes where and enable easy access!”  Sadly, each new offering led to its own flavor of information chaos.

So is SharePoint 2010 the platform that will solve the problem? Or, will we find that information chaos is migrated along with content?   It’s really up to you and your organization. The opportunity is there but don’t take it for granted.

As I talk to companies and other enterprises, I find that most fall into the same trap – they buy a tool, install it, roll it out and wait for their people to get more efficient and effective.  They wait… and wait… and…  Instead of things getting better, they actually can get worse. 

Why is this, I asked myself.   Here are the five things that came immediately to mind.

Tools for Managing Taxonomies (or Thesauri, or Ontologies)

Taxonomy consultants, such as those at Earley & Associates, may be the ones who develop a taxonomy for an organization, but the organization's own staff will ultimately be responsible for maintaining it, so the question arises: what tool or tools should be used to maintain that taxonomy and perhaps further develop it? A taxonomy may be implemented in a CMS, in SharePoint, or with search (Google Search Appliance, FAST, etc.), but these systems do not have taxonomy management components.

Interest in taxonomy tools was evident from the number of chat-based questions that my colleague Seth Maislin and I received from participants in this week’s Taxonomy Community of Practice Call, Cross-Mapping Taxonomies, which we jointly presented. There is a need for tools that do more than merely enable manual adding and deleting of terms. Mapping two taxonomies is something that only a few tools support, but there are many other day-to-day taxonomy management activities that also require specialized taxonomy management software.

This week several Earley & Associates consultants, including myself, participated in a special training on Smartlogic Ontology Manager, a good example of full-featured taxonomy management software. The question arises:  is this taxonomy management or ontology management software?

If we look at competing software products, we see various designations:

Setting New Objectives: Summary of Taxonomy Bootcamp 2009 Openers & Themes

Subtitle: The Future of Taxonomy... Ad Nauseam

This year's Taxonomy Bootcamp conference was much like years prior: full of great information, knowledgeable speakers, and a ton of self-doubt/-defense/-definition. Which is ironic: professional organizers who struggle to classify themselves. There were at least 3 major sessions dealing with the taxonomist's identity and future (in a 2-day conference with a single track, that's a lot), which left me feeling a bit estranged.

The opening session by Patrick Lambe discussed the identity of the "new taxonomist" in the field, using results from a survey of members of the Taxonomy Community of Practice. His findings were unsurprising to me at least: 

Social Media and the Art of Persuasion

After braving the high winds, rain and herring here today in Aarhus, Denmark, I had the pleasure of sitting in on the J.Boye keynote session by BJ Fogg of the Stanford University Persuasive Technology Lab.

BJ's topic was how social media uses (and we can leverage) persuasion techniques to influence behaviour. As an intermittently avid and lapsed Facebook and Twitter user, most of this talk felt like a session "on the couch" trying to deconstruct why we do what we do...

Triggers, Motivation & Ability

BJ started his talk with the notion of hot vs cold triggers. Hot triggers give users an immediate and obvious call to action (e.g. a sandwich board inviting you to come inside a store to have a coffee for $1). Cold triggers are calls to action that can't be immediately acted upon (e.g. an advertisement for a movie or play - you have to call or go to a location to buy tickets).

Social media often uses hot triggers, sending you notifications to see people's feeds, see who has friended you, etc. But as BJ explains, triggers are not enough to create behaviours. You also need motivation.

Motivations for behaviour include:

  • Pleasure / Pain
  • Hope / Fear
  • Social Rejection / Acceptance

So, I might decide to get involved in Facebook because I enjoy seeing what my friends from high school look like 15 years later (which counts for both pleasure and pain in many cases), or because I fear being seen as an old fuddy-duddy who doesn't keep up with the times, or because I want to relive the awful dance of social acceptance and rejection from high school.... ugh.

Social Tagging - Questions Answered on Correction Tools and Vendors

A few weeks ago, I had the pleasure of giving a presentation on taxonomy vs. folksonomy in the enterprise to the Deloitte Social Tagging & Taxonomy Community of Practice, thanks to an invitation by fellow taxonomy enthusiasts Annie Wang and Lee Romero.

It was a fun presentation (a variation on this talk) and the audience asked some great questions afterwards. I was only able to answer a couple of questions before time ran out, so I offered to answer the rest on my blog. Here are the additional questions & answers:

1. Are there tools for auto-correcting social tags?

I had mentioned the idea that folksonomies are considered to be "self-correcting" or self-tuning - through volume of tags and users, anomalies (like single-use tags, misspellings, etc.) tend to be pushed to the side and the majority will trend towards correct/useful tags. This is an idea that I picked up from a whitepaper on social tagging by Oracle:

"All social input strategies rely on the good-graces of well-intentioned users habituated to provide input over time to succeed...  Social strategies will self-correct for this problem over time under the presumption that more users than not will provide “good” information."

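To make the "self-correcting" idea concrete, here is a minimal sketch in Python. It is only an illustration of the principle, not the Oracle approach or any vendor's tool: the function name `self_correct`, the `min_uses` threshold and the sample tagging events are all made up for this example. It counts how often each tag is applied and keeps only the tags that enough users agree on, so single-use tags and one-off misspellings fall away.

```python
from collections import Counter


def self_correct(tag_events, min_uses=2):
    """Return (tag, count) pairs for tags applied at least `min_uses` times.

    `tag_events` is an iterable of (user, tag) pairs. Tags are compared
    case-insensitively. The threshold is an assumption chosen for
    illustration; a real system would tune it to its own volume.
    """
    counts = Counter(tag.strip().lower() for _user, tag in tag_events)
    return [(tag, n) for tag, n in counts.most_common() if n >= min_uses]


# Hypothetical tagging events: (user, tag)
events = [
    ("ann", "taxonomy"),
    ("bob", "Taxonomy"),
    ("cy", "taxonomy"),
    ("dee", "taxonmy"),      # misspelling, used once
    ("ann", "sharepoint"),
    ("bob", "sharepoint"),
    ("cy", "misc-stuff"),    # single-use tag
]

popular = self_correct(events)
print(popular)  # [('taxonomy', 3), ('sharepoint', 2)]
```

Note that nothing here actually repairs the misspelled tag; it is simply outvoted. That is the sense in which volume alone "corrects" a folksonomy, and it only works when there are enough users tagging the same content.
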
Collaboration, Groove and SharePoint - History Repeating Itself?

I just read that Groove is being renamed as SharePoint Workspace 2010.  For those of you who are not familiar with Groove or its history, I'll take you back to the early 80's. 

Ray Ozzie is the visionary behind Groove and currently the Chief Software Architect at Microsoft (a role he took over from Bill Gates).  At the University of Illinois (as many know, home to the NCSA, which created Mosaic, the early web browser on which Internet Explorer was based), Ozzie worked on early iterations of some of today's knowledge management, collaboration and social media applications (discussion forums, message boards, e-learning, e-mail, chat rooms, instant messaging, remote screen sharing, and multi-player games).

OASIS Approves UIMA - the first standard for accessing Unstructured Information

Early last month, OASIS announced the approval of the Unstructured Information Management Architecture Version 1.0.  This standard creates an open method for accessing unstructured information - that is, any information that is created by and for people, and is not inherently machine-readable (i.e., not data).  UIMA can potentially become very important since it provides a standard mechanism to exchange metadata for all types of unstructured content - documents, web pages, email, voice, images and video.

As we all have heard repeated in the marketing messages of every content-related software company, over 80% of the data we run our businesses on is unstructured.  In our business we help our clients tame their mountains of content by classifying it.  Often we rely on technologies like auto-classification, entity extraction, and other analytics to tag content with metadata.  Metadata helps us bring structure - and in turn semantics or meaning - to unstructured content. 

Of course, each of these systems has its own API and its own methods of expressing the metadata it produces or consumes.  This is where UIMA comes in.  In the introduction to the UIMA standard, the team at OASIS describes a typical workflow in which various analytics packages may need to interact:

MOSS 2007 Requirements Gathering: Fast and Focused

Since Microsoft Office SharePoint Server is a mature platform for collaboration, content management and portals, companies can implement the package without much planning or even requirements gathering. Too often, the IT department is assigned the task of technically implementing SharePoint, with little context for its use or its potential value to the organization. The individuals in Business Units or Departments, who will use the system, are kept in the dark about the plans and the functionality of SharePoint. Once IT is satisfied that MOSS is technically stable, it rolls the package out to users with little training or follow-up. This approach rarely succeeds.