Enterprise Search & Why We Can't Just Get Google

In this article (originally published via CMSWire) we examine the desire to duplicate the Google experience in the enterprise by attempting to change our perspective on what we expect from enterprise search based on what we’re willing to do to make it work. 

Search is an incredibly interesting problem, one that’s so complex in the background yet so simple on the surface. What could be easier than entering a few keywords into a single text box and, in a fraction of a second, being granted access to tens or even hundreds of millions of relevant resources? All the information we could ever really want is right at our fingertips. For most, this is perceived as a near-perfect user experience, and it is today’s reality when we search online using Google.
 
Yet a common complaint heard internally across many organizations is the inability to easily find the right answer amongst a set of far fewer, and often less relevant, results. The organization of content into intuitive information architectures is a challenging problem, and the creation of navigational constructs that classify information into meaningful categories is becoming increasingly difficult due to the sheer volume of content being produced. The user experience is becoming both complicated and fragmented, which places a greater emphasis on search as the silver bullet. Unfortunately, search too is failing to meet the needs of our users and is oftentimes perceived as nothing more than “a random document generator”, as one client has colorfully put it.

Why Can’t We Just Get Google?

Interestingly, this is a common question asked during many of our intranet redesign initiatives. We hear it from end-users on the frontlines all the way up to senior level management, executives and everywhere in between. But before we look for an answer, let’s take a step back and build a bit of a foundation by examining some of the fundamentals of web search itself. 
 
As human visitors to a web page, we expect to see a variety of visual cues embedded within the interface. Graphic design, eye-catching imagery and the logical layout of content are all elements that appeal to us as we interact with a site and its content. In their absence, the site’s ability to keep us engaged dramatically diminishes and we quickly lose interest. In contrast, the search engine’s experience of the same page is purely textual, ignoring most if not all of the parts that draw us in. To illustrate, let’s take a look at the difference between what a visitor sees versus what Google sees:
 
Visitor View:
 
View of a webpage from the perspective of a human visitor
 
Search Crawler View (text-only cached version of this page in Google)
 
View of a webpage from the perspective of a search engine
 
Google’s crawler, the Googlebot, has the primary function of finding, consuming and indexing content from across the web. But it doesn’t stop there. More attributes are taken into consideration in order to determine relevancy and display of the appropriate search results for a particular query. A short video from Matt Cutts, Principal Engineer at Google, shows at a high level how the search engine locates, indexes and ranks web documents[1]:
 
 
Essentially, Google traverses the internet by following links and consuming the content it finds along the way. It attempts to implicitly derive the meaning of a document based on the document’s content by examining terminology and, to put all that text into context, uses signals like the occurrence of words and the relative value of those occurrences, including positioning, weight and semantic relationships to infer relevancy. It does so by asking questions, “more than 200 of them”, of the document itself, as well as the document’s context within the larger corpus of indexed content. 
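
To make the idea of weighted signals a little more concrete, here is a minimal sketch in Python. It is not Google's algorithm, which is not public; it only illustrates how the occurrence of a word and its position (title versus body) might be combined into a single relevance score. The field names, the weights and the sample documents are all assumptions made for illustration.

```python
from collections import Counter
import re

# Hypothetical weights: a term found in the title counts more than one in the body.
FIELD_WEIGHTS = {"title": 3.0, "body": 1.0}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(document, query):
    """Sum the weighted occurrence counts of each query term across the fields."""
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        counts = Counter(tokenize(document.get(field, "")))
        for term in tokenize(query):
            total += weight * counts[term]
    return total


# Made-up documents, used only to exercise the scoring function.
docs = [
    {"title": "Intranet news", "body": "Our search tool was updated this week."},
    {"title": "Search tips", "body": "How to search the intranet and refine search results."},
]

# Rank the documents from most to least relevant for the query "search".
ranked = sorted(docs, key=lambda d: score(d, "search"), reverse=True)
for doc in ranked:
    print(doc["title"], score(doc, "search"))
```

Two signals are enough to show the principle: the second document ranks first because the term appears in its title as well as twice in its body. A real engine combines far more signals than this.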
 
The algorithms underlying the technology are composed of complex mathematical formulae that have been continuously evolving over the past decade in an effort to solve problems specific to the indexation and ranking of web content. They have often been altered in response to unscrupulous webmasters who have attempted to game the system by exploiting unaddressed gaps in an effort to obtain high search rankings. To improve both the quality and relevance of the search results, more than 200 unique attributes are captured and applied to indexed documents. We can think about the automated application of each of these signals as Google’s approach to content enrichment.
 
While these attributes form the foundation of Google’s secret sauce, numerous experts in the industry have made attempts to determine what many of them actually are. Creating a text cloud of the results of their analysis[2][3] offers the following high level insight:
 
 
On the web, domain factors and keyword use in page text (properties) as well as in internal and external links comprise some of the more important attributes when it comes to assigning relevancy in internet search.

Ambiguity, Intent and Encouraging Conversation

If we take a look at another piece of the puzzle we see that further complicating the problem of search are the searchers themselves. Search queries are often ambiguous and generally do not express exactly what it is that the searcher is actually after. Even the best search engines in the world, including Google, are unable to resolve intent based on the entry of a few keywords into a simple text box. Take for example the query term “twister”, which has roughly 550,000[4] searches per month. It’s virtually impossible for the search engine to understand the intent of the searcher that enters this query. Is the person looking for information about…
  • The 1960s game from Milton Bradley?
  • The 1996 movie starring Helen Hunt and Bill Paxton?
  • The scientific nature of tornadoes?
  • Maintenance tips for a Honda Twister 250 sport bike?
  • A promotion offered by the radio station KTST-FM 101.9 the Twister?
  • A tongue twister the searcher used to know as a kid but has since forgotten?
The primary challenge lies in the search engine’s ability to extract context based on innovative interaction. Different people entering the same query term might in fact be searching for different things altogether, so to arrive at an answer the search engine must begin a process of disambiguation. Sometimes it occurs during query construction through auto-complete or keyword suggestion:
 
 
Other times it comes in the form of related searches:
The concept of universal search is also used to integrate potentially relevant content from shopping, image, video, news and social sites like Twitter. If none of these approaches offer the insight required, the searcher often then refines the original query by entering additional keywords that provide further clarity. This back and forth interaction effectively becomes a conversation between the technology and the searcher - an iterative process required to connect the person searching with the most appropriate search result. 
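
As a rough illustration of disambiguation during query construction, the sketch below suggests completions for a partially typed query by looking at what other people have searched for. This is only one simple way to approach keyword suggestion, not a description of how Google does it; the query log and the function name are hypothetical.

```python
from collections import Counter

# Hypothetical query log; in practice this would come from search analytics.
QUERY_LOG = [
    "twister game", "twister movie", "twister movie", "twister game rules",
    "tongue twister", "twister movie", "twister game",
]


def suggest(prefix, log, limit=3):
    """Return the most frequent logged queries that start with the typed prefix."""
    wanted = prefix.lower()
    counts = Counter(query for query in log if query.startswith(wanted))
    return [query for query, _ in counts.most_common(limit)]


print(suggest("twister", QUERY_LOG))
# ['twister movie', 'twister game', 'twister game rules']
```

Each suggestion is effectively a question put back to the searcher ("did you mean the movie or the game?"), which is the conversation described above.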
 
In addition, the simplicity of the Google Experience has also led to incredibly high expectations. The company’s philosophy on designing for web search takes into consideration the whole[5] problem of search, meaning that:
  • If users can’t spell, it’s our problem.
  • If they don’t know the syntax of search, it’s our problem.
  • If there is not enough content, it’s our problem.
  • If they can’t speak the language, it’s our problem.
  • If the web is too slow, it’s our problem.
We can learn a lot from these statements, but what it really boils down to is that improving information access through enterprise search within our organizations is our problem and not the problem of our users. In some ways this has made the design and implementation of enterprise search initiatives significantly more challenging. We often look for the easy solution since the Google Experience has taught us to expect simplicity.
 
But we need to keep in mind that Google as a company employs thousands of very talented and intellectual people working toward solving the single problem of search. Admittedly, web search is a problem that is far from being solved, and even the world’s leading search company believes there’s still much work that needs to be done. In a recent blog post taken from the L.A. Times, Gabriel Stricker, Google’s Director of Global Communications and Public Affairs, stated, "Search is at the heart of everything we do, and as we've said many times, it's still an unsolved challenge."[6]
 
As purveyors of search technology and functionality for our organizations, we can no longer approach search as just an application that is plugged in and turned on. To be successful, we must take a look at the problem from a different perspective and begin to view it as more of an experience, one that’s constantly evolving and is as unique as the people for whom it’s intended.

Shifting Perspectives, Moving from Appliance to Experience

Within the enterprise, we can see that organizational content is frequently structured quite differently from its web content counterpart. Often, a single document such as a policy, standard operating procedure or corporate guideline will consist of tens if not hundreds of pages covering a multitude of topics. As a result, the automated extraction and indexation of all that text tends to provide less insight into what that document is really about. While a variety of those 200 plus signals might still apply in one form or another behind the firewall, many of the more important attributes influencing algorithmic relevancy on the web, such as targeted keyword use and a complex interlinking between information assets, are rare. This means that automated metadata inference of this kind becomes less powerful overall.
 
As a result, we need to change our perspective on what we expect from enterprise search based on what we’re willing to do to make it work. This means taking a closer look into redesigning the overall experience to move away from an emphasis on full-text indexing and toward ways that not only provide direct access to the answer, but also promote discovery, exploration and raise awareness. Enterprise search should in fact be more relevant inside the organization primarily because we have greater control over both the inputs and outputs required to make it work. If we start thinking about how best to facilitate the conversation by taking a more active role in understanding our content and its structure, along with our users and their access needs, we will be better able to design and deliver a simpler, more effective and relevant search experience. 

Search Query Disambiguation through Faceted Refinement

Our existing enterprise search tools (out of the box) place little emphasis on promoting conversation through the process of disambiguation. Facilities inherent within the technology that do so are commonly not configured properly, don’t have the appropriate inputs available or are turned off altogether. As a result, full-text indexing along with the document title, short snippet and ten results per page become the common default experience.  
 
One approach we’re beginning to see more of in an effort to improve enterprise search comes to us in the form of faceted refinement. The introduction of facets to the search interface provides the ability to easily refine a result set based on the unique properties of the result set itself. Known as faceted search or guided navigation, it provides for the categorization of search results based on metadata attributes, along with the numerical distribution of those results across available values. Like the approaches to disambiguation mentioned earlier, this type of advanced search helps guide searchers down the appropriate path by providing a variety of predefined options for discovery, rather than relying on the searcher to know exactly what they’re looking for in advance.
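
The mechanics of faceted refinement can be sketched in a few lines of Python. The example below assumes a small, made-up result set in which every item carries two metadata attributes ("type" and "department"); it counts how the results are distributed across each facet value and then narrows the set when a value is selected. The attribute names and functions are illustrative only and do not reflect any particular search product.

```python
from collections import Counter

# Hypothetical search results, each carrying metadata attributes.
results = [
    {"title": "Travel policy", "type": "Policy", "department": "Finance"},
    {"title": "Expense procedure", "type": "Procedure", "department": "Finance"},
    {"title": "Hiring policy", "type": "Policy", "department": "HR"},
]


def facet_counts(items, attribute):
    """Count how many results carry each value of the given metadata attribute."""
    return Counter(item[attribute] for item in items if attribute in item)


def refine(items, attribute, value):
    """Keep only the results whose attribute matches the selected facet value."""
    return [item for item in items if item.get(attribute) == value]


# Distribution of the full result set across document types.
print(dict(facet_counts(results, "type")))

# The searcher picks "Policy"; the remaining facets are recomputed on the subset.
policies = refine(results, "type", "Policy")
print(dict(facet_counts(policies, "department")))
```

Note that the counts are always computed from the current result set, which is what lets each refinement step offer only the options that still lead somewhere.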
 
However, unlike the typical ecommerce experience where product attributes such as size, color and price inherently become the basis for facet development, the ability to succinctly describe organizational content in the same manner is a greater challenge. The establishment of meaningful dimensions is a subjective exercise in defining both the “is-ness” of our content as well as its “about-ness”, or how we wish to describe it. A well thought out and intuitively designed metadata schema and controlled vocabulary are the foundation of a successful faceted search experience.
 
Taxonomy defines consistent organizing principles for enterprise information based on the language of our business users, and for this to be effective, it’s crucial that we engage subject matter experts in the process since they understand the content best.  What this means is that the business must also be engaged and responsible for the search experience and can no longer think of it as just a function of IT. If we’re constantly generating and feeding poorly formatted content into our search tools, our only expectation can be to get the same back when searching. 
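
To show what applying a controlled vocabulary might look like in its simplest automated form, here is a small sketch. It assumes a hypothetical vocabulary that maps each preferred term to the variants people actually write, and tags a piece of text with the preferred terms whose variants appear as whole words. Real enrichment tools are considerably more sophisticated, and the terms themselves should come from subject matter experts rather than from code.

```python
import re

# Hypothetical controlled vocabulary: preferred term -> variants seen in content.
VOCABULARY = {
    "Human Resources": ["hr", "human resources", "personnel"],
    "Travel": ["travel", "trip", "airfare"],
}


def tag_document(text, vocabulary):
    """Return the sorted preferred terms whose variants appear as whole words or phrases."""
    lowered = text.lower()
    tags = set()
    for preferred, variants in vocabulary.items():
        for variant in variants:
            # Word boundaries stop "hr" from matching inside words such as "three".
            if re.search(r"\b" + re.escape(variant) + r"\b", lowered):
                tags.add(preferred)
                break
    return sorted(tags)


print(tag_document("Contact HR before booking airfare.", VOCABULARY))
# ['Human Resources', 'Travel']
```

Tags produced this way can then feed the metadata attributes that facets are built on.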

6 Steps to Enterprise Search Improvement

The following activities are essential elements to the construction of a solid foundation for any enterprise search initiative. 
  1. Perform Content Analysis - Evaluate the environment to determine the types of documents produced and their overall importance along with identification of unique metadata attributes. This exercise, while important for search, is also the foundation for building successful information architectures.
    • What types of content do we have? Of those identified, which have organizational value? What makes them unique? Who has ownership responsibility? How is their lifecycle managed?
  2. Understand the Audience - Understand the range of user types we have and what their individual content requirements are. This becomes the basis for a more personalized experience throughout the intranet as a whole and likely includes definition by location, department, job role, title and/or group.
    • Who do we serve and what are their needs?
  3. Create A Content Enrichment Approach - Develop a consistent set of organizing principles in the form of taxonomy and controlled vocabulary that will be applied to unstructured content (either manually or automatically) during the authoring, approval, publishing or consumption processes.
    • What are the underlying organizing principles? How do our users think about the types of content we’ve identified? How can we tag them in ways that best describe what they are and what they’re about?
  4. Develop Search Scenarios - Capture, document and design information access scenarios in the form of use cases based on an understanding of the unique needs of our users. This includes search interfaces and consideration of points of interaction including desktop, mobile and application search.
    • How do our users want to search? From where? What does the interaction look like? How and where can we leverage search and taxonomy to display relevant content? What approaches to disambiguation can we take? Are search needs different among different processes or workflow stages?
  5. Evaluate Tools & Technologies - Make full use of functionality available by understanding how best to configure the technology that has been procured. Slight adjustments to out of the box settings including metadata mapping, best bets and thesaurus management can often make a significant difference in improving overall relevance.
    • What can we do with what we have? What functions are available to us? What do we need to do to make them work?
  6. Design Governance - Last, but most important, is the establishment of standard processes for review, maintenance and enhancement along with identification of specific roles and responsibilities around content management, search and taxonomy. This includes shared responsibility across departments in addition to subject matter expert engagement. Further development of editorial guidelines, naming conventions, templates, workflow and standard publishing models will ultimately move us closer to our overall goals.
Once we’ve made our way through these activities and our content is being appropriately enriched, we can start to fully leverage taxonomy and metadata in the design of innovative applications like faceted search. 
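
Two of the adjustments mentioned in step 5, thesaurus management and best bets, are easy to picture with a short sketch. The example below assumes a hypothetical thesaurus that maps internal jargon to synonyms and a hypothetical best-bets table of editorially chosen results; how these are actually configured depends entirely on the search product in use.

```python
# Hypothetical thesaurus and best-bets tables; real products configure these differently.
THESAURUS = {"pto": ["vacation", "leave"], "sop": ["procedure"]}
BEST_BETS = {"vacation": "Vacation and Leave Policy", "expenses": "Expense Claim Procedure"}


def expand_query(query, thesaurus):
    """Return the query terms plus any synonyms listed for them, without duplicates."""
    expanded = []
    for term in query.lower().split():
        for candidate in [term] + thesaurus.get(term, []):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


def best_bets(terms, bets):
    """Return the editorially chosen results that match any of the expanded terms."""
    return [bets[term] for term in terms if term in bets]


terms = expand_query("PTO request", THESAURUS)
print(terms)                        # ['pto', 'vacation', 'leave', 'request']
print(best_bets(terms, BEST_BETS))  # ['Vacation and Leave Policy']
```

Here a searcher who types the acronym is still led to the document the business considers authoritative, without having to guess the word used in its title.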

Moving Toward the Google Experience

For the reasons stated, I believe “why can’t we just get Google?” is the wrong question to be asking and, in the context of the larger enterprise, I would challenge whether that is what is really wanted. After all is said and done, if Google is still what you think you want, the Google Search Appliance is available for purchase directly from the company. Rather, what I think we’re really looking for is to provide our organizations with a more Google-like Experience. I define this as the ability to leverage search technologies that connect people - directly or serendipitously - with the right content at the right time from the right place in a way that is simple, intuitive and fast.
 
Going from where we are now to where we need to be might seem like an insurmountable task, but if we start to approach it step by step by following the process outlined above, there’s no reason why we shouldn’t be able to get there. I won’t promise that it will be easy but if we take the time to do it, things will get significantly better and trust lost over time will ultimately be earned back. I will promise, however, that if we continue to ignore the problem or think we can solve it through the procurement of a new technology, the situation we find ourselves in now will only get worse as time goes by, and our ability to correct it will become considerably more complex and ultimately more costly. 
 
Rather than posing the question “why can’t we?”, we need to ask ourselves “how can we?”. Once we do, we’ll be taking a significant leap in the right direction. As we move toward a better understanding of both our users and content in the context of our organizations, the delivery of an intuitive and relevant Google-like Experience in the enterprise will be within reach.