In this episode of “Earley On…”, Seth Earley, CEO of Earley Information Science, discusses the hard problem of enterprise search and its various intricacies. Seth addresses the issues of content, metadata, governance, and the use of manual and automated methods and processes to ensure high-quality search performance in your enterprise.
Seth Earley: Welcome to the Earley Podcast. I was originally trying to do this on a weekly basis, but life intervened and I was not able to. So, I'm going to call this my regular, irregular podcast. My occasional podcast. And this time I'm going to talk a little bit about internal search. Search has been a huge topic lately. Everybody has to deal with search and information access. At the end of the day, what else are we doing besides searching and retrieving information? We had a session on this a couple of weeks ago and we talked about search really changing. It's not a white box. It's not something that you just bolt on. It's not about technology per se, it's about the experience. It's not information aggregation, it's access. It's about building capabilities. And search algorithms may be improving, but they still don't know what your intent is. They don't know what your role is, they don't know what your perspective is. Then, of course, we're trying to infer perspective, we're trying to infer intent whenever we get these signals. And at the end of the day, search is a set of signals. We're taking keywords and we're putting them into a system, and those are signals. And then we're trying to interpret those signals and return something. And really, at its foundation, it's a recommendation engine. Search says, "Oh, you're giving me this information. I'm going to recommend these documents. I'm going to recommend these web pages. I'm going to recommend these products or these actions or whatever it might be." And we really need to start integrating various design methods into search and treat it as an application. So we need to think about the user, we need to think about the task, the processes, and we absolutely need to look at the content.
We need to consider scenarios; we have to think about what our users are going to do. You know, people always say, "Oh, we want it to be like Google." And I say, if you put as much time, energy, resources, and money into optimizing your content as organizations put into their Google search ranking results, your search will be like Google. Because they're spending a lot of time optimizing content. So when you think about search, it's about findability. Or as some people say, search is easy, finding is hard.
There's a bunch of different things that come together. You certainly have to have technology that's going to support your objective. You need to have good organizing principles, you have to have information architecture. Some people say, "Oh, you don't need that, these new algorithms will do that." Well, they're making decisions and assumptions about architecture. If you're not doing it explicitly, they're doing it implicitly, and they're doing it as part of their search engine algorithm. So information architecture is very important, process is important, governance is important. You know, how people execute their searches matters too, and again we're looking at this from an internal enterprise perspective, I'm going to talk about site search as well. And so when you think about findability, when you think about search and information retrieval, it's a combination of things. It's not a single thing. We have to have search engines that are integrated with our technologies and our data sources, we have to have some information architecture and user experience that's designed, and we need to have metadata.
You know, again, search is about metadata, and if we are not applying metadata, we are deriving it, and that's what a search engine does. It infers. It says, "Oh, I think I know what this is about and I'm going to index these documents." Indexing is implied metadata. We need to look at our content processes. Part of the issue is, if we just throw everything in a giant pile, yeah, it's going to be hard to find stuff. And if users don't curate their content at all, or they don't try to apply any organizing principles, then that's a problem, and there are a lot of different things that need to happen on the content side in order for that to work. What ties it all together is governance, and again, we're not just throwing a bunch of stuff in a pile and trying to find it. When we look at product information, people spend a lot of time curating product information. When we look at high value content for helpdesks, people spend a lot of time curating and organizing that information. And you can think of this as a maturity model, a maturity curve.
You can think of your organization as being proficient in any of these areas, and we actually have a maturity model that we use that talks about these different parameters, these different facets, and then rates the organization in each of these areas, everything from unpredictable to aware, competent, synchronized, choreographed. You use whatever terms you want, and at each stage, for each of these facets, there are certain characteristics. And of course, you know, search nirvana is where you have automated workflows that help you report on compliance, that automatically flag things that are not implemented. You have search integration with ontologies, very rich ontologies, so you can build associative relationships that will provide the related content. Think of an ontology and associative relationships as a reference librarian who, when you ask them a question, knows where to look and will make recommendations. Because search terms are very sparse, and you can build these relationships that actually embed knowledge of the organization and knowledge of the processes, knowledge of tasks, and knowledge of your IP into the structure, and then that structure can be referenced by the search engine, and that's all driven by use cases.
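To make the "reference librarian" idea concrete, here is a minimal sketch of how associative relationships might be used to expand a sparse search term into related terms. The `ONTOLOGY` dictionary, its entries, and the `expand_query` function are hypothetical illustrations, not something described in the episode or taken from any particular search product.

```python
from collections import deque

# Hypothetical ontology: each term maps to associatively related terms.
ONTOLOGY = {
    "invoice": ["billing", "accounts payable"],
    "billing": ["payment terms"],
    "accounts payable": ["vendor"],
}

def expand_query(term, max_hops=1):
    """Return the term plus related terms reachable within max_hops."""
    seen = [term]                      # keeps first-seen order
    queue = deque([(term, 0)])
    while queue:
        current, hops = queue.popleft()
        if hops == max_hops:
            continue                   # do not follow relationships any further
        for related in ONTOLOGY.get(current, []):
            if related not in seen:
                seen.append(related)
                queue.append((related, hops + 1))
    return seen

if __name__ == "__main__":
    print(expand_query("invoice"))
    print(expand_query("invoice", max_hops=2))
```

A search engine could then query on the expanded list rather than the single keyword the user typed. How many hops to follow is a design choice that trades recall against noise.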
We need to think about integration of structured and unstructured content and pull this from multiple systems dynamically; it really is about the what and the why. You know, what happened, and why did it happen? And we need to contextualize and personalize metadata in such a way that we're auto-populating as much of it as we can, so we're taking the onus off of the user and we're putting this into the system, we're building it into the processes. And governance should be looking at all of this on an ongoing basis, so that you have engagement, you have the right resources, and you can continuously improve the processes. When you look at how you're trying to build this information architecture that's supporting search, there can be many steps in the process, and again people say, "I don't have time for that," or "that's too expensive," or "that's too much," but we have to look at high value content, and if you have high value content and a high value process, then it justifies this. And there's a top down approach to developing information architecture and a bottom up approach. What do I mean by that? Well, top down is kind of looking at what are the problems that we have, what are the issues, who are the audiences, what are the use cases, and what content do they need to support that process. Now let's organize the content. So, you're really looking from a business agenda and a user agenda, and that's always important. Bottom up says, well, let's look at this content. Let's see how well we can organize it. If I send somebody down to your basement to organize your basement, they'll put it in some semblance of order. Same idea. You can just look at content and say, what's the nature of this content, and how can I organize it and put it into buckets that make sense?
I don't necessarily need to know the user, but I can do it. Now, of course, I absolutely should know the user and I should know the use cases, because even in your basement I'd have to say to you, "Hey, is this stuff important? Do you need these old record albums, do you need these photographs, do you need this, do you need that? Are these books important? What's important to you? Do you need these golf clubs?" Right? So even though I could organize it, I can't say what's important and I can't prioritize. I can't make it easily accessible if I leave your stuff buried in the back, or in the attic, or wherever, and you say, "That's really important stuff, I need that today."
Well, that's a different story. Do you camp anymore? I have camping equipment. So, content is the same way: what's important, how do you prioritize, and how do people access it? So, you can look at the content, you can start writing organizing principles and taxonomies and metadata, and you can build content models and you can align those with mental models, and, you know, ultimately the bottom up and the top down integrate and we build this user experience. So, again, search is almost the end of the story, right? It's saying, I've already done all this stuff to make my content manageable and effective and organized, and now I want to put an interface on top of it in order to retrieve it. And when you start doing things like content audits, you start realizing that you have a lot of junk and you've got to throw away stuff, and you can use technologies to do this. You can facilitate audits with systems that will look at various characteristics and say this is high priority or low priority. You can give it models, you can use example content, you can build a training corpus and then apply that against your larger content, and you can't do any of this stuff completely manually. But, you know, I talk about use cases, and one of the things I want to say is that you really do need to develop use cases, and when I start talking about the level of specificity that I mean when I say use case, a lot of people say, "Oh my God, I have to do that for everything?" Well, yeah, I mean, eventually you should be building libraries of use cases for your organization, and those should be part of the IP, those should be part of the know-how, those should be part of the other artifacts that you're building that help you understand what's important to people. You build them anyway when you have job descriptions and tasks and processes and procedures.
Those are all use cases. And so the use case isn't "find this information" or "must be able to use a web browser." You know, that's ridiculous stuff. It's really starting to think about the actor, the action, the objective, the content that would be needed, and then how we can start organizing that, what are the handles by which we access that content. And so I think at the end of the day you really need to think about search as an application.
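One way to picture a use case at this level of specificity is as a small record with an actor, an action, an objective, and the content needed. The sketch below is only an illustration under that assumption; the `UseCase` class, the field names, the sample entries, and the `content_for` helper are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class UseCase:
    actor: str                 # who is doing the work
    action: str                # what they are doing
    objective: str             # why they are doing it
    content_needed: list = field(default_factory=list)

# Hypothetical entries in an organization's use case library.
LIBRARY = [
    UseCase("helpdesk agent", "troubleshoot a router",
            "resolve the ticket on first contact",
            ["troubleshooting guide", "known issues"]),
    UseCase("helpdesk agent", "process a return",
            "issue the correct refund",
            ["return policy", "known issues"]),
    UseCase("sales engineer", "prepare a proposal",
            "match products to customer requirements",
            ["product specifications", "pricing sheet"]),
]

def content_for(actor, library):
    """Collect the content types an actor's use cases call for, in first-seen order."""
    needed = []
    for use_case in library:
        if use_case.actor == actor:
            for content_type in use_case.content_needed:
                if content_type not in needed:
                    needed.append(content_type)
    return needed

if __name__ == "__main__":
    print(content_for("helpdesk agent", LIBRARY))
```

Grouping by actor is just one possible handle; the same library could equally be indexed by action or by content type.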
And we're trying to surface content in the context of tasks. And what, again, does that mean? It means we have to understand our audiences, we have to understand the things they are trying to accomplish, we have to look at the content, we have to develop those organizing principles, we have to look at how this integrates across systems and build workflow processes and content authoring processes and curation processes, review and publishing and so on. And we have to be able to do this in such a way that we surface information for users rather than have them put a term in a white box and then pull back a whole bunch of facets.
Now, here's another way to think about this, because again, it's work and it's expensive and it's difficult. There's a span of structure, there's a span of value, and there's a span of context for content. Let's start with structure. On the one end, you have knowledge creation. Knowledge creation is a chaotic sort of process. There's less structure inherently, and you're just collaborating with people. You have things like collaborative workspaces or instant messaging or blogs or wikis or e-mail management, and you're simply trying to solve a problem, you're trying to collaborate with your colleagues, you are creating knowledge. And then at the other end of the spectrum you have knowledge reuse, you have knowledge access. You're trying to answer questions, you're trying to retrieve information for a very specific purpose, you're trying to access structured documents and structured content, maybe in learning management systems or records management systems, process management, and, you know, content management. And again, this is a continuum, right? So you have knowledge creation and knowledge reuse, and when people say, "I can't find my stuff, we have all these collaborative workspaces everywhere," I say, "Well, what are you trying to do?" "Well, I'm trying to find this information that I need for this process, and it's in a collaborative workspace." Yeah! Well, it doesn't belong there. It belongs in a repository where you can retrieve it. You have to promote that content. The analogy I give people is this. Imagine that you're in a conference room with a bunch of people and you're working on a plan or project and somebody barges in and says, "Hey, where's our strategic plan for marketing for this quarter?" You are going to say, "Get out of here, we're working, don't start rifling through my stuff." But at the end of the process, when I've solved my problem and I have all my papers out and flip charts up and Post-it notes, at the end of that there's an output.
I take that output and I put it on the shelf outside for people to access.
Same idea. Collaboration is a chaotic process. It's knowledge creation, and you're producing an output. Once you produce that output, it needs to be organized; it needs to be put in a place where you can retrieve it. The other aspect of content is to think about the continuum of value, right? We have low value content. It's low cost, it's not easily accessible, it's unfiltered, right? So message text or discussion postings or draft deliverables or external information that comes in. And then you can start to think about moving it through this process where you start adding structure to it, you start tagging it and organizing it and editing it and vetting it and approving it, and that becomes expensive, because when we put that energy in, it becomes higher cost content. But it's more accessible. And again, best practices or approved methodologies or, you know, process information or products to support a particular task, all of that becomes much more valuable and requires an intentional approach to the content and the structure and the process and the onboarding and the curation and the vetting and the editing and so on.
Now some people say, "Oh, there's all kinds of high value content in my e-mail or in message text or collaborations." Yes, that's fine, but it has to be pulled out and it has to be segmented and it has to be prioritized, it has to be vetted, it has to be edited. Again, the idea is to think about low value and high value content, and think about the support of the process and the amount of energy that has to go into taking the low value content and creating something of higher value. We also have a task context, because we can have a very narrow audience, and you can kind of think of this as your desk and your desktop. It's very messy, perhaps, and it's a narrow audience, you, versus a broader audience. And think about walking into the atrium or the corporate reception center. You're not going to be putting your stuff up there. You're not going to be putting a poster up with your to-do list in the visitor center or in reception. There will be stuff up there, but that's for wide dissemination, that's approved, it's a clear message, it's for everybody, and it's for the broader audience or anybody coming in. So, you know, when you look at application-specific, localized content, you're talking about solving particular problems, you're talking about technical assets, you're talking about focused use. Maybe it's a service line or functional scope. It's in the weeds, right? It's developing capabilities, it's solving problems, and, you know, subject matter experts are curating it. At the other end of the spectrum, you're talking about general use, organizational scope, a higher level of messaging, common processes. It's generalized applications, things like process documentation, executive-level vision, firm-wide messaging, communication policies. And again, you're going from local to global. Those will require different processes and different standards and different policies and different organizing principles.
So, people look at content and they look at search and they say, "Oh, my search sucks," and they don't think about all of these different aspects of the content: application context, information construct, task construct, and so on. In an application context, it's less structure versus more structure. For the information construct, it's unfiltered versus filtered. For the task construct, it's engagement level and focused level versus policy level. And again, there's a whole bunch of different parameters: the nature of the process, the knowledge process, the purpose, the span of control, different classes of tool support, these different types of constructs. The cost, the value, the editing process, the vetting process and tagging process, the type of content, the ease of use, and then again audience, application, level of detail, and so on. So when we look at this world of information and information access, information and data are very heterogeneous, right? We have everything from intranets and web pages and documents sitting on file servers and file shares, and we have custom databases and we have messaging applications, we have customer relationship management systems, product life cycle management, ERPs, business intelligence. We have this huge range of different types of things, and then we have different ways of structuring that information and different ways of managing it, organizing it, retrieving it, and integrating it, and you can't say it's one size fits all. It's really thinking about it from that application perspective.
I say search is a recommendation engine, because you can say I'm going to pull something back based on some simple signals, some simple attributes. So if I'm looking for a restaurant on Yelp, I might look in a particular geographic range, I might look for a price range, I might look for a type of cuisine. It's few variables, it's unambiguous, it's objective, it's less complex, it's easier to model, and these are really matching algorithms, or queries across data sources. I can start to get more complex by adding more variables and having ambiguity and having situations that are subjective, that require knowledge of the domains and experiences and the behaviors. And then I can start getting into more subjective attributes based on the judgment of the person who's modeling this. If there's more ambiguity, it's harder to validate, and it gets more complex, based on probabilities and learning algorithms and latent attribute models. We have lots of variables, we have patterns that emerge. It depends on probabilities, and you can surface these patterns in different ways. On one end of the spectrum you have simple attribute-based retrieval; on the other, latent attribute models, and that's where we get our learning algorithms that look for patterns and surface those patterns. But at the end of the day, it's a recommendation engine. Search is a recommendation engine, and there are lots of different ways to slice and dice this and make this work.
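As a rough illustration of the simple end of that spectrum, here is a minimal sketch of attribute-based retrieval in the spirit of the restaurant example: a few unambiguous attributes, matched exactly or by range. The `Restaurant` class, the sample data, and the `recommend` function are made up for illustration and do not reflect how Yelp or any real service works.

```python
from dataclasses import dataclass

@dataclass
class Restaurant:
    name: str
    cuisine: str
    price: int          # 1 (cheap) to 4 (expensive), an assumed scale
    distance_km: float  # distance from the searcher

# Made-up sample data.
RESTAURANTS = [
    Restaurant("Luigi's", "italian", 2, 1.2),
    Restaurant("Sakura", "japanese", 3, 0.8),
    Restaurant("Trattoria Roma", "italian", 4, 0.5),
    Restaurant("Pasta Bar", "italian", 1, 3.5),
]

def recommend(items, cuisine, max_price, max_distance_km):
    """Keep items matching every attribute, then rank the nearest first."""
    matches = [
        r for r in items
        if r.cuisine == cuisine
        and r.price <= max_price
        and r.distance_km <= max_distance_km
    ]
    return sorted(matches, key=lambda r: r.distance_km)

if __name__ == "__main__":
    for r in recommend(RESTAURANTS, "italian", max_price=2, max_distance_km=5.0):
        print(r.name)
```

Each signal here is explicit and easy to validate. The latent attribute models at the other end of the spectrum would replace these hand-written filters with patterns learned from data, which this sketch does not attempt.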
We are going to take up the topic of product search and site search on a future podcast, and please be sure to check out our Executive Roundtable.
Again my name is Seth Earley and I thank you for your time.