Knowledge Management's Rebirth as Knowledge Engineering for Artificial Intelligence

Knowledge management is undergoing a renaissance. 

However, it is not being called knowledge management.  Instead, it is hiding under the covers of artificial intelligence (AI) and cognitive computing.  

An aspect of this evolution is the emergence of "knowledge engineering" as a Thing.  Knowledge engineering is the restructuring of content and data so that they can be manipulated by and ingested into these advanced technologies.  The emerging areas of machine learning and machine intelligence require knowledge as an input – properly designed, structured and engineered knowledge. I've said it before and it bears saying again: "There is no AI without IA."

You are undoubtedly hearing a lot about AI right now.  Unfortunately, some of the statements made about it are preposterous. But that doesn’t mean AI can’t have a role in your organization, both in the near term and over the long term. This article will enable you to distinguish the hype about AI from its true capabilities in today’s technology environment. Knowing the difference will help your organization put AI into practical use right now while preparing for more advanced uses in the future. 

AI Is Not “Load and Go”

The most important input for an AI tool is data – not just any data, but the right data: data that is relevant to the problem being solved and specific to a set of use cases and a domain of knowledge.  Many in the technology industry erroneously claim that an AI can simply be pointed at data and the answer will magically appear. The term I have heard used is “load and go,” where “all” the data is ingested into the system.  The problem with this approach lies in the vast landscape of explicit and codified enterprise knowledge. AI cannot make sense of data that is too broad or that has not been processed in a way that makes it digestible for the system. Knowledge engineering is needed to build the organizational structures that feed the AI system.

Think about a time when you had to answer a question or understand a problem. You probably did not review every book in the library or ask everyone you know what they thought about it. More likely, you tried to zero in on a more specific match for the task at hand. Perhaps you heard about a book on the topic from a colleague and it seemed like a good place to start, so you ordered it online. You might be a practitioner in a field that has a peer reviewed journal that is recognized as an authoritative source, so you read the most recent articles. Or you visited a website that had insightful blogs on your topic.

In any case, you were able to (and had to) exclude certain categories of knowledge from your organization’s repositories in order to seek out the information you needed. If the issue did not involve a computer program, you ignored resources on software and code. If it did not involve manufacturing techniques, you did not tap into that data. Somehow you were able to filter the data and narrow the search for the answer, ignoring many parts of your organization’s vast resources.

AI Tools Need the Right Data and Content

AI systems work the same way you do. They cannot use raw data effectively; the data must be processed – parsed, contextualized, and focused on a finite number of scenarios. No one system or application can make sense of all of the data out there.  The domain needs to be narrowed and the correct “training content” curated and preprocessed. Knowledge engineering is a more precise description of this approach to refactoring training content, since some cognitive applications (particularly those used to power support chatbots or intelligent virtual assistants) use nuanced structures to surface precisely the content the user requires in the context in which they need it.  
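
As a deliberately tiny illustration, here is a sketch in Python of that narrowing and preprocessing step. The documents, the domain tags, and the clean() helper are all hypothetical; a real pipeline would read from your content repository and use a proper HTML parser rather than regular expressions.

```python
import re

# Hypothetical raw documents; in practice these would come from your CMS or repository.
raw_docs = [
    {"title": "Resetting a forgotten password",
     "body": "<p>Go to Settings &amp; choose Reset...</p>",
     "domain": "customer-support"},
    {"title": "Q3 cafeteria menu",
     "body": "<p>Monday: soup...</p>",
     "domain": "facilities"},
]

TARGET_DOMAIN = "customer-support"  # narrow the domain before ingestion

def clean(html: str) -> str:
    """Strip markup and normalize whitespace so the text is digestible."""
    text = re.sub(r"<[^>]+>", " ", html)   # drop tags
    text = text.replace("&amp;", "&")      # minimal entity handling
    return re.sub(r"\s+", " ", text).strip()

# Keep only in-domain documents and restructure them into uniform training records.
training_records = [
    {"title": d["title"], "text": clean(d["body"]), "domain": d["domain"]}
    for d in raw_docs
    if d["domain"] == TARGET_DOMAIN
]

print(training_records)
```

The cafeteria menu is excluded not because it is bad data, but because it is irrelevant to the domain – exactly the filtering you do instinctively when you research a question yourself.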

The same problem arises in predictive analytics applications. The data has to be prepared, cleansed and normalized to some degree. Even Watson, a high-powered AI system, needed very specific, well-structured information in order to function correctly.  This is contrary to the popular belief that AI can make sense of content without consideration of structure, format or context.  In some cases it can, but only when a correctly designed domain model and sound knowledge engineering approaches are applied to the framework used for that sense-making.  

The domain model is the knowledge-engineered scaffolding that AI uses to make sense of content. In other words, if the content itself is not structured and curated, a nuanced, well-designed knowledge engineering framework is needed for the AI to function on that un-curated content. 
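
To make the idea of a domain model concrete, here is a toy sketch in Python. The entity types, terms and relations are invented for illustration; in practice the domain model would be a curated ontology maintained by the business, not a hard-coded dictionary.

```python
# A toy domain model for a customer-support domain: entity types, their
# known terms, and the relationships the AI can rely on when interpreting content.
domain_model = {
    "entities": {
        "Product": ["router", "modem"],
        "Issue": ["dropped connection", "slow speed"],
        "Resolution": ["firmware update", "factory reset"],
    },
    "relations": [
        ("Product", "has_issue", "Issue"),
        ("Issue", "resolved_by", "Resolution"),
    ],
}

def annotate(text):
    """Tag mentions of known entities so un-curated text gains structure."""
    found = []
    for etype, terms in domain_model["entities"].items():
        for term in terms:
            if term in text.lower():
                found.append((etype, term))
    return found

print(annotate("Customer reports dropped connection on the router after storm."))
# [('Product', 'router'), ('Issue', 'dropped connection')]
```

Even this crude scaffolding turns a free-text complaint into something a system can reason over: a Product linked to an Issue, which the relations say is resolved by a Resolution.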

AI Requires an Investment of Time and Money

Watson is gaining wide recognition as a powerful technology, and the Jeopardy win was an amazing achievement.  But that success does not translate into every domain, problem and information source without a large amount of work. The Jeopardy project used carefully selected sources and finely tuned algorithms. It required three years of effort and $25 million.

For a business leader, the idea of taking a cognitive computing application, feeding it a mass of data and expecting it to produce value is equally improbable.  There are science experiments and trailblazers out there, but truly practical, production-ready applications that can sift through cross-domain, cross-process, multi-subject, un-curated sources are still some time off.  Where does this leave the industry?

Use Knowledge Engineering to Move Steadily Toward AI

There is a large body of research that points in the right direction: knowledge engineering. That is, developing content models and packaging content in such a way that it can be processed by different channels of consumption – employees, customer service agents, customer self-service, call centers and automated support bots. “Training bots,” “providing example data,” and “supervised learning” are all examples of AI training that require inputs selected for their coverage of a domain of information and for the knowledge they contain. 

“Training the AI” is a phrase that is frequently invoked to help customers understand the cost and effort that goes into deploying new tools. AI systems need to be trained with content and data. But if the data is messy, it has to be cleaned up first. Otherwise the training data will not lead reliably to the hoped-for outcomes, whether those outcomes are predictions of what a customer might want or detection of fraudulent transactions. The content has to be selected, curated and structured for ingestion. 
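
A minimal sketch of that selection-and-cleanup step, with hypothetical records and labels, might look like the following; the point is simply that curation happens before any model ever sees the data.

```python
# Hypothetical labeled examples destined for a supervised learner.
raw_examples = [
    {"text": "Card charged twice for one order", "label": "Billing"},
    {"text": "Card charged twice for one order", "label": "Billing"},  # duplicate
    {"text": "App crashes on launch", "label": "bug "},                # inconsistent label
    {"text": "How do I export my data?", "label": None},               # unlabeled
]

def curate(examples):
    """Select, clean, and normalize records so training is reliable."""
    seen, curated = set(), []
    for ex in examples:
        if not ex["label"]:                  # unlabeled rows cannot train a classifier
            continue
        label = ex["label"].strip().title()  # normalize label spelling and case
        key = (ex["text"], label)
        if key in seen:                      # drop exact duplicates
            continue
        seen.add(key)
        curated.append({"text": ex["text"], "label": label})
    return curated

print(curate(raw_examples))
# [{'text': 'Card charged twice for one order', 'label': 'Billing'},
#  {'text': 'App crashes on launch', 'label': 'Bug'}]
```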

Many approaches used to process unstructured data have been relabeled as AI.  The approaches listed below were referred to as text and content analytics in the past but have been repackaged as AI (a brief sketch of one of them follows the list).  

  • Knowledge extraction
  • Removal of Personally Identifiable Information (PII)
  • Removal of Redundant, Outdated and Trivial (ROT) content
  • Feature extraction from product data
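
As an example, PII removal has long been handled with pattern matching combined with trained entity recognizers. Here is a pattern-only sketch in Python; the patterns and placeholder labels are illustrative, and a production system would be considerably more robust.

```python
import re

# Simple pattern-based PII redaction. Real systems combine patterns with
# trained entity recognizers, but the underlying principle is the same.
PII_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
    "SSN":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact_pii(text):
    """Replace detected PII with a typed placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(redact_pii("Reach Jane at jane.doe@example.com or 555-867-5309."))
# Reach Jane at [EMAIL] or [PHONE].
```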

These approaches do use machine learning algorithms, as they have for the past 15 or more years.  They are more sophisticated, and the applications more capable, than in the past, but at their heart are principles of knowledge and content management that are even more important to organizations now.  The bottom line is that cognitive computing and AI are advancing, but they should be more familiar than mysterious once you take away the jargon and get to what is really happening. 

Established Methods for Information Management Provide the Basis for AI

The approaches described above, as valuable as they are, will not function without well-processed data. At the heart of preparing data for AI are the traditional principles of information management – use cases, scenarios, task analysis, content structures, clean product data, and accurate, high-quality customer information – integrated with advances in technology.  These approaches should not be mysterious, but unfortunately the industry tends to advance new terms as a general practice in order to keep customers buying evolving solutions.

Laks Srinivasan, COO of Opera Systems, agreed with me about the importance of IA when we recently discussed the challenges of big data and AI-driven analytics. “About 80 percent of the work the data scientists are doing is data cleaning, linking, and pre-processing, which is an information architecture task, not a data scientist function,” he said during our discussion.  Opera is an example of a company that developed a platform to help data scientists with many aspects of analysis, feature engineering, modeling, data preparation, and algorithm operationalization.

The same issues that IA must address in order to take advantage of big data and machine learning will also help your users be more productive with standard knowledge tools as well as those advancing with AI.  Semantic search is an example. Semantic search uses content architecture plus machine learning ranking algorithms, and integrates past behaviors that act as signals of the user’s immediate needs.  Those signals improve search relevancy and accuracy.  They still require that the information being accessed be meaningful and appropriate to the needs of the user.  The better structured and curated the information, the more likely the results will be of high quality.
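
A toy illustration of blending behavioral signals with a content-relevance score might look like the sketch below. The weights and signal names are invented for the example and are not drawn from any particular search engine.

```python
# Toy ranking: blend a content-relevance score with behavioral signals.
results = [
    {"doc": "Router firmware guide", "relevance": 0.72,
     "recent_clicks": 40, "owner_verified": True},
    {"doc": "Old modem FAQ", "relevance": 0.81,
     "recent_clicks": 2, "owner_verified": False},
]

def score(r, w_rel=0.7, w_click=0.2, w_trust=0.1, max_clicks=50):
    click_signal = min(r["recent_clicks"] / max_clicks, 1.0)  # normalize popularity
    trust_signal = 1.0 if r["owner_verified"] else 0.0        # curation-quality signal
    return w_rel * r["relevance"] + w_click * click_signal + w_trust * trust_signal

for r in sorted(results, key=score, reverse=True):
    print(round(score(r), 3), r["doc"])
# 0.764 Router firmware guide
# 0.575 Old modem FAQ
```

Note that the well-curated, frequently used document outranks the one with the higher raw relevance score – the signals only help because the underlying content is structured and trustworthy.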

KM’s Grand Makeover

Forrester Research has called cognitive computing “Knowledge Management’s grand makeover.”  There are some misconceptions about what this term means.  It does not mean that we no longer have to care about tagging and organizing content because the tools will do that automatically.  The tools help to interpret the user’s intent – the objective at that moment – and they help to add context to that intent, such as social media data, past purchases or on-site clickstream behavior. 

Just as a thesaurus helps to map a query using a non-preferred term to a term more commonly used in content or a tag, machine learning can classify a natural language query into a common meaning that is then passed back to the search engine.  The more we know about users and the things they need, the more we can tune those intents and queries to return the right information.  Developing these underpinnings can help your organization leverage existing knowledge as new and more sophisticated AI tools emerge.
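
As a simple illustration of the thesaurus half of that mapping, consider the sketch below. The term pairs are invented, and a real system would handle multi-word phrases, stemming and classification of the whole query rather than naive substring replacement.

```python
# A miniature thesaurus: non-preferred terms map to the preferred term
# used in content tags. Entries are illustrative only.
thesaurus = {
    "laptop": "notebook computer",
    "cell phone": "mobile device",
    "wifi": "wireless network",
}

def normalize_query(query):
    """Rewrite a query so non-preferred terms match the tags used in content."""
    q = query.lower()
    for nonpreferred, preferred in thesaurus.items():
        q = q.replace(nonpreferred, preferred)
    return q

print(normalize_query("My laptop won't join wifi"))
# my notebook computer won't join wireless network
```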

KM Today = AI Capability Tomorrow

In fact, organizations that don’t solve today’s KM and findability challenges will be at a great disadvantage in the very near future.  None of the tools in the future will obviate the need to create domains of content and data for analysis.

Think about it: Right now, $15 billion in investment is chasing a $1 billion AI market. The interest is strong and the market is growing. Your organization will be getting a lot of phone calls and sales pitches, so make sure you understand, and can explain, the core principles of knowledge engineering so that business leaders don’t spend money on unrealistic promises without a solid foundation in knowledge management. 

Need help with your own transformation program? Lay the foundation for your organization’s success with our Digital Transformation Roadmap. With this whitepaper, assess and identify the gaps within your company, then define the actions and resources you need to fill those gaps.


Seth Earley

Seth Earley is the Founder & CEO of Earley Information Science and the author of the award-winning book The AI-Powered Enterprise: Harness the Power of Ontologies to Make Your Business Smarter, Faster, and More Profitable. He is an expert with 20+ years of experience in Knowledge Strategy, Data and Information Architecture, Search-based Applications and Information Findability solutions. He has worked with a diverse roster of Fortune 1000 companies, helping them achieve higher levels of operating performance.