Machine learning, in which programs “learn” from the information they process in order to produce new results, has made significant progress. The concept sounds simple, but it has a huge range of ramifications when it comes to applications.
Machines “learn” by making an approximation, comparing the output to a target or objective, and then using that comparison as input to the next approximation. This principle is the foundation for hundreds of algorithms that produce everything from spelling corrections to self-driving cars. The algorithms learn from experience by taking in information about the problem and using that information to make a prediction about how to solve a new problem. Looking for patterns in data tells the system something about the nature of that data. In that analysis, the software identifies “features” – the characteristics that are relevant to the problem at hand – and determines which ones are most important.
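The approximate, compare, and adjust cycle described above can be sketched in a few lines. This is only an illustration under simple assumptions: the function name `fit_slope`, the learning rate, and the toy data are made up for the example, and the model is a single number `w` that tries to make `w * x` match each target `y`.

```python
def fit_slope(xs, ys, learning_rate=0.01, steps=500):
    """Learn w so that w * x approximates y, by repeated correction."""
    w = 0.0  # the first approximation: a deliberately poor guess
    for _ in range(steps):
        # Compare the current output to the target for every example.
        errors = [w * x - y for x, y in zip(xs, ys)]
        # Use that comparison as input to the next approximation
        # (the gradient of the mean squared error with respect to w).
        gradient = sum(2 * e * x for e, x in zip(errors, xs)) / len(xs)
        w -= learning_rate * gradient
    return w


# Toy data (assumed for illustration): every target is twice its input.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = fit_slope(xs, ys)
print(round(w, 3))
```

Each pass shrinks the gap between the output and the target, so `w` settles close to 2, the relationship hidden in the data.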
Features and attributes about “catness” and “dogness”
A feature can be considered an attribute, a descriptor or characteristic of an entity. Imagine that a machine learning program is looking for pictures of cats. On a simplistic level, pointy ears are one feature of “catness.” The software looks for things with pointy ears and puts them in a bucket called “cats.” But that basic classification is not nuanced enough to determine for sure whether something is a cat. By looking at lots of pictures of cats, the system learns the subtleties of catness versus dogness (since some dogs have pointy ears). Hundreds or thousands of such signals may be required in order to teach the system how to identify cats in different situations, whether they are sitting, lying down, being held, in a picture with dogs or rabbits, fat or thin, etc.
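One rough way to see which features matter is to compare how often each one shows up in the two groups. The sketch below is a toy, not how an image system actually works: the feature names and the four labeled examples are invented, and `feature_importance` is a hypothetical helper.

```python
def feature_importance(examples):
    """For each feature: share of cats that have it minus share of dogs that have it."""
    counts = {"cat": {}, "dog": {}}
    totals = {"cat": 0, "dog": 0}
    for label, features in examples:
        totals[label] += 1
        for feature in features:
            counts[label][feature] = counts[label].get(feature, 0) + 1
    all_features = set(counts["cat"]) | set(counts["dog"])
    return {
        f: counts["cat"].get(f, 0) / totals["cat"]
        - counts["dog"].get(f, 0) / totals["dog"]
        for f in all_features
    }


# Invented training examples: (label, features observed in the picture).
examples = [
    ("cat", {"pointy_ears", "whiskers", "retractable_claws"}),
    ("cat", {"pointy_ears", "whiskers", "retractable_claws"}),
    ("dog", {"pointy_ears", "whiskers"}),
    ("dog", {"floppy_ears", "whiskers"}),
]
scores = feature_importance(examples)
for feature, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{feature}: {score:+.1f}")
```

In this toy data, pointy ears lean toward “cat” but do not settle the question, whiskers carry no signal at all, and the claws separate the groups cleanly. A real system weighs far more signals than this.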
Learning solves information problems in a number of ways. Consider the following scenarios:
Given a set of data, look for anomalies
The question here is, what is an anomaly? That depends on the nature of the content and the task at hand. If I am looking at credit applications, one “anomaly” that I want to detect could be bad credit risks. Given a sample of applications from people who are good risks, and a sample of those from people who have proven to be poor risks, the system can identify characteristics that may not be obvious. For example, some credit issuers for mobile phone platforms have correlated the number of times a cell phone dies from insufficient charging with creditworthiness. Another anomaly could be a fraudulent application. The number of items that are changed as a user fills in the information in an application has been identified as a “signal” of fraud.
If I am looking at a patient population, the anomalies I am looking for may be those individuals at risk for readmission, or those who are at high risk for non-compliance with treatment. A bank may look for patterns that lead to a customer leaving for a competitor. In effect, the system says “one of these is not like the others.” The reasons an item stands out can be very subtle: a combination of very small variations across many features, rather than a single feature, may add up to the anomaly.
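The idea that many small variations can add up to an anomaly can be sketched with a simple score: measure how far each value sits from the average for its feature, then add those distances up across all features. This is one basic approach among many, and the function name and the numbers below are assumptions made for the example.

```python
from statistics import mean, pstdev


def anomaly_scores(rows):
    """Score each row by its summed squared z-scores across all features."""
    columns = list(zip(*rows))
    means = [mean(column) for column in columns]
    # Fall back to 1.0 so a constant feature does not cause division by zero.
    stdevs = [pstdev(column) or 1.0 for column in columns]
    return [
        sum(((value - m) / s) ** 2 for value, m, s in zip(row, means, stdevs))
        for row in rows
    ]


# Invented data: three features per record. The last record is never the
# most extreme value in any single column, but it is unusual in all three.
rows = [
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (0.0, 1.0, 0.0),
    (0.0, 0.0, 1.0),
    (-1.0, 0.0, 0.0),
    (0.0, -1.0, 0.0),
    (0.0, 0.0, -1.0),
    (0.8, 0.8, 0.8),
]
scores = anomaly_scores(rows)
most_unusual = scores.index(max(scores))
print(most_unusual, [round(s, 2) for s in scores])
```

The last record gets the highest score even though other records hold the largest value for each individual feature, which is the kind of subtle, combined difference described above.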
Given a set of data, look for the same patterns
This issue can be considered from the perspective of pattern identification. “Look for more like this” requires a set of data that contains enough of the pattern that the system can learn all of the features that comprise that pattern. The system then looks at a new set of information and picks out the things that match. An example might be to look for another document similar to one that I have. The threshold for similarity could be identical documents for de-duplication, or conceptually the same for research purposes. Or, given a photo of a friend, the task could be to locate all of the other photos of that person in a folder of hundreds or thousands of photos.
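For documents, “more like this” with an adjustable threshold might look like the sketch below. It assumes a very simple notion of similarity (cosine similarity over word counts, which ignores word order and meaning), and the function names, sample documents, and threshold values are invented for illustration.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Similarity of two texts based on shared word counts (0.0 to 1.0)."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm = math.sqrt(
        sum(c * c for c in a.values()) * sum(c * c for c in b.values())
    )
    return dot / norm if norm else 0.0


def find_similar(query, documents, threshold):
    """Return the documents whose similarity to the query meets the threshold."""
    return [d for d in documents if cosine_similarity(query, d) >= threshold]


query = "how to reset the printer"
documents = [
    "how to reset the printer",
    "reset the printer after a paper jam",
    "quarterly sales report",
]
duplicates = find_similar(query, documents, threshold=0.99)  # de-duplication
related = find_similar(query, documents, threshold=0.4)      # looser match
print(duplicates)
print(related)
```

A high threshold keeps only near-identical documents, while a lower one also brings back related material. Matching on concepts rather than shared words would need a richer representation than word counts.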
Given a set of data, put the data into buckets
Perhaps I want to classify support content for products according to the type of problem that needs to be solved, such as how to troubleshoot a printer jam. I can do so by getting representative content for each class of problem (training the algorithm) and then running it against a larger body of content. Documents will be classified into each of the buckets based on representative content. Or I may want to group products according to a combination of attributes that they share, so I can organize them on an ecommerce website.
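Training on representative content and then sorting new content into buckets can be sketched as follows. This is a deliberately crude word-overlap classifier, not a production algorithm; the bucket names, the sample support text, and the `train` and `classify` helpers are all assumptions for the example.

```python
from collections import Counter


def train(labeled_docs):
    """Build one word-frequency profile per bucket from representative content."""
    profiles = {}
    for label, text in labeled_docs:
        profiles.setdefault(label, Counter()).update(text.lower().split())
    return profiles


def classify(text, profiles):
    """Assign text to the bucket whose profile best covers its words."""
    words = text.lower().split()

    def score(label):
        profile = profiles[label]
        # Normalize by profile size so larger buckets are not favored.
        return sum(profile[word] for word in words) / sum(profile.values())

    return max(profiles, key=score)


# Invented representative content for two classes of printer problem.
training = [
    ("paper_jam", "paper stuck in the printer tray"),
    ("paper_jam", "clear the jam and remove the torn paper"),
    ("connectivity", "printer not found on the wifi network"),
    ("connectivity", "cannot connect to the printer over the network"),
]
profiles = train(training)
print(classify("paper jam in the rear tray", profiles))
print(classify("printer will not connect to wifi", profiles))
```

The quality of the buckets depends almost entirely on how representative the training content is, which is why selecting that content is the important step.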
It’s all about categorization
As I look at this list, it becomes evident that there is really one mechanism at work here – that of categorization. In each case categories are being defined based on certain characteristics, attributes, patterns, signals, or features.
An Intelligent Virtual Assistant (IVA) is a good example of an intelligent system that uses categorization. Many mechanisms are at work in an IVA, but fundamentally it is classifying what a user says according to what the user wants (their “intent”). Understanding that intent can be achieved by leveraging a number of mechanisms, from machine learning to natural language understanding. That intent is then aligned with a response from the system.
In some ways, categorization is a retrieval approach. In fact, one way of dealing with user “utterances” (the things they say) is by searching on the concepts represented in the utterances. That search can use additional clues about what the user wants based on where they are in the process; e.g., what stage of the customer journey they are in. All of the variations in how the user asks a question can be classified according to a set of rules or through enough examples to arrive at an understanding of what the user needs.
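The rule-based route, with the customer journey stage as an extra clue, might be sketched like this. The intents, keywords, stage names, and the half-point weight given to the stage are all hypothetical; a real assistant would use far richer rules or a trained model.

```python
import re

# Hypothetical rules: each intent is signalled by a few keywords.
INTENT_RULES = {
    "track_order": {"where", "track", "shipped", "delivery"},
    "return_item": {"return", "refund", "exchange"},
    "product_info": {"specs", "compare", "price", "sizes"},
}
# Hypothetical hint: the intent that is most likely at each journey stage.
STAGE_HINTS = {"pre_purchase": "product_info", "post_purchase": "track_order"}


def classify_intent(utterance, stage=None):
    """Return the best-matching intent, or None if nothing matches."""
    words = set(re.findall(r"[a-z']+", utterance.lower()))
    scores = {
        intent: len(words & keywords)
        for intent, keywords in INTENT_RULES.items()
    }
    # The stage nudges the result but cannot outvote a keyword match.
    hinted = STAGE_HINTS.get(stage)
    if hinted:
        scores[hinted] += 0.5
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(classify_intent("Where is my order?"))
print(classify_intent("I want a refund", stage="post_purchase"))
print(classify_intent("Can you help me?", stage="pre_purchase"))
```

When the words alone are ambiguous, as in the last call, the stage supplies the missing clue. When the words are clear, they win.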
In semantic search, a thesaurus structure can be used to map user-specific terminology to terms with which the content was previously tagged, or that is contained in the body of the content. For example, “statement of work” and “proposal” may be considered to be equivalent and therefore are identified as such in the thesaurus. Then, even if the user is searching for “statement of work” and the content is tagged with “proposal,” the thesaurus can map those terms together to improve recall. It is also possible to deconstruct the utterance into components of grammar and syntax to understand the meaning. Meaning is mapped to intent, and intent is used to retrieve needed content from a knowledge repository – a knowledge base.
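The thesaurus mapping can be sketched as a lookup that expands the user's term into its equivalents before matching against tags. The “statement of work” and “proposal” pairing comes from the example above; the document titles, the other tags, and the helper names are invented.

```python
def build_lookup(synonym_rings):
    """Map every term to the full set of terms considered equivalent to it."""
    lookup = {}
    for ring in synonym_rings:
        for term in ring:
            lookup[term] = ring
    return lookup


def search(query, tagged_docs, lookup):
    """Return titles of documents tagged with the query or any equivalent term."""
    term = query.lower()
    terms = lookup.get(term, {term})
    return [title for title, tags in tagged_docs if terms & tags]


lookup = build_lookup([frozenset({"statement of work", "proposal"})])
tagged_docs = [
    ("Website redesign", {"proposal"}),
    ("March invoice", {"invoice"}),
]
with_thesaurus = search("Statement of Work", tagged_docs, lookup)
without_thesaurus = search("Statement of Work", tagged_docs, {})
print(with_thesaurus)
print(without_thesaurus)
```

Without the thesaurus the search finds nothing, because no document carries the user's exact term. With it, the document tagged “proposal” is recalled.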
Leveraging legacy knowledge
In the knowledge base, responses are also classified according to intent so that they can be matched to user needs, and they are tagged with other signals that provide further context: the person’s role, the step of the process, perhaps the products or equipment they own, and their location if appropriate and relevant. Those responses are designed to help the user with the task at hand and therefore are an integral part of a support knowledge ecosystem.
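Matching a response on intent plus context could work along these lines. The record layout, the intent and product names, and the rule of preferring the most specific eligible response are assumptions made for this sketch.

```python
def select_response(intent, user_context, knowledge_base):
    """Pick the most specific response whose intent and context tags all match."""
    eligible = [
        response
        for response in knowledge_base
        if response["intent"] == intent
        and all(
            user_context.get(key) == value
            for key, value in response["context"].items()
        )
    ]
    if not eligible:
        return None
    # More context tags means a more specific answer for this user.
    return max(eligible, key=lambda r: len(r["context"]))["text"]


# Invented knowledge base entries.
knowledge_base = [
    {"intent": "clear_jam", "context": {},
     "text": "Open the paper path and remove any stuck sheets."},
    {"intent": "clear_jam", "context": {"product": "model_b"},
     "text": "On model B, lift the rear door and pull the sheet downward."},
]
print(select_response("clear_jam", {"product": "model_b"}, knowledge_base))
print(select_response("clear_jam", {"product": "model_c"}, knowledge_base))
```

A user who owns the tagged product gets the tailored procedure, and everyone else falls back to the general answer.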
Well-structured knowledge bases are very important to the development of bots; however, the content from these systems needs to be refactored into smaller chunks and tagged for context before ingestion into the bot framework. This content might be derived from training material that instructs users, or from troubleshooting guides, FAQs and other rich knowledge content. Procedures can be tagged with the appropriate intent and process or, given enough examples, can be classified using machine learning. This is how responses from the system get mapped against user questions.
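Breaking content into chunks and tagging each one might look like the sketch below. It assumes that paragraphs are separated by blank lines and uses a hypothetical keyword tagger in place of real editorial tagging or a trained classifier; the field names and the sample guide are made up.

```python
def keyword_tagger(text):
    """Hypothetical stand-in for manual tagging or a trained classifier."""
    lowered = text.lower()
    if "jam" in lowered:
        return "clear_jam"
    if "wifi" in lowered:
        return "connect_wifi"
    return "unknown"


def chunk_and_tag(document, source, tagger):
    """Split a document into paragraph chunks and attach context tags to each."""
    paragraphs = [p.strip() for p in document.split("\n\n") if p.strip()]
    return [
        {"source": source, "position": i, "intent": tagger(p), "text": p}
        for i, p in enumerate(paragraphs)
    ]


guide = (
    "To clear a paper jam, open the rear door and remove the sheet.\n\n"
    "To connect to wifi, open the network menu and choose your network."
)
chunks = chunk_and_tag(guide, source="troubleshooting_guide", tagger=keyword_tagger)
for chunk in chunks:
    print(chunk["position"], chunk["intent"], chunk["text"][:30])
```

Each chunk now carries enough context to be retrieved on its own, which is what the bot needs when it answers a single question rather than returning a whole guide.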
Chat logs for training
Additional content for training the system can come from the text logs of interactions with support agents. The interactions between customers and support reps clarify the relationship between questions and answers, which the algorithm uses to improve the accuracy of scripted dialog and to deal with “edge” or ambiguous cases.
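Turning a chat log into training material can start with something as simple as pairing each customer message with the agent reply that follows it. The transcript format and the speaker labels below are assumptions; real logs usually need cleaning and review before they are used.

```python
def qa_pairs(transcript):
    """Pair each customer turn with the agent turn that directly follows it."""
    pairs = []
    for (speaker, text), (next_speaker, next_text) in zip(transcript, transcript[1:]):
        if speaker == "customer" and next_speaker == "agent":
            pairs.append((text, next_text))
    return pairs


# Invented transcript: a list of (speaker, message) turns.
transcript = [
    ("agent", "Hello, how can I help?"),
    ("customer", "My printer says there is a paper jam."),
    ("agent", "Please open the rear door and remove the sheet."),
    ("customer", "That worked, thanks."),
]
pairs = qa_pairs(transcript)
print(pairs)
```

Pairs like these show which answers actually followed which questions, including the awkwardly worded ones that scripted dialog tends to miss.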
The key takeaway is that computers can be trained to answer questions with greater and greater accuracy and specificity, with a natural tone, and even with some expressions of personality such as humor. These answers are based on content that is accessed through a range of mechanisms. The important piece is to have the knowledge codified in structured content in the first place. Making that content more accessible for people will also help when it needs to be ingested and repurposed by systems driven by machine intelligence.
AI as a toddler
AI is, at its heart, a classification mechanism: it makes sense of the world by putting information into buckets and then acting on that information according to the bucket it is in. Consider how humans learn: they begin by learning to classify things. Classifying is the first thing they do when they begin to acquire language (watch a toddler begin to speak; they point to things and name them), and it remains the mechanism used throughout a life of learning. Machines will increasingly be doing the same.