How to Successfully Test and Deploy a ChatGPT-Type of Application

Organizations are taking a cautious approach to Generative AI — the Large Language Model (LLM) powered ChatGPT-like applications that have burst onto the technology and consumer scene. Increasingly, the C-suite is trying to factor in how LLMs and Generative AI will be part of their digital transformation roadmaps.

The risks of diving into this technology are significant and include:

  • Unrealistic expectations of LLMs as a magic solution to managing corporate content without requisite human involvement
  • Generating responses misaligned with company policies or brand image
  • Knowledge cutoffs: LLMs that are not trained on current information or on knowledge specific to the organization cannot produce answers that depend on it
  • Difficulty distinguishing between creative outputs and fabricated responses (hallucinations)
  • Absence of clear audit trails and citation sources
  • Decisions around training models: balancing usefulness with the threat of exposing trade secrets or other proprietary knowledge
  • Potential financial burden of using proprietary LLMs and related enterprise software platforms

However, the rewards of a successful implementation are also significant. The technology can increase productivity, save time on routine tasks, improve human creativity, act as a sounding board or starting point for research, and help to synthesize and summarize large amounts of information.

LLM-based applications can assist organizations in making sense of large numbers of documents and content. Employees or customers can then spend less time searching for what they need. In some cases, the applications can anticipate a need and surface content before the user even asks for it. This ability is a function of personalization, and in that context, systems powered by LLMs can improve the customer experience and increase engagement.

How do organizations overcome the problems that are inherent in LLMs?

A few rather straightforward practices can address multiple issues related to LLMs.

First, it is important to understand the limitations of LLMs. LLMs are not solutions in themselves. They are part of a tool kit. Human intervention is needed at certain key points. In addition, the system needs to understand things that may not be in the public domain or policies that are specific to the organization. That information has to be structured and curated. Humans need to capture and codify knowledge so it can be used in AI systems, and they are also needed to solve new problems that an LLM cannot yet solve. LLMs don’t automatically know your company’s language, terminology, acronyms, or processes.

There is always information that needs to be protected from disclosure, including competitive strategies and knowledge of customer needs, and details on precisely how the organization serves those needs. That knowledge is the foundation for competitive differentiation, and the LLM needs access to it to provide answers. At the same time, that information cannot be safely uploaded to a publicly available or commercial LLM as it becomes part of the training data.

The solution is to use a localized or private cloud copy of an LLM to access organizational knowledge while keeping that knowledge confidential. This is also the way to rein in “hallucinations,” the creative output that may not be aligned with the brand. LLM tools have a parameter called “temperature” that allows for increasing creativity — including completely fabricated answers. By turning the temperature down to 0, specifying that answers should come only from ingested knowledge, and instructing the system to respond with “I don’t know” when the answer is not in the data source, hallucinations can be all but eliminated. This approach is called “retrieval augmented generation,” or RAG.
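The grounding pattern described above can be sketched in a few lines. This is a minimal illustration, not any vendor’s actual API: the function name `build_rag_request` and the request fields are hypothetical stand-ins for whatever SDK the deployment uses.

```python
# A minimal sketch of the RAG grounding pattern: temperature 0 plus an
# explicit "answer only from context, otherwise say I don't know" instruction.
# build_rag_request and the request fields are illustrative, not a real SDK.

def build_rag_request(question, passages):
    """Assemble a grounded request from retrieved passages."""
    context = "\n\n".join(passages)
    system = (
        "Answer ONLY from the context below. "
        "If the answer is not in the context, reply exactly: I don't know."
    )
    return {
        "temperature": 0,  # suppress creative / fabricated completions
        "system": system,
        "prompt": f"Context:\n{context}\n\nQuestion: {question}",
    }

request = build_rag_request(
    "How do I reset the modem?",
    ["To reset the modem, hold the recessed button for 10 seconds."],
)
print(request["temperature"])  # 0
```

The key point is that the model is asked to transform retrieved passages into an answer, not to supply the answer from its own training data.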

If we are not using the LLM for the answer, then what role is it playing?

Instead of using it to answer the query directly, the LLM is used to “process” a query and use that processed query to retrieve information from a knowledge source or database. The results are then also processed to make them sound more conversational. The preprocessing step can be considered a “thesaurus on steroids,” treating variations of a request as conceptually the same. Just as chat utterances must be processed so that questions or comments that vary in phrasing but mean the same thing are converted into a standard intent the system can respond to, LLMs do this at the conceptual level. What is the user trying to accomplish? How is that question the same as other types of questions that other users are asking?

The following is a vast oversimplification of the actual process but will explain the main concepts.

The system represents a query in mathematical terms called a vector. Vectors are multi-dimensional, and the characteristics of a document or piece of content are represented by those dimensions. A body of content can contain thousands or tens of thousands of characteristics. The LLM compares that mathematical representation to the mathematical representations of knowledge ingested from the knowledge base into a vector database. The closest vector is then returned as an answer. The LLM processes the answer using its knowledge of language to make the answer sound more conversational.

The value of metadata

Many of the vendors offering LLM-based solutions are recognizing the need for knowledge as an essential component; however, some are missing an important detail — how to structure the data. I recently spoke with a Generative AI vendor and asked him about the role of taxonomies, metadata, and knowledge graphs. His response was, “You don’t need any of that.” When I pressed him about how the data was prepared, he admitted “Well, we have to do some data labeling.” That said it all: labeling data is the same thing as applying metadata.

Metadata is more important than many people recognize. In a recent research project, we found that an LLM was able to answer questions 53% of the time without metadata, but 83% of the time with metadata. That is a vast performance improvement.

Guidelines to successfully test and deploy a ChatGPT-type of application

Use cases

Choose a narrow set of use cases to begin with — the narrower and more clearly defined, the better. Use cases need to have an unambiguous outcome to be testable. This aspect is critical since it will determine whether the application has been successful. An ambiguous use case would be, “Use an LLM to support customers.” In contrast, a clear, testable, and unambiguous use case would be, “Use an LLM to troubleshoot a modem installation using the installation manual as a reference,” or “Use the LLM to determine milestones in a project based on a project document.”

Identify needed content

In the case of the modem installation, the installation guide is necessary and must contain the steps needed to troubleshoot. In the case of project milestones, the project documents or database must contain those milestones. Otherwise, the system simply does not have the information necessary to provide an answer.

Tune the LLM to reduce creative outputs

Setting the temperature to 0 and instructing the model to use only information from the data source, answering “I don’t know” when the answer is not there, will all but eliminate hallucinations.

Test on a library of use cases

Continually test the LLM against a library of use cases that serves as a reference benchmark for measuring ongoing improvements.
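A use-case library lends itself to a simple regression harness. The sketch below assumes a hypothetical `ask()` function standing in for the deployed pipeline; each use case pairs a question with a phrase the answer must contain.

```python
# A minimal regression harness over a library of use cases. ask() is a
# hypothetical stub for the deployed RAG pipeline; in practice it would
# call the live application.

use_cases = [
    {"question": "How do I reset the modem?", "must_contain": "reset button"},
    {"question": "What is the Q2 milestone?", "must_contain": "I don't know"},
]

def ask(question):
    # Stub answers; a real implementation calls the RAG pipeline.
    answers = {
        "How do I reset the modem?": "Hold the reset button for ten seconds.",
    }
    return answers.get(question, "I don't know")

passed = sum(case["must_contain"] in ask(case["question"]) for case in use_cases)
print(f"{passed}/{len(use_cases)} use cases passed")  # 2/2 use cases passed
```

Tracking this pass rate over time turns the library into a benchmark: any content or configuration change that lowers the score is a regression.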

Gather metrics

Identify the use cases or queries to which the LLM responded with “I don’t know,” and identify knowledge gaps for remediation. Test on those use cases when onboarding new content and data sources to see whether the new information addresses the gaps.
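Capturing those “I don’t know” responses can be as simple as a counter keyed by query, as in this sketch; the `record` helper is a hypothetical name for whatever logging hook the application exposes.

```python
# A sketch of knowledge-gap logging: record every query the system could not
# answer so missing content can be identified and remediated. record() is a
# hypothetical logging hook, not part of any specific framework.
from collections import Counter

unanswered = Counter()

def record(query, answer):
    if answer.strip() == "I don't know":
        unanswered[query] += 1

record("What is the warranty period?", "I don't know")
record("How do I reset the modem?", "Hold the reset button for ten seconds.")
record("What is the warranty period?", "I don't know")

# The most frequent unanswered queries point at the biggest knowledge gaps.
print(unanswered.most_common(1))  # [('What is the warranty period?', 2)]
```

Re-running the recorded queries after onboarding new content shows directly whether the gaps have been closed.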

Use a knowledge/information architecture to enrich data

Metadata applied to content will improve the performance of an LLM. The identification of departments, processes, content types, topics, and other information characteristics will provide additional cues for the LLM to increase its ability to answer questions. Product metadata is critical to conversational commerce where customers will ask questions about your product catalog rather than searching or browsing. Metadata also supports non-AI applications and makes it easier to upgrade, integrate and harmonize various other technologies in your information ecosystem.
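One concrete way metadata helps is by narrowing the candidate set before similarity scoring. The sketch below is illustrative only: the field names (`doc_type`, `product`) and the term-overlap score, which stands in for real vector similarity, are assumptions.

```python
# A sketch of metadata-filtered retrieval: filter by document type first,
# then score the survivors for relevance. Field names are hypothetical, and
# the term-overlap score is a crude stand-in for vector similarity.

documents = [
    {"text": "Hold the reset button for ten seconds.",
     "doc_type": "install_guide", "product": "modem"},
    {"text": "Q3 revenue grew 12 percent.",
     "doc_type": "finance_report", "product": None},
]

def retrieve(query_terms, doc_type=None):
    # 1. Metadata filter: discard documents of the wrong type up front.
    candidates = [d for d in documents
                  if doc_type is None or d["doc_type"] == doc_type]
    # 2. Relevance score: count of query terms appearing in the text.
    def score(d):
        return sum(t in d["text"].lower() for t in query_terms)
    return max(candidates, key=score, default=None)

hit = retrieve(["reset", "modem"], doc_type="install_guide")
print(hit["text"])  # Hold the reset button for ten seconds.
```

Because the finance report is excluded by its metadata before scoring, irrelevant but superficially similar content never reaches the LLM, which is one mechanism behind the accuracy gains described above.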

Integration into workflows

Chat may be an ideal interface for certain uses, such as customer sales, but the LLM-based tool can also be integrated with other systems and technologies at the API level. Think of your knowledge repository as being able to power marketing workflows, email messaging, customer self-service, customer support, field service, and even embedding knowledge and “how-to” information in products themselves.

End user acceptance

Users will only trust what they understand. Providing the traceability of answers from an LLM using the knowledge base approach and retrieval augmented generation will reassure executives, internal users, and customers that the information is correct, accurate, and up to date.

Content operations and governance

The use of LLMs calls out the need for mature knowledge/content operations. Organizations will also need a mechanism for allocating resources and measuring results, as well as making course corrections. Governance encompasses these important processes, and while it is not as glamorous as the Generative AI applications, it is foundational to their success.

The use of LLMs and Generative AI is still in its infancy. Organizations that get their content, knowledge, and data systems in order now will be ahead of the game. This is a fast-changing technology, but organizational knowledge will always be a key enabler. Knowledge has been a neglected element of digital transformation; Generative AI will put it front and center.


This article was originally published in Customer Think.

Meet the Author
Seth Earley

Seth Earley is the Founder & CEO of Earley Information Science and the author of the award-winning book The AI-Powered Enterprise: Harness the Power of Ontologies to Make Your Business Smarter, Faster, and More Profitable. He is an expert with 20+ years of experience in Knowledge Strategy, Data and Information Architecture, Search-based Applications, and Information Findability solutions. He has worked with a diverse roster of Fortune 1000 companies, helping them achieve higher levels of operating performance.