
    This article originally appeared in KMWorld.

    Organizations are taking a cautious approach to Generative AI – the Large Language Model (LLM) powered ChatGPT-like applications that have burst onto the technology and consumer scene. Increasingly, the C-suite is trying to factor in how LLMs and Generative AI will be part of their digital transformation roadmaps. The risks of diving into this technology are significant, and include:

    • Unrealistic expectations of LLMs as a magic solution to managing corporate content without requisite human involvement
    • Generating responses misaligned with company policies or brand image
    • Knowledge gaps: the model knows nothing after its training cutoff and has no access to organization-specific information
    • Difficulty distinguishing between creative outputs and fabricated responses (hallucinations)
    • Absence of clear audit trails and citation sources
    • The threat of exposing trade secrets or other proprietary knowledge
    • The potential financial burden of using proprietary LLMs

    However, the rewards of a successful implementation are significant. Generative AI can increase productivity, save time on routine tasks, improve human creativity, act as a sounding board or starting point for research, and help access, synthesize, and summarize large amounts of information.

    How Organizations Overcome Problems Inherent in LLMs

    First, it is essential to understand the limitations of LLMs. LLMs are not solutions in themselves; human intervention is needed at critical points. AI systems must understand things that are specific to the organization, which means organization-specific knowledge has to be structured, curated, and made accessible to those systems. LLMs don't automatically know your company's language, terminology, acronyms, or processes.

    At the same time, sensitive information cannot safely be uploaded to a publicly available consumer LLM, where it can become part of training data. When an organization instead calls a commercial LLM through its API, the question and the answer do not become part of model training. And since the organization is querying its own knowledge store behind the firewall, the information stays secure. By specifying that answers come only from the ingested knowledge, and instructing the system to respond with "I don't know" when the answer is not in the data source, hallucinations can be dramatically reduced. This approach is called Retrieval Augmented Generation, or RAG.
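    The grounding instruction described above can be sketched as a prompt-assembly step. This is a minimal illustration; the function name and prompt wording are assumptions, not a specific vendor's API.

```python
# Sketch of a grounded prompt for Retrieval Augmented Generation (RAG).
# The instructions restrict answers to the retrieved passages and tell
# the model to say "I don't know" when the passages lack the answer.

def build_rag_prompt(question: str, passages: list[str]) -> str:
    """Assemble a prompt that grounds the model in retrieved passages."""
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using ONLY the passages below. "
        'If the answer is not in the passages, reply "I don\'t know."\n\n'
        f"Passages:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

prompt = build_rag_prompt(
    "How do I reset the modem?",
    ["To reset the modem, hold the recessed button for 10 seconds."],
)
```

    The numbered passage labels also make it easy to cite which source supported an answer.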

    Why Use an LLM If It Doesn't Know the Answer?  

    Instead of using the LLM to answer the query directly from its training data, the LLM is used to interpret the query, and the interpreted query retrieves information from a knowledge source or database. The retrieved results are then processed by the LLM to make them sound more conversational.

    The Value of Metadata

    Metadata is more important than many people recognize. In a recent research project, we found that an LLM could answer questions 53% of the time without metadata but 83% of the time with metadata. That is a vast improvement in performance. Metadata provides valuable context that may not be available in the text itself. 

    Guidelines for Successfully Deploying LLMs

    Use cases

    Choose a narrow set of use cases to begin with – the more limited and more clearly defined, the better. Use cases need an unambiguous outcome to be testable and show success. A clear, testable, and unambiguous use case would be, "Use an LLM to troubleshoot a modem installation using the installation manual as a reference" or "Use the LLM to determine milestones in a project based on a project document." These form the foundation benchmark to test approaches.
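    A use case becomes testable once it is phrased as expected question-and-answer pairs. The sketch below shows one way to turn such pairs into a benchmark; the cases and the scoring rule are placeholder assumptions an organization would replace with its own.

```python
# A tiny benchmark harness: each use case is a question plus a phrase the
# answer must contain. Scoring gives the unambiguous outcome the text
# calls for. Cases here are invented examples.

BENCHMARK = [
    {"question": "How long should I hold the reset button?", "expect": "10 seconds"},
    {"question": "What is the default Wi-Fi password?", "expect": "printed on the label"},
]

def score(answer_fn, cases) -> float:
    """Fraction of cases whose answer contains the expected phrase."""
    hits = sum(1 for c in cases if c["expect"] in answer_fn(c["question"]))
    return hits / len(cases)
```

    Running the same benchmark before and after a change (new content, new prompt, new model) turns "did it get better?" into a number.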

    Identify needed content

    In the case of the modem installation, the installation guide is necessary and must contain the steps needed to troubleshoot. In the case of project milestones, the project documents or database must contain those milestones. Otherwise, the system does not have the information necessary to answer. 

    Tune the LLM to reduce creative outputs

    Setting a parameter called "temperature" to 0 makes the model's output as deterministic as possible and reduces creative responses. Temperature alone, however, does not restrict the model to your data; that restriction comes from the prompt, which should instruct the model to use only information retrieved from the data source and to respond with "I don't know" when a question cannot be answered from it. Together, these measures sharply reduce hallucinations.
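    The two controls just described can be combined in a single request. The sketch below follows common chat-API conventions (a temperature field plus system and user messages), but the field names are assumptions, not a specific vendor's API.

```python
# Sketch of a chat-completion request combining both controls:
# temperature 0 for deterministic output, and a system message that
# confines answers to the supplied context. Field names are illustrative.

def make_request(question: str, context: str) -> dict:
    return {
        "temperature": 0,  # deterministic, least "creative" sampling
        "messages": [
            {
                "role": "system",
                "content": "Answer only from the provided context. "
                           'If the context lacks the answer, say "I don\'t know."',
            },
            {
                "role": "user",
                "content": f"Context:\n{context}\n\nQuestion: {question}",
            },
        ],
    }
```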

    Gather metrics

    Identify the use cases or queries to which the LLM responded with "I don't know" and identify knowledge gaps for remediation. Test on those use cases when onboarding new content and data sources to see whether the new information addresses the gaps. 
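    Gap tracking can be as simple as logging every query the system declined to answer and reviewing the tallies. This is a minimal sketch with invented names; a production version would persist the log and normalize queries more carefully.

```python
# Record every "I don't know" response and surface the most frequent
# unanswered queries, which point at content gaps to remediate.

from collections import Counter

class GapLog:
    def __init__(self) -> None:
        self.gaps: Counter = Counter()

    def record(self, query: str, response: str) -> None:
        """Count queries the system could not answer from its sources."""
        if response.strip() == "I don't know.":
            self.gaps[query.lower()] += 1

    def top_gaps(self, n: int = 5):
        """Most frequent unanswered queries, for content prioritization."""
        return self.gaps.most_common(n)
```

    Re-running the logged queries after onboarding new content shows directly whether the gaps were closed.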

    Use a knowledge/information architecture to enrich data

    Metadata applied to content will improve the performance of an LLM. Identifying departments, processes, content types, topics, and other information characteristics will provide additional cues for the LLM to increase its ability to answer questions. 

    End-user acceptance

    Users will only trust what they understand. Providing the traceability of answers from an LLM using the knowledge base approach and retrieval augmented generation will reassure executives, internal users, and customers that the information is correct, accurate, and current. 
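    Traceability can be made concrete by returning each answer together with the passages that support it. The structure below is an illustrative assumption, and the document name is an invented example.

```python
# Sketch of a traceable answer: the response carries citations to the
# source documents it was grounded in, so users can verify it.

def answer_with_citations(text: str, sources: list[dict]) -> dict:
    """Bundle the generated answer with the passages that support it."""
    return {
        "answer": text,
        "citations": [{"doc": s["doc"], "section": s["section"]} for s in sources],
    }

result = answer_with_citations(
    "Hold the reset button for 10 seconds.",
    [{"doc": "modem-install-guide.pdf", "section": "Troubleshooting"}],
)
```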

    Content operations and governance

    Using LLMs highlights the need for knowledge and content operations. Organizations also need a mechanism for allocating resources, measuring results, and making course corrections. Governance encompasses these important processes. While not as glamorous as the Generative AI applications themselves, governance is foundational to their success.

    Summary

    It is essential to assess organizational knowledge against these criteria. Organizations that get their content, knowledge, and data systems in order now will be ahead of the game. Organizational knowledge and data are crucial enablers of Generative AI.

    Seth Earley
    Seth Earley is the Founder & CEO of Earley Information Science and the author of the award-winning book The AI-Powered Enterprise: Harness the Power of Ontologies to Make Your Business Smarter, Faster, and More Profitable. He is an expert with more than 20 years of experience in knowledge strategy, data and information architecture, search-based applications, and information findability solutions. He has worked with a diverse roster of Fortune 1000 companies, helping them achieve higher levels of operating performance.
