AI’s Value for Product Data Programs

By Dan O'Connor, Director of Product Data, Earley Information Science

AI is all the talk of every business trying to find new ways to engage their customers. Forrester states on their AI landing page “Artificial intelligence (AI) has the potential to fundamentally remake the nature of firms, employment, and how work gets done.” Head to a business conference today and one of the first topics on the agenda is always AI. Between Large Language Models (LLMs), Generative AI, and RAG (Retrieval-Augmented Generation), AI is discussed with businesses today in the realm of unfulfilled expectations in the standard hype curve.

With personalization and product data syndication requiring large amounts of attribution and content, the pressure to deliver high-quality feature bullets, unique descriptions by channel, validated images of the right quality and dimensions, and deep attribution about products has never been greater. Creating that content can be time-consuming and expensive and can delay the launch of products to market. Many companies are now looking to AI to make content creation scalable, but the understanding of how Large Language Models (LLMs) and AI can help is often a grey area.

The answer to AI’s positioning within content creation in a business is complex, but with a few definitions and some interesting ideas we can make that answer easier to comprehend.

What are Generative AI and Retrieval-Augmented Generation (RAG)?

Generative AI, short for Generative Artificial Intelligence, is a subset of artificial intelligence that focuses on creating data or content, such as text, images, audio, or other forms of information, which is not explicitly provided to the system. In other words, generative AI is designed to produce new data based on patterns, information, and examples it has learned from existing data.

Generative AI relies on neural networks and deep learning techniques to generate content that is coherent, contextually relevant, and often indistinguishable from content created by humans. Some popular examples of generative AI models include GPT-3 (Generative Pre-trained Transformer 3) and its successors, which are designed for natural language generation, as well as image generation models like DALL-E and text-to-image synthesis models.

Generative AI models are trained on vast datasets, allowing them to understand and replicate patterns in the data, making them powerful tools for automating content creation, personalizing user experiences, and aiding in creative tasks. However, they also come with challenges, such as the need for careful curation of training data, ethical considerations, and ensuring that generated content meets quality and safety standards.

Retrieval-augmented generation (or RAG) is an advanced technique in natural language processing and artificial intelligence that combines elements of text generation and retrieval to produce more contextually relevant and coherent content. This technique is particularly useful for tasks that require generating human-like text with reference to specific information or context.

Here's how Retrieval-Augmented Generation typically works:

  • Retrieval:  Initially, a retrieval step is performed to extract relevant information or context from a given dataset or knowledge base.  This information can include facts, descriptions, or examples that are pertinent to the content generation task.
  • Generation: After retrieving the relevant information, a text generation model, often a large language model like GPT-3 or GPT-4, uses this retrieved content as a reference or context for generating text.  This generated text can be in the form of sentences, paragraphs, or even entire documents. 

The key advantage of Retrieval-Augmented Generation is that it allows the generated content to be more aligned with the retrieved information, resulting in content that is contextually accurate and coherent. This technique is applied in various natural language processing applications, including content generation, question answering, summarization, and more.

Global LLMs Versus Fine-Tuned Models

One of the key concerns with LLMs is data security. While global LLMs do not store private information in a way that can be accessed by other users, putting sensitive or private information into a global LLM is discouraged. Fine-tuned models allow the breadth of knowledge of a global model but localizing that model to the domain in which you are working. It is much more cost-effective than training a model from scratch.

It also avoids “hallucinations”, which occur when an LLM cannot find an answer and “makes stuff up”.  If you haven’t read the story of the lawyer who used ChatGPT to write his filings, there are important lessons in avoiding hallucinations. Fine-tuned models that are domain-specific help avoid hallucinations while directing the model toward understanding your business.

What is AI’s Role in Product Data Generation?

AI can play an important role within your product data program, but it may be a little different than you think.

  • Product Descriptions: Writing effective product titles, descriptions, and specifically channel-unique content is the most obvious use case for generative AI. As every product requires at least one well-written, SEO keyword-rich, engaging title and description, letting generative AI handle this task and tuning the results is far more efficient than copywriting from scratch. This doesn’t mean you can get rid of your copywriting team: It means they can tune and optimize rather than write the entirety of the content.

  • Product Data Augmentation: One of the most time-intensive elements for a product data team is turning Excel spreadsheets and PDF spec sheets into specification metadata. Analyzing documents that already exist, are semi-structured, and are rich with product metadata, and using AI to parse and normalize them can significantly streamline the product data onboarding process. This is not generative AI, as it is not generating new data, but it is removing the need and the points of failure of a human performing that task.

    This is especially true for distributors, where differing file formats, data controls, taxonomies, and vocabularies cause resource-heavy efforts to transfer this data from spreadsheets and PDFs into systems. AI can scan these documents and augment the specification data, with limited human intervention for review and approval, leading to a smoother data onboarding process with fewer points of failure.

  • Image Validation: Ensuring the accuracy and quality of product images is essential for building trust with customers. Image validation technology can verify that product images meet the required standards, such as resolution, color accuracy, and visual clarity. Detecting logos, drop shadows, and even determining when a human shape is present in the image can make the difference between an image that a channel partner will reject multiple times and an image that they will accept the first time.

Benefits of Using AI in Your Product Data Program

  • Time Efficiency: These advanced technologies are capable of generating product data and validating images swiftly, eliminating the need for manual content creation and image quality checks. This allows your team to focus on more strategic aspects of your e-commerce business.
  • Cost-Effective: Reducing the need for hiring copywriters or content creators and manual image validation can significantly cut down on operational costs.
  • Consistency: Generative AI and retrieval-augmented generation generate content that maintains a consistent tone and style, ensuring a unified brand voice across your product listings, while image validation technology ensures visual consistency.
  • Scalability: As your product catalog expands, the scalability of these technologies allows for the efficient generation of data for numerous products and the validation of many images in a short amount of time.
  • Content Variation: Generative AI with retrieval-augmented generation can generate various versions of product descriptions and titles, helping you test and optimize content for better conversion rates, and image validation ensures the diversity and accuracy of product images.

Investing in AI for your Product Data Program

It goes without saying that manufacturers, distributors, marketplaces, and retailers should be investing in Generative AI, as this technology can automate elements of a business that, up until now, have remained resource-intensive and costly. However, making the right investment is key. Investing in AI for your product data practice is no different.

Understanding the difference between your needs for generative AI, data augmentation through AI, and data validations through AI is important to ensure that you are making the right investment. Not all AI needs to be generative to be valuable, and automating data augmentation and validation can have just as high a return on investment as automating the generation of your product descriptions.

EIS can help with this journey by evaluating the current state of your product data, optimizing onboarding processes, and determining where AI interventions will have the greatest impact.  Contact EIS at to talk to us about how EIS can assist you in incorporating AI into your product data program.

Meet the Author
Earley Information Science Team

We're passionate about managing data, content, and organizational knowledge. For 25 years, we've supported business outcomes by making information findable, usable, and valuable.