
Oracle PDQ: Data Quality, Analysis & Cleansing for Management & Integration

We’re several months into a large project implementing Oracle PDQ (Product Data Quality) for a major online retailer. This is the first of several updates on what we expect to be an exciting and interesting journey. I will start the series with a little background on the tool we are using (Oracle PDQ) and on some of the key points driving this project.

First, let me bring readers up to speed on Oracle PDQ. It is a product, made up of a number of integrated modules, that addresses the full range of issues associated with the B2B and B2C content supply chain, including:

  • cleansing content from suppliers: correcting “noisy” content by normalizing semantic variance (misspellings, truncations, abbreviations, etc.), implementing editorial standards, and identifying missing content
  • setting limits on numerical attribute values (e.g. a belt for men’s pants has a limit of 60”), classifying numbers by type (is it a decimal, an integer, or can it be either?), and enabling calculations
  • recognizing products, their attributes and their attribute values – we do this through semantic pattern recognition
  • organizing product content (i.e. taxonomy/hierarchy)
  • defining a product and its associated attribute data (i.e. a node in the product taxonomy) and weighting the attributes
  • outputting cleansed content for all customer touchpoints (web, POS, in-store, catalogues, direct mail etc.)
  • mapping products to one or more taxonomies, including site taxonomies
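To make the numeric-limit and number-type capability more concrete, here is a minimal Python sketch of range and type checks on supplier attribute values. It is not Oracle PDQ's configuration or API: the rule table, the attribute names `belt_length_in` and `pack_count`, and the `pack_count` limit are hypothetical; only the 60” belt limit comes from the list above.

```python
import re
from dataclasses import dataclass
from decimal import Decimal

# Patterns used to decide whether a supplier value is an integer or a decimal.
INTEGER_RE = re.compile(r"[+-]?\d+")
DECIMAL_RE = re.compile(r"[+-]?\d*\.\d+")


@dataclass(frozen=True)
class NumericRule:
    number_type: str  # "integer", "decimal" or "either"
    max_value: Decimal


# Hypothetical rules for illustration. Only the 60" belt limit comes from the text.
RULES = {
    "belt_length_in": NumericRule("either", Decimal("60")),
    "pack_count": NumericRule("integer", Decimal("500")),
}


def classify_number(raw: str) -> str:
    """Classify a numeric string as 'integer' or 'decimal'."""
    text = raw.strip()
    if INTEGER_RE.fullmatch(text):
        return "integer"
    if DECIMAL_RE.fullmatch(text):
        return "decimal"
    raise ValueError(f"not a number: {raw!r}")


def validate(attribute: str, raw: str) -> list[str]:
    """Return the problems found in a supplier value (empty list if it passes)."""
    rule = RULES[attribute]
    try:
        kind = classify_number(raw)
    except ValueError as exc:
        return [str(exc)]
    problems = []
    if rule.number_type != "either" and kind != rule.number_type:
        problems.append(f"{attribute}: expected {rule.number_type}, got {kind} ({raw!r})")
    value = Decimal(raw.strip())  # exact value, usable in later calculations
    if value > rule.max_value:
        problems.append(f"{attribute}: {value} exceeds limit {rule.max_value}")
    return problems


if __name__ == "__main__":
    print(validate("belt_length_in", "34"))    # passes
    print(validate("belt_length_in", "64.5"))  # over the 60" limit
    print(validate("pack_count", "2.5"))       # wrong number type
```

Using `Decimal` rather than `float` is a design choice in this sketch: it keeps a value such as `64.5` exact, so calculations on cleansed values do not pick up binary rounding error.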

Pretty heady stuff, and we are having some fun with it.

What Retailers Need (and why they need it)

For any major retailer, whether it sells to customers in stores or online, the product content supply chain begins with the supplier of the product. Supplier product information starts out as unstructured content that gets stored in a database so that it becomes structured content. Except … it does NOT “become” structured content. It is still text, now wrapped in a model of rows and columns, and it still carries all the characteristics of text content.

So, first, let’s lightly sketch out the content supply chain opportunities and issues that all retailers, online and store-based, face. Retailers must manage large-scale, complex content supply chains that begin far upstream from their customers, with supplier content and data about every product the retailer chooses to sell. That content is usually highly variable in its quality, its completeness, and how well it aligns with the retailer’s own content requirements and standards. Now, the retailer’s customers do not care about (and do not know about) this content supply chain that has to be mastered. Not their problem. But it is our problem, and one that Oracle PDQ addresses well.

Buying online is now ubiquitous, is encroaching into more product areas, and is rapidly becoming a “part of ordinary life” that customers do not give a second thought. For example, last week I bought some herbs and spices online, simply because it was the easiest way for me to do it. Add to that, in passing, the huge momentum behind smart mobile devices, which increases the pressure on both price and content quality. Accurate and complete product information is no longer a nice-to-have for retailers. It is now, simply, the basic point of entry for any chance of reaching customers’ minds.

And let’s consider for a moment how customers compare and decide. Customers certainly decide on price, availability, and “intangibles” around both the brand of the product and the brand of the retailer. However (and this is a big “however”), customers base a large proportion of their decision to buy on the attributes of the product: its color, material, size and shape, sustainability certifications, style, and so on. Again, they neither know nor care that retailers, and the information scientists who work for them, are busy working diligently (and sometimes frantically) to collect and present the full set of appropriate attribute values to customers online. Cleansing and presenting attribute and attribute value content is core functionality of Oracle PDQ. So, if a retailer’s customer searches for an apparel item in black, but some of that retailer’s suppliers enter the color as “BLK” or “Blck”, what happens? The answer is that the customer will VERY LIKELY go and shop somewhere else online, because it APPEARS that this retailer does not stock the product in black. In fact, it does! It’s just black under another name. And that does NOT WORK in the online shopping world.
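The “BLK”/“Blck” problem can be sketched in a few lines of Python. This is an illustrative stand-in, not how Oracle PDQ implements normalization: the variant table, SKUs and field names are hypothetical. It shows an exact-match color filter finding only one of three black items until supplier values are normalized to a single canonical value.

```python
import re
from typing import Optional

# Hypothetical variant table: letters-only, lower-case key -> canonical color.
COLOR_VARIANTS = {
    "black": "Black", "blk": "Black", "blck": "Black",
    "navy": "Navy", "nvy": "Navy",
}

# Hypothetical supplier records, with the color text left exactly as supplied.
CATALOG = [
    {"sku": "TEE-001", "color": "Black"},
    {"sku": "TEE-002", "color": "BLK"},
    {"sku": "TEE-003", "color": "Blck"},
    {"sku": "TEE-004", "color": "Navy"},
]


def normalize_color(raw: str) -> Optional[str]:
    """Map a supplier color string to its canonical value, or None if unknown."""
    key = re.sub(r"[^a-z]", "", raw.lower())  # drop case, spaces and punctuation
    return COLOR_VARIANTS.get(key)


def cleanse(items: list[dict]) -> list[dict]:
    """Return copies of the records with recognized colors normalized."""
    cleansed = []
    for item in items:
        canonical = normalize_color(item["color"])
        # Unrecognized values are kept as supplied so they can be reviewed later.
        cleansed.append({**item, "color": canonical or item["color"]})
    return cleansed


def search_by_color(items: list[dict], color: str) -> list[str]:
    """Exact-match facet filter, as a simple site search might apply it."""
    return [item["sku"] for item in items if item["color"] == color]


if __name__ == "__main__":
    print(search_by_color(CATALOG, "Black"))           # ['TEE-001']
    print(search_by_color(cleanse(CATALOG), "Black"))  # ['TEE-001', 'TEE-002', 'TEE-003']
```

Keeping unmatched values rather than discarding them lets a person review new supplier spellings and add them to the variant table.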

It is no great leap, then, to see this content supply chain as the space between a rock and a hard place in the retailer’s world.

The Role of Taxonomy

Our team of taxonomists is playing a crucial role in this process as they guide the client through territory and tools that are new to it. As taxonomists, we know all about building definitions of products and product families into a meaningful hierarchy. We “know” that the retailer’s data model of products and all their attributes is exactly the same as a faceted taxonomy. We know all about words: words and their meanings, and meanings and their words. So we know exactly how to work with semantic recognition tools like Oracle PDQ to build rules that identify attribute and attribute value patterns in supplier content.
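As a rough illustration of rules that identify attribute and attribute value patterns, the Python sketch below uses regular expressions to pull attribute values out of a free-text supplier description. It is a simplified stand-in, not Oracle PDQ's rule syntax or pattern engine; the rule table, attribute names and sample descriptions are hypothetical.

```python
import re

# Hypothetical rule table (not Oracle PDQ syntax): attribute -> ordered list of
# (pattern, canonical value). The first matching pattern wins.
EXTRACTION_RULES = {
    "color": [
        (re.compile(r"\b(?:black|blk|blck)\b", re.IGNORECASE), "Black"),
        (re.compile(r"\b(?:navy|nvy)\b", re.IGNORECASE), "Navy"),
    ],
    "material": [
        (re.compile(r"\b(?:cotton|cttn)\b", re.IGNORECASE), "Cotton"),
        (re.compile(r"\b(?:leather|lthr)\b", re.IGNORECASE), "Leather"),
    ],
}

# Two-digit length in inches, written as 34in, 34 in, 34 inches or 34".
LENGTH_RE = re.compile(r'\b(\d{2})\s*(?:in\b|inch(?:es)?\b|")', re.IGNORECASE)


def extract_attributes(description: str) -> dict[str, str]:
    """Return the attribute values recognized in a free-text description."""
    found: dict[str, str] = {}
    for attribute, patterns in EXTRACTION_RULES.items():
        for pattern, canonical in patterns:
            if pattern.search(description):
                found[attribute] = canonical
                break  # stop at the first rule that matches this attribute
    length = LENGTH_RE.search(description)
    if length:
        found["length_in"] = length.group(1)
    return found


if __name__ == "__main__":
    print(extract_attributes("Mens dress belt BLK genuine lthr 34in"))
    print(extract_attributes("Navy cotton polo"))
```

Each recognized value comes back in its canonical form, so the same rule table does double duty: it finds the attribute in noisy text and normalizes its value at the same time.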

Since attention to methodology is one of our strong points, we have built new methodologies that lay out the best approach to bringing this kind of content into this kind of tool to meet and surpass business requirements. It takes a village … to … implement any new tool in a large corporation. Our plan is to hand over to the client a near-perfect Oracle PDQ implementation at the cusp of maintainable maturity. And, lastly, there is documentation and knowledge transfer. Since manuals and training materials for any enterprise application naturally do not capture the wealth of client-specific methods, processes and choices, we are documenting all of that.

It is a big deal. There are many stories to tell about implementing Oracle PDQ in a retail environment, far too many to tell here. So, watch this space over the coming weeks and months for stories about business analysis, methodology development, formal knowledge transfer and more.

Earley Information Science Team