Oracle PDQ: Data Quality, Analysis & Cleansing for Management & Integration

We’re several months into a large project implementing Oracle PDQ (Product Data Quality) for a major online retailer.  This is the first of several updates on what we expect to be an interesting journey.  I will start the series with a little background on the tool we are using, as well as some of the key points that are driving this project.

First, let me bring readers up to speed on Oracle PDQ.  This is a product (which consists of a number of integrated modules) that addresses ALL of the issues associated with the B2B and B2C content supply chain, including –

  • cleansing content from suppliers – correcting “noisy” content by normalizing semantic variance - misspellings, truncations, abbreviations etc. - implementing editorial standards, and identifying missing content
  • setting limits on numerical attribute values (e.g. a belt for men’s pants has a limit of 60”), classifying numbers by type - is it a decimal, is it an integer, or can it be either - and enabling calculations
  • recognizing products, their attributes and their attribute values – we do this through semantic pattern recognition
  • organizing product content (i.e. taxonomy/hierarchy)
  • defining a product and its associated attribute data (i.e. a node in the product taxonomy) and weighting the attributes
  • outputting cleansed content for all customer touchpoints (web, POS, in-store, catalogues, direct mail etc.)
  • mapping products to one or multiple taxonomies, including site taxonomies
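
To make the numeric bullet above a little more concrete, here is a minimal sketch in Python of what limit checking and number typing could look like.  It is not Oracle PDQ code and it does not use the product's API; the names classify_number, within_limit and LIMITS_INCHES are hypothetical, and the only figure taken from the list above is the 60-inch belt limit.

```python
from decimal import Decimal, InvalidOperation

# Hypothetical per-attribute upper limits, in inches. Only the 60-inch
# belt limit comes from the text; the attribute name is an assumption.
LIMITS_INCHES = {"belt_length": Decimal("60")}


def classify_number(text):
    """Return 'integer', 'decimal', or None if text is not a plain number.

    Classification is by value, so '60.0' counts as an integer.
    """
    try:
        value = Decimal(text.strip())
    except InvalidOperation:
        return None
    if not value.is_finite():  # rejects 'NaN' and 'Infinity'
        return None
    return "integer" if value == value.to_integral_value() else "decimal"


def within_limit(attribute, text):
    """True if text is a positive number no larger than the attribute's limit."""
    if classify_number(text) is None:
        return False
    value = Decimal(text.strip())
    limit = LIMITS_INCHES.get(attribute)
    return value > 0 and (limit is None or value <= limit)
```

With this sketch, a supplier value of "44" for belt_length passes, while "72" or "n/a" would be flagged for review rather than published.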

Pretty heady stuff—and we are having some fun with it.

What Retailers Need (and why they need it)

For any major retailer, whether they sell to their customers in stores or online, the product content supply chain begins with the supplier of the product.  Supplier product information starts out as unstructured content, which gets stored in a database so that it becomes structured content.  Except … that it DOES NOT “become” structured content … it is still text, but now it is text wrapped in a model of rows and columns.  And it still carries all the characteristics of text content.

So, first let’s lightly sketch out the context of the content supply chain opportunities and issues that all retailers – online and store-based – face.  Retailers are required to manage large-scale and complex content supply chains that begin far upstream from their customers, with supplier content and data about all the products that the retailer chooses to sell.  That content is usually highly variable in its quality, in its completeness, and in how well it aligns with the retailer’s own content requirements and standards.  Now, the retailer’s customer does not care about (and does not know about) this content supply chain that has to be mastered.  Not their problem.  But it is our problem, and one that is well addressed by Oracle PDQ.

Buying online is now ubiquitous, encroaching into more product areas, and is rapidly moving towards being a “part of ordinary life” that is not given a second thought by customers.  For example, last week I bought some herbs and spices online.  Simply, it was the easiest way for me to do it.  And let’s just mention in passing the huge momentum from smart mobile devices, which puts pressure on both best price and content quality.  Accurate and complete product information is no longer a nice-to-have for retailers.  It is now, simply, the basic point of entry to have a chance to reach the minds of customers.

And let’s consider for a moment how customers compare and decide.  The customer definitely decides on price, availability, and “intangibles” around both the brand of the product and the brand of the retailer.  However, and this is a big “however”, customers give a large proportion of the decision to buy to the attributes of the product: its color, material, size and shape, sustainability certifications, style, and so on.  Again, they do so without caring or knowing that retailers, and the information scientists who work for retailers, are busying themselves diligently (and sometimes frantically) to collect and present the full set of appropriate attribute values to customers online.  Cleansing and presenting attribute and attribute value content is a core functionality of Oracle PDQ.  So, if a retailer’s customer searches for an apparel item that they want in black, but some of that retailer’s suppliers are entering the color as “BLK” or “Blck” … what happens?  And the answer is … the customer may VERY LIKELY go and shop somewhere else online.  Because … it APPEARS that this particular retailer does not stock black for this product.  Where, in fact, they do!  It’s just black under another name.  And that does NOT WORK in the online shopping world.
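
The “BLK” problem is easy to show in a few lines of Python.  What follows is a sketch under stated assumptions, not how Oracle PDQ works internally: it swaps the tool's semantic pattern recognition for a plain lookup table, and the table, the function names and the three-item catalog are all hypothetical.

```python
# Hypothetical variant table; a real rule set would be far larger and
# would be driven by semantic rules rather than a hand-written dict.
COLOR_VARIANTS = {
    "blk": "black",
    "blck": "black",
    "black": "black",
}

# Made-up supplier records, using the spellings from the example above.
catalog = [
    {"sku": "A1", "color": "BLK"},
    {"sku": "A2", "color": "Blck"},
    {"sku": "A3", "color": "Navy"},
]


def normalize_color(raw):
    """Map a supplier color string to its standard form, if one is known."""
    key = raw.strip().lower()
    return COLOR_VARIANTS.get(key, key)


def search_by_color(items, wanted, normalize=True):
    """Return the SKUs whose color matches the customer's search term."""
    wanted = wanted.strip().lower()
    hits = []
    for item in items:
        color = item["color"].strip().lower()
        if normalize:
            color = normalize_color(color)
        if color == wanted:
            hits.append(item["sku"])
    return hits


print(search_by_color(catalog, "black", normalize=False))  # prints []
print(search_by_color(catalog, "black"))  # prints ['A1', 'A2']
```

Without normalization the search for black comes back empty, even though two of the three items are black.  That empty result is exactly what the customer sees.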

It is no large leap to realize that this content supply chain is the space between the rock and the hard place in the retailer’s world.

The Role of Taxonomy

Our team of taxonomists is playing a crucial role in this process as they guide the client through territory and tools that are new to them.  As taxonomists, we know all about building definitions of products and product families into a model of meaningful hierarchy.  We “know” that the retailer’s data model of products and all their attributes is exactly the same as a faceted taxonomy.  We know all about words – words and their meanings and meanings and their words – and so know exactly how to work with semantic recognition tools like Oracle PDQ to build rules identifying attributes and attribute value patterns in supplier content.
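
For readers who think in code, here is one way to picture the claim that a product data model and a faceted taxonomy are the same thing.  This is an illustrative Python sketch only: TaxonomyNode, facet_counts, the node names, the attribute weights and the sample products are all hypothetical and are not taken from the client's model or from Oracle PDQ.

```python
from dataclasses import dataclass, field


@dataclass
class TaxonomyNode:
    """A node in a product hierarchy, carrying its weighted attributes."""
    name: str
    attributes: dict = field(default_factory=dict)  # attribute name -> weight
    children: list = field(default_factory=list)

    def add_child(self, child):
        self.children.append(child)
        return child

    def find(self, name):
        """Depth-first search for a node by name; None if it is absent."""
        if self.name == name:
            return self
        for child in self.children:
            found = child.find(name)
            if found is not None:
                return found
        return None


def facet_counts(products, attribute):
    """Count products per value of one attribute, i.e. one facet."""
    counts = {}
    for product in products:
        value = product.get(attribute)
        if value is not None:
            counts[value] = counts.get(value, 0) + 1
    return counts


# Made-up hierarchy and weights, for illustration only.
root = TaxonomyNode("Apparel")
accessories = root.add_child(TaxonomyNode("Accessories"))
accessories.add_child(TaxonomyNode("Belts", {"color": 0.5, "length": 0.3, "material": 0.2}))

products = [
    {"sku": "B1", "color": "black", "material": "leather"},
    {"sku": "B2", "color": "black", "material": "canvas"},
    {"sku": "B3", "color": "brown", "material": "leather"},
]
```

Each attribute defined on the Belts node is a facet that a customer can filter on, and facet_counts(products, "color") gives the counts a site would show next to each color.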

Since attention to methodology is one of our strong points, we have built new methodologies to lay out the best approach to taking this kind of content into this kind of tool to meet and surpass business requirements.  It takes a village … to … implement any new tool in a large corporation.  Our plan is to hand over to the client a near-perfect Oracle PDQ implementation at the cusp of maintainable maturity.  And, lastly, there is documentation and knowledge transfer.  Since manuals and training materials for any enterprise application naturally do not contain any of the wealth of client-specific methods, processes and choices, we are documenting all of that.

It is a big deal.  There are many stories to tell about implementing Oracle PDQ in a retail environment – and far too many to tell here.  So, watch this space over the weeks/months to come for the stories of business analysis, methodology development, formal knowledge transfer and more.

Earley Information Science Team
