Oracle PDQ: Data Quality, Analysis & Cleansing for Management & Integration

We’re several months into a large project implementing Oracle PDQ (Product Data Quality) for a major online retailer.  This is the first of what will be several updates on what we expect to be an exciting and interesting journey going forward.  I will start this series by providing a little background on the tool we are using (Oracle PDQ) as well as some key points that are driving this project. 

First, let me bring readers up to speed on Oracle PDQ.  It is a product, made up of a number of integrated modules, that addresses ALL of the issues associated with the B2B and B2C content supply chain, including:

  • cleansing content from suppliers – correcting “noisy” content by normalizing semantic variance - misspellings, truncations, abbreviations etc. - implementing editorial standards, and identifying missing content
  • setting limits on numerical attribute values (e.g. a belt for men’s pants has a limit of 60”), classifying numbers by type (decimal, integer, or either), and enabling calculations
  • recognizing products, their attributes and their attribute values – we do this through semantic pattern recognition
  • organizing product content (i.e. taxonomy/hierarchy)
  • defining a product and its associated attribute data (i.e. a node in the product taxonomy) and weighting the attributes
  • outputting cleansed content for all customer touchpoints (web, POS, in-store, catalogues, direct mail etc.)
  • mapping products to one or multiple taxonomies, including site taxonomies
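
To make the numeric checks in the list above a little more concrete, here is a minimal Python sketch of classifying a raw supplier value as an integer or a decimal and testing it against a per-attribute limit.  It is purely illustrative and does not show how Oracle PDQ works internally; the attribute name `belt_length_in`, the `ATTRIBUTE_LIMITS` table, and both function names are assumptions made up for this example.

```python
from decimal import Decimal, InvalidOperation

# Hypothetical per-attribute upper limits, in inches (illustrative only).
ATTRIBUTE_LIMITS = {"belt_length_in": Decimal("60")}


def classify_number(raw: str) -> str:
    """Return 'integer', 'decimal', or 'not_a_number' for a raw supplier value."""
    try:
        value = Decimal(raw.strip())
    except InvalidOperation:
        return "not_a_number"
    if not value.is_finite():
        # Decimal accepts "NaN" and "Infinity"; neither is a usable measurement.
        return "not_a_number"
    # Classification is by value, so "42.0" counts as an integer here.
    return "integer" if value == value.to_integral_value() else "decimal"


def within_limit(attribute: str, raw: str) -> bool:
    """True when the value is numeric and does not exceed the attribute's limit."""
    if classify_number(raw) == "not_a_number":
        return False
    return Decimal(raw.strip()) <= ATTRIBUTE_LIMITS[attribute]
```

With these definitions, `within_limit("belt_length_in", "72")` is rejected because it exceeds the 60-inch limit, and a value such as `"n/a"` is rejected because it is not a number at all.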

Pretty heady stuff—and we are having some fun with it.

What Retailers Need (and why they need it)

For any major retailer, whether they sell to their customers in stores or online, the product content supply chain begins with the supplier of the product.  Supplier product information starts out as unstructured content that gets stored in a database so that it becomes structured content.  Except … that it DOES NOT “become” structured content … it is still text, but now it is text wrapped in a model of rows and columns, and it still carries all the characteristics of text content.

So, first let’s lightly sketch out the context of the content supply chain opportunities and issues that all retailers, online and store-based, face.  Retailers are required to manage large-scale and complex content supply chains that begin far upstream from their customers, with supplier content and data about all the products that the retailer chooses to sell.  That content is usually highly variable in its quality, in its completeness, and in how well it aligns with the retailer’s own content requirements and standards.  Now, the retailer’s customer does not care about (and does not know about) this content supply chain that has to be mastered.  Not their problem.  But it is our problem, and one that is well addressed by Oracle PDQ.

Buying online is now ubiquitous, encroaching into more product areas, and is rapidly moving towards being a “part of ordinary life” that is not given a second thought by customers.  For example, last week I bought some herbs and spices online.  Simply, it was the easiest way for me to do it.  And let’s just mention in passing the huge momentum from smart mobile devices, pressing on best price and content quality.  Accurate and complete product information is no longer a nice-to-have for retailers.  It is now, simply, the basic point of entry to have a chance to reach the minds of customers.

And let’s consider for a moment how customers compare and decide.  The customer definitely decides on price, availability, and “intangibles” around both the brand of the product and the brand of the retailer.  However, and this is a big “however”, customers base a large proportion of the decision to buy on the attributes of the product: its color, material, size and shape, sustainability certifications, style, and so on.  Again, they do this without knowing or caring that retailers, and the information scientists who work for retailers, are busying themselves diligently (and sometimes frantically) to collect and present the full set of appropriate attribute values to customers online.  Cleansing and presenting attribute and attribute value content is a core functionality of Oracle PDQ.  So, if a retailer’s customer searches for an apparel item that they want in black, but some of that retailer’s suppliers are entering the color as “BLK” or “Blck” … what happens?  And the answer is … the customer may VERY LIKELY go and shop somewhere else online.  Because … it APPEARS that this particular retailer does not stock black for this product.  When, in fact, they do!  It’s just black under another name.  And that does NOT WORK in the online shopping world.
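
The “BLK” / “Blck” problem can be sketched in a few lines of Python.  This is only an illustration of the general idea of normalizing variants before searching, not a description of how Oracle PDQ does it; the `COLOR_VARIANTS` table, the product records, and the function names are all hypothetical.

```python
from typing import Optional

# Hypothetical variant table; a real one would be built from actual supplier data.
COLOR_VARIANTS = {
    "black": "black",
    "blk": "black",
    "blck": "black",
    "white": "white",
    "wht": "white",
}


def normalize_color(raw: str) -> Optional[str]:
    """Map a supplier's color string to a canonical color, or None if unknown."""
    return COLOR_VARIANTS.get(raw.strip().lower())


def search_by_color(products: list, wanted: str) -> list:
    """Return the names of products whose normalized color matches the search term."""
    target = normalize_color(wanted)
    if target is None:
        return []
    return [p["name"] for p in products if normalize_color(p["color"]) == target]


# Made-up catalogue rows, as they might arrive from different suppliers.
products = [
    {"name": "Crew tee", "color": "BLK"},
    {"name": "Polo shirt", "color": "Blck"},
    {"name": "Tank top", "color": "WHT"},
]

# An exact match on the raw supplier text finds nothing for "black".
naive_hits = [p["name"] for p in products if p["color"] == "black"]

# Matching on normalized values finds both black items.
cleansed_hits = search_by_color(products, "black")
```

Here `naive_hits` comes back empty, which is exactly the “it APPEARS that this retailer does not stock black” situation, while `cleansed_hits` contains both the crew tee and the polo shirt.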

It is no large leap to realize that this content supply chain is the space between the rock and the hard place in the retailer’s world.

Our team of taxonomists is playing a crucial role in this process as they guide the client through territory and tools that are new to them.  As taxonomists, we know all about building definitions of products and product families into a model of meaningful hierarchy.  We “know” that the retailer’s data model of products and all their attributes is exactly the same as a faceted taxonomy.  We know all about words (words and their meanings, and meanings and their words) and so know exactly how to work with semantic recognition tools like Oracle PDQ to build rules identifying attributes and attribute value patterns in supplier content.
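
As a rough illustration of what a rule that identifies attribute value patterns in supplier text can look like, here is a small Python sketch using regular expressions.  Oracle PDQ has its own rule-building tooling, which this does not attempt to reproduce; the `RULES` table, the attribute names, and the sample description are all assumptions invented for the example.

```python
import re

# Hypothetical recognition rules: one pattern per attribute, each with one capture group.
RULES = {
    "size_in": re.compile(r"\b(\d+(?:\.\d+)?)\s*(?:in|inch|inches)\b", re.IGNORECASE),
    "material": re.compile(r"\b(leather|lthr|cotton|ctn)\b", re.IGNORECASE),
    "color": re.compile(r"\b(black|blk|blck|brown|brn)\b", re.IGNORECASE),
}


def extract_attributes(description: str) -> dict:
    """Return the first raw token recognized for each attribute, lowercased."""
    found = {}
    for attribute, pattern in RULES.items():
        match = pattern.search(description)
        if match:
            found[attribute] = match.group(1).lower()
    return found


# A made-up supplier description with typical abbreviations.
attributes = extract_attributes("Mens LTHR belt BLK 34in")
```

For the sample description this recognizes a size of `34`, a material token `lthr`, and a color token `blk`.  Recognition is kept separate from normalization on purpose: the rules only say which attribute a token belongs to, and mapping `lthr` to “leather” would be a later cleansing step.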

Since attention to methodology is one of our strong points, we have built new methodologies to lay out the best approach to taking this kind of content into this kind of tool to meet and surpass business requirements.  It takes a village …  to … implement any new tool in a large corporation.  Our plan is to hand over to the client a near-perfect Oracle PDQ implementation at the cusp of maintainable maturity.  Lastly, there is documentation and knowledge transfer.  Since manuals and training materials for any enterprise application naturally do not contain any of the wealth of client-specific methods, processes and choices, we are documenting all of that.

It is a big deal.  There are many stories to tell about implementing Oracle PDQ in a retail environment – and far too many to tell here.  So, watch this space over the weeks/months to come for the stories of business analysis, methodology development, formal knowledge transfer and more.

Earley Information Science Team