Data Quality Metrics - What's your Aardvark Index?

Back in 2016, the residents of a property an hour’s drive from Wichita, Kansas, were living through some very strange and unpredictable events. They were visited by federal agents, ambulances, police, trespassers, and harassers. For over 10 years they were treated like criminals, with no idea why.

They were the victims of default values.

This rural property, whose nearest neighbor is a mile away, happens to sit very near the literal middle of the United States. If you put a pin dead-center on a USA map, that pin would touch this property in the town of Potwin. So when the Massachusetts company MaxMind began selling digital mapping services that map IP addresses to GPS coordinates, it used those middle-of-the-country coordinates as the default for any IP address it could place no more precisely than somewhere in the country. The result:

600 million IP addresses were associated with the front yard of James and Theresa Arnold's house.

Here's an article that describes the problem the family had:

The Maddening Story Of The Kansas House Blamed For Everything

At Earley we’ve affectionately referred to the magnitude of these problems using an Aardvark index. The metric is so named because one of our customers had once assigned a default term list to a mandatory input field. Whenever a user uploaded a document but chose to skip (or skimp on) assigning a meaningful value to that field, the platform automatically tagged the document with the alphabetically first term. By the time they realized this was happening, a majority of their content was tagged with the word (yes, you guessed it) aardvark. Going forward, the company tracked the Aardvark index as a way of measuring data quality. Their goal: to get to zero.

What's nice about the Aardvark index, in any incarnation, is that it is a metric that can be observed and tracked over time. It also has built-in social capital. It's fun. People are interested in watching a value like this go up and down, whether for business reasons or personal ones.
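As a rough illustration, an Aardvark index can be computed as the share of records that carry a known or suspected default value. The sketch below is an assumption about how such a metric might be calculated, not the customer's actual implementation; the function names, the threshold, and the sample tags are all hypothetical.

```python
from collections import Counter


def aardvark_index(values, default_value):
    """Return the fraction of records whose value equals the suspected default."""
    values = list(values)
    if not values:
        return 0.0
    hits = sum(1 for v in values if v == default_value)
    return hits / len(values)


def suspect_defaults(values, threshold=0.5):
    """Return (value, share) pairs for values that cover at least `threshold` of records.

    A single value dominating a field is a hint, not proof, that it is a default.
    """
    values = list(values)
    if not values:
        return []
    total = len(values)
    counts = Counter(values)
    return [(v, c / total) for v, c in counts.most_common() if c / total >= threshold]


# Hypothetical sample: tags assigned to five uploaded documents.
tags = ["aardvark", "aardvark", "budget", "aardvark", "contract"]
index = aardvark_index(tags, "aardvark")
suspects = suspect_defaults(tags)
```

Recomputing the index on a schedule and charting it gives the over-time view described above. The second function may help when nobody yet knows which value is the default.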

We all know that data quality matters, but getting traction is hard work. Often there is no Aardvark index, and when there is, it's already something troubling. Knowing which data values matter, and then capturing the attention of stakeholders and management at the same time, is a hard thing to do. It tends not to happen organically, but requires a concerted effort by data stewards. And it's an effort that not enough companies make.

The actual Aardvark index could have been avoided had something other than a generic term list been thrown into the mix. This was a decision taken too lightly. The situation with IP mapping, however, is harder to avoid. The folks in Potwin, Kansas were not consulted, and why would they be? They aren't stakeholders. But for most of the things that we do, we know our stakeholders, and we know our data, or at least what our data are supposed to be. In other words, most of the time we have no excuse for creating a scenario that lends itself to an Aardvark index.
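One way to avoid a silent default, sketched here under assumptions, is to reject a missing value for a mandatory field instead of filling it in. The function name, the record shape, and the term list below are hypothetical and do not describe the customer's platform.

```python
def tag_document(doc_id, term, allowed_terms):
    """Tag a document, refusing missing or unknown terms rather than defaulting."""
    if term is None or not term.strip():
        # No fallback to the first term in the list: the upload must be fixed.
        raise ValueError(f"document {doc_id}: a term is required")
    cleaned = term.strip().lower()
    if cleaned not in allowed_terms:
        raise ValueError(f"document {doc_id}: unknown term {cleaned!r}")
    return {"doc_id": doc_id, "term": cleaned}


# Hypothetical controlled vocabulary for the mandatory field.
ALLOWED_TERMS = {"aardvark", "budget", "contract"}

record = tag_document(42, " Budget ", ALLOWED_TERMS)

try:
    tag_document(43, "", ALLOWED_TERMS)
    rejected = False
except ValueError:
    rejected = True
```

Rejecting the upload pushes the cost of a missing value back to the moment of entry, where someone can still supply a meaningful one. An explicit "unassigned" marker would be another option if blocking uploads is not acceptable.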

Truth is, most companies don't have a governance framework in place until it is too late. Most people don't realize that they're stakeholders, let alone stewards. Planning mistakes like these are allowed to happen because no one is looking, let alone accountable. Governance is an afterthought, an activity unlikely to happen, a non-requirement without meaningful consequences.

We believe that governance – the people part of business – has to come first, at least at a high level. Sustainability is a core requirement: your business needs not only change management policies and procedures, but also a willingness to build change management into its designs and plans. And while culture change is hard, thankfully there's always a good place to start: the Aardvark indices already solidly in place. Because you know they're there, just as sure as you know you don't really want to see them.
