Data Quality Metrics - What's your Aardvark Index?

Back in 2016, the residents of a property an hour’s drive from Wichita, Kansas, were living through some very strange and unpredictable events. They were visited by federal agents, ambulances, police, trespassers, and harassers. For over 10 years they had been treated like criminals, with no idea why.

They were the victims of default values.

This rural property, whose nearest neighbor is a mile away, happens to be located very near the literal middle of the United States. If you put a pin dead-center on a USA map, that pin would touch this property in the town of Potwin. So when Massachusetts company MaxMind began selling digital mapping services that map IP addresses to GPS coordinates, it used those middle-of-the-country coordinates as a default for any US IP address it couldn’t place more precisely than the country as a whole. The result:

600 million IP addresses were associated with the front yard of James and Theresa Arnold's house.

Here's an article that describes the problem the family had:

The Maddening Story Of The Kansas House Blamed For Everything

At Earley we’ve affectionately referred to the magnitude of these problems using an Aardvark index. This metric is so named because one of our customers had once assigned a default term list to a mandatory input field. Whenever a user uploaded a document but chose to skip (or skimp on) assigning a meaningful value to that field, the platform automatically tagged the document with the alphabetically first term. By the time they realized this was happening, a majority of their content was tagged with the word (yes, you guessed it) aardvark. Going forward, the company tracked the Aardvark index as a way of measuring data quality. Their goal: to get to zero.

What's nice about the Aardvark index, in any incarnation, is that it becomes a metric that can be observed and tracked over time. It also has built-in social capital. It's fun. People are interested in watching a value like this go up and down, whether for business reasons or personal.
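To make the idea concrete, here is a minimal sketch of how such a metric might be computed, assuming the index is simply the share of records whose field holds the default term. The function name `aardvark_index`, the sample tags, and the choice to ignore case and surrounding whitespace are illustrative assumptions, not a description of the customer's actual system.

```python
from typing import Iterable, Optional


def aardvark_index(values: Iterable[Optional[str]],
                   default_term: str = "aardvark") -> float:
    """Return the fraction of records whose tag equals the default term.

    Matching ignores case and surrounding whitespace (an assumption).
    An empty dataset yields 0.0.
    """
    values = list(values)
    if not values:
        return 0.0
    hits = sum(
        1 for v in values
        if v is not None and v.strip().lower() == default_term
    )
    return hits / len(values)


# Hypothetical sample of document tags.
tags = ["aardvark", "invoice", "Aardvark", "contract", "aardvark "]
print(f"Aardvark index: {aardvark_index(tags):.0%}")  # Aardvark index: 60%
```

Run on a schedule against the same field, a number like this can be charted over time, which is what makes the goal of reaching zero observable.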

We all know that data quality matters, but getting traction is hard work. Often there is no Aardvark index, and when there is, it's already something troubling. Knowing which data values matter, and then capturing the attention of stakeholders and management at the same time, is a hard thing to do. It tends not to happen organically, but requires a concerted effort by data stewards. And it's an effort that not enough companies make.

The actual Aardvark index could've been avoided had something other than a generic term list been thrown into the mix. This was a decision taken too lightly. The situation with IP mapping, however, is harder to avoid. The folks in Potwin, Kansas were not consulted, and why would they be? They aren't stakeholders. But for most of the things that we do, we know our stakeholders, and we know our data, or at least what our data are supposed to be. In other words, most of the time we do not have any excuse for developing a scenario that lends itself to the Aardvark index.
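One way to avoid a silent default is to record "no value" explicitly rather than falling back to the first term in the list. The sketch below is only an illustration under that assumption; `resolve_tag`, `UNTAGGED`, and the sample term list are hypothetical names, not part of any system described here.

```python
from typing import Optional, Sequence

# Explicit "no value" marker (a hypothetical convention). Missing data
# stays visibly missing instead of looking like a real tag.
UNTAGGED = None


def resolve_tag(user_choice: Optional[str],
                term_list: Sequence[str]) -> Optional[str]:
    """Return the user's term if it is in the list, otherwise UNTAGGED.

    Unlike a default-to-first-term rule, a skipped or invalid entry
    never turns into 'aardvark'.
    """
    if user_choice is None:
        return UNTAGGED
    choice = user_choice.strip().lower()
    if choice in {term.lower() for term in term_list}:
        return choice
    return UNTAGGED


terms = ["aardvark", "contract", "invoice"]
print(resolve_tag("Invoice", terms))  # invoice
print(resolve_tag("", terms))         # None
```

Untagged records can then be counted and reported on their own, so the gap shows up as a gap rather than as misleading data.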

Truth is, most companies don't have a governance framework in place until it is too late. Most people don't realize that they're stakeholders, let alone stewards. Planning mistakes like these are allowed to happen because no one is looking, let alone accountable. Governance is an afterthought, an activity unlikely to happen, a non-requirement without meaningful consequences.

We believe that governance – the people part of business – has to come first, at least at a high level. Sustainability is a core requirement: your business needs not only change management policies and procedures, but also a willingness to build change management into its designs and plans. And while culture change is hard, thankfully there's always a good place to start: the Aardvark indices already solidly in place. Because you know they're there, just as sure as you know you don't really want to see them.


Meet the Author
Seth Maislin