
The Power of Metadata - What the NSA Can Teach Enterprise Big Data

Metadata in the news? Who would have thought that metadata would be explained to the public through mainstream media outlets? Nearly everyone is now familiar with the National Security Agency’s data (and metadata) collection efforts, as revealed by Edward Snowden and others. Consequently, the average person has become aware of “metadata”, even without fully understanding it. For information professionals, this is a unique opportunity to educate our business management and colleagues.

I won’t take a position about what is right and what is wrong with the NSA program; however, I will highlight four characteristics of metadata capture implicit in the NSA conversation.  Each of these points is important to communicate to your company leaders:

  1. Metadata is generated every moment of every day by many activities and devices
  2. Patterns derived from metadata can provide significant insight and high-value functionality to users of consumer and corporate applications
  3. Metadata collection requires sensitivity to privacy concerns
  4. Conclusions from metadata can be error-prone and subject to misuse

Metadata throughout daily activities

The Wall Street Journal, in “Phones Leave a Telltale Trail,” recently described how robbery suspects were apprehended using mobile phone records that showed the movement of the suspects' phones along the getaway route. “Each individual crumb may seem insignificant, but combined and analyzed, this data gives police and spies alike one of the most powerful investigative tools ever devised.”

The take-away here is that more and more transactions create digital markers. Email messages, phone calls, photos, web site visits, mobile apps, credit card transactions, and shopping purchases all generate metadata that is captured as transaction details.
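To make the "crumbs combined into a trail" idea concrete, here is a minimal sketch in Python. It is an illustration only, not a description of how any carrier or agency works: the `Event` record, its field names, and the sample values are all hypothetical. It shows that records carrying no content at all (just a device, a time, and a tower) reveal a route once they are grouped and ordered.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One hypothetical metadata record: no call content, only context."""
    device_id: str
    timestamp: int   # minutes since midnight (illustrative unit)
    cell_tower: str


def build_trails(events):
    """Group events by device and order them in time to form a trail."""
    trails = defaultdict(list)
    for event in sorted(events, key=lambda e: e.timestamp):
        trail = trails[event.device_id]
        # Record a tower only when the device moves to a different one.
        if not trail or trail[-1] != event.cell_tower:
            trail.append(event.cell_tower)
    return dict(trails)


# Made-up sample records, deliberately out of order.
events = [
    Event("phone-A", 605, "tower-2"),
    Event("phone-B", 600, "tower-9"),
    Event("phone-A", 600, "tower-1"),
    Event("phone-A", 612, "tower-3"),
    Event("phone-A", 607, "tower-2"),
]

trails = build_trails(events)
for device, trail in trails.items():
    print(device, " -> ".join(trail))
```

Each record on its own says very little; the value, and the sensitivity, comes from the combination.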

Understanding the value of metadata for analysis and future decision-making is critical to competitive survival. As you engage in new projects, make sure there is an explicit focus on metadata capture requirements, and that the value of that metadata is surfaced in discussions with business leadership, even if it is not yet a quantifiable part of ROI.

Patterns in metadata provide value

Certainly, when we let Yelp know our location, we get immediate value in the form of nearby recommendations. Critical patterns can be recognized in individual behaviors; more importantly, when we start to recognize patterns across many individuals, we can draw higher-level conclusions about categories of customers. For example: “What are the characteristics of customers likely to be brand advocates through social media?”

Much of this type of modeling is going on in our Information Architecture projects and programs for clients and is at the core of developing enterprise taxonomies and metadata schemas.  This field is still in its infancy in many enterprises and there is a tremendous amount of productivity to be gained in organizations that take categorization of data seriously, particularly as they bring in big data capabilities.

Sensitivity to privacy concerns

Most of us disclose and share more data than we may realize or want to: when we tell an application it can use our location, when we download a game on our smartphone, when we shop online. I am not a privacy expert, and there are nuances to these issues.

What does “private” mean when an organization you do business with collects your data? Privacy policies are by no means uniform, and even in the best of circumstances they are subject to interpretation and to challenges from new collection techniques.

Approaches already exist that can identify personal characteristics of users with striking accuracy just by analyzing the types of applications downloaded onto a smartphone. New tools are entering the workplace that mine communications and interactions to determine how well the organization is collaborating and what activities (and perhaps by extension, what people) are most important. Other tools analyze social networks and social graph data to recommend people, content, products and activities of interest to the user. But these same applications may one day be integrated with other online sources, along with commercially available credit and financial data, and may reveal much more information than any of us had intended to share with certain entities.

The take-away here is that our expectations about what constitutes privacy are evolving and are not yet clear in the digital world. As businesses collect more data on the digital footprints of customers and employees, they need to be proactive in understanding evolving privacy expectations. Even more importantly, they need to understand the potential for backlash as they explore new uses where privacy guidelines have not yet evolved.

Conclusions can be error-prone and misused

Finding signal in noise is a double-edged sword. The US Government devotes significant energy, money and resources to fighting terrorism and keeping the public safe. If we did not leverage the power of information management tools to surface suspicious patterns and find the bad guys, and something did happen, the public and elected officials would loudly and rightfully complain that we missed things that could have been caught with state-of-the-art tools. But what about the errors that will be made? There are always false positives in any analysis. Over time, will calls from my offshore clients and development partners make me look like a suspect?

The take-away here is that as analysis of big data becomes critical to business, we need to build in validation practices before we do damage to individuals or to our business. Moreover, it’s an old saw that any case can be made through statistics. Validation needs to be built into our processes so that key decisions are not swayed by subjectively influenced analytic results.
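A short worked example shows why false positives deserve this attention. The sketch below applies Bayes' rule to a pattern-matching test; every number in it is a hypothetical assumption chosen for illustration, not a figure from any real program. When the thing being searched for is rare, even a test that looks very accurate flags mostly innocent cases.

```python
def positive_predictive_value(prevalence, sensitivity, false_positive_rate):
    """Probability that a flagged case is a real one (Bayes' rule)."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * false_positive_rate
    return true_positives / (true_positives + false_positives)


# Hypothetical assumptions: 1 in 10,000 records is truly of interest,
# the test catches 99% of those, and wrongly flags 1% of the rest.
ppv = positive_predictive_value(
    prevalence=0.0001,
    sensitivity=0.99,
    false_positive_rate=0.01,
)

print(f"Share of flagged records that are real hits: {ppv:.2%}")
```

Under these assumed numbers, fewer than one flagged record in a hundred is a genuine hit, which is the case for validating results before acting on them.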

In summary, metadata is a fact of life and permeates every activity and interaction, often even when we are not directly online (think sensor data in the environment). Metadata has enormous value and utility. Enterprises are harnessing it internally in new ways to provide information in context, and it is improving our online experiences. It is also making our lives safer and richer and is affecting every sector of the economy. The NSA activities are putting a spotlight on one aspect of metadata and data collection. We’ll be seeing many more of these kinds of issues raised in the near future.

For a look into how we use information architecture as the foundation for digital transformation, read our whitepaper: "Knowledge is Power: Context-Driven Digital Transformation"

Seth Earley
Seth Earley is the Founder & CEO of Earley Information Science and the author of the award-winning book The AI-Powered Enterprise: Harness the Power of Ontologies to Make Your Business Smarter, Faster, and More Profitable. He is an expert with 20+ years of experience in Knowledge Strategy, Data and Information Architecture, Search-based Applications and Information Findability solutions. He has worked with a diverse roster of Fortune 1000 companies, helping them achieve higher levels of operating performance.
