Extracting Insights from Big Data – Building a Knowledge Architecture

This article originally appeared on CMSWire.com.

Every CIO has heard about the promise of Big Data – gleaning new insights from customers, markets and competitors, speeding time to market, detecting anomalies and security outliers, and providing the foundation for the machine learning algorithms that are the precursor of Artificial Intelligence.  The challenge is getting from “here” to “there”.  There is lots of promise and potential, but many challenges and roadblocks still stand in the way.  This article discusses the “knowledge architecture” of the enterprise and ways to develop and apply it in the short term that will lead to new capabilities in the long run.

“Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it.” – Dan Ariely, author of “Predictably Irrational”

I might not go so far as to say no one is doing it – many organizations are getting value from big data.  The ones who are “doing it best” are the ones that understand their objectives, have identified the questions they want to ask, and have done a good job of architecting their data and processes to produce and act on insights.  This begins with identifying what data you have and understanding its value.  The challenge is that most organizations are not aware of the full scope of data sources at their disposal and don’t consider how that data can be applied to produce value for the enterprise.  Consider the data “exhaust” thrown off by marketing applications. Marketing integration suites capture a huge variety of behaviors and interactions.  If that information about the user’s “electronic body language” is not instrumented into content processes and engagement strategies, it goes to waste.

Double Loop Learning – Pulling Levers versus Changing the Levers to Pull

The information doesn’t need to go to waste, but in order to use it well, some questions need to be answered: What are you trying to measure? Would you know it if you saw it? What would you do if you had it? The answers can be found in the data indicators related to the customer lifecycle and in the learning and adaptation that result from acting on the data. This is accomplished through “double loop learning” – the mechanism of looping insights back to the organization for action.  The first learning loop is the cycle of collecting data, observing, intervening, and then cycling back to data collection to see whether the intervention had the desired impact.  The second loop is a larger review of the process and macro objectives, where lessons learned and interventions can change the hypothesis and the data collection itself.  The objective is the same; however, the lessons learned are taking place at two levels – macro and micro.  At the micro level, we are making small adjustments to content, organizing principles, product groupings, associations, cross-sell rules, and upsell configurations – “pulling levers” based on the data. At the macro level, we are changing our hypothesis and the types of interventions that take place at the micro level (“changing the levers” that we are able to pull).
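
To make the two loops concrete, here is a minimal, self-contained sketch in Python. It is illustrative only: the metric is simulated, and the lever and hypothesis names are placeholders for whatever analytics and content operations an organization actually uses.

```python
import random

# A minimal sketch of double-loop learning; the "metric" is simulated and the
# levers/hypotheses are placeholders, not a real engagement platform.

def collect_metric():
    return random.gauss(0.05, 0.01)  # e.g., a simulated conversion rate

def micro_loop(levers):
    """Inner loop: pull the levers we already have and measure whether each helped."""
    outcomes = []
    for lever in levers:
        before = collect_metric()
        # ... apply the lever here: tweak content, cross-sell rules, groupings ...
        after = collect_metric()
        outcomes.append((lever, after - before))
    return outcomes

def macro_loop(hypothesis, levers):
    """Outer loop: if the existing levers are not moving the metric,
    revise the hypothesis and the set of levers themselves."""
    outcomes = micro_loop(levers)
    if not any(delta > 0 for _, delta in outcomes):
        hypothesis = "revised: " + hypothesis
        levers = levers + ["new engagement lever"]
    return hypothesis, levers

print(macro_loop("weather drives mobile promotions",
                 ["reorder landing page", "new cross-sell rule"]))
```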

Clues from the User Vapor Trail

The “vapor trail” of user behavior data can tell us that a piece of content, a relationship, or a set of merchandising and shopping attributes is not effective – it is not leading to clicks, conversions, downloads, or whatever behavior or response we are trying to achieve.  There might be several smaller adjustments to the content and product data relationships, or to the ways the content is surfaced, where the impact can be measured.  However, if an entire program or engagement process is not effective, then the metrics collected might lead to a significant revamp that requires development, testing, quality review, and deployment.  That process can take months from start to finish.  The clock speed and intervention cadence of these two loops are vastly different.

These scenarios depend on identifying a hypothesis about the correlation between observed data and an action.  But what if we don’t have a clear understanding of the outcome or correlation?  There can be more variables than we can easily comprehend, or the volume of interactions can be too large to make such changes manually.

Adjusting content, product data and promotions for large enterprises at scale requires more automated approaches, but these approaches still require metrics, objectives, and hypotheses as part of the feedback loops that machine learning algorithms can leverage. 

Back to Big (and Large) Data

But what about the definition of big data? We have all heard the Volume, Variety and Velocity definition.  What does that really mean when it comes to extracting value?

There are large data sources and big data sources. Large data might be the transaction processing output from a large retailer.  Such data is well structured, of good quality, and has a defined schema – that is, all of the elements are understood, consistent, and normalized so that apples-to-apples comparisons can be made.  In contrast, big data that streams in from a variety of sources might contain different definitions of a customer – one defined as a family and the other as an individual – making it hard to compare results.  Moreover, in a big data scenario, the volume could be so great that conventional hardware and software take too long to perform analyses or become too expensive to scale, even if the data is clean and well structured.
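
A small, hypothetical sketch of that “different definitions of a customer” problem: one feed counts customers as households, another as individuals, and both must be normalized to the same grain before any apples-to-apples comparison is possible. The feed contents and field names below are invented for illustration.

```python
# Hypothetical example: one feed defines a customer as a household,
# the other as an individual. Normalize both to individuals before comparing.

household_feed = [
    {"customer_id": "H-100", "grain": "household", "members": 4, "spend": 320.0},
]
individual_feed = [
    {"customer_id": "I-501", "grain": "individual", "spend": 75.0},
    {"customer_id": "I-502", "grain": "individual", "spend": 90.0},
]

def to_individual_grain(record):
    """Expand household records into per-member records so the feeds are comparable."""
    if record["grain"] == "household":
        share = record["spend"] / record["members"]
        return [{"customer_id": f'{record["customer_id"]}-{i}', "spend": share}
                for i in range(record["members"])]
    return [{"customer_id": record["customer_id"], "spend": record["spend"]}]

normalized = [r for rec in household_feed + individual_feed for r in to_individual_grain(rec)]
avg_spend = sum(r["spend"] for r in normalized) / len(normalized)
print(f"average spend per individual: {avg_spend:.2f}")
```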

Big data technologies help address this particular aspect by running analyses on lower-cost commodity hardware and splitting processing into smaller jobs.  But in general, the big data that people are really interested in is not well formed, defined, clean, or consistent.  The most interesting insights and new discoveries come from analyzing and combining disparate sources and processing them in a way that looks for patterns and correlations across data sets.
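
“Splitting processing into smaller jobs” can be illustrated with a toy map/reduce pass on a single machine; platforms such as Hadoop or Spark distribute the same idea across many commodity servers. The log lines below are made up.

```python
from collections import Counter
from multiprocessing import Pool

# Toy illustration of splitting an analysis into smaller jobs:
# each chunk of log lines is counted independently (map), then merged (reduce).

def count_chunk(lines):
    return Counter(word for line in lines for word in line.split())

def split_into_chunks(lines, n):
    size = max(1, len(lines) // n)
    return [lines[i:i + size] for i in range(0, len(lines), size)]

if __name__ == "__main__":
    log_lines = ["promo clicked", "promo ignored", "promo clicked", "cart abandoned"] * 1000
    with Pool(4) as pool:
        partial_counts = pool.map(count_chunk, split_into_chunks(log_lines, 4))
    total = sum(partial_counts, Counter())   # reduce: merge the partial counts
    print(total.most_common(3))
```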

Time for a Nice Swim in the Data Lake

For example, weather and traffic conditions affect a retailer’s sales.  Those data feeds come from different systems in different formats, or potentially from different sources within a particular system, and the systems may have different designs and conventions for naming data elements.  So-called “data lakes” allow these disparate types and structures to be stored in a repository without the predefined structure that traditional systems such as data warehouses require.   We can keep adding data sources that consist primarily of sensor data or text information – Tweet streams or remote traffic monitors – and use algorithms to process and analyze the information and look for patterns.
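
In the simplest terms, a data lake is “store first, structure later.” The hypothetical sketch below drops raw events from different sources into one place as-is and only imposes structure when a specific question is asked (schema-on-read). The file layout and event fields are invented for illustration.

```python
import json
from pathlib import Path

# A toy "data lake": tweets, traffic sensors, and weather readings are appended
# as raw JSON lines, with no shared schema imposed up front.

lake = Path("data_lake")
lake.mkdir(exist_ok=True)

raw_events = [
    {"source": "twitter", "text": "love the new store layout", "ts": "2017-04-01T10:02:00"},
    {"source": "traffic_sensor", "sensor_id": 17, "vehicles_per_min": 42, "ts": "2017-04-01T10:02:00"},
    {"source": "weather", "station": "KBOS", "temp_f": 51.0, "ts": "2017-04-01T10:00:00"},
]
with open(lake / "events.jsonl", "a") as f:
    for event in raw_events:
        f.write(json.dumps(event) + "\n")

# Schema-on-read: structure is applied only when we ask a question,
# e.g., "what vehicle flow did the traffic sensors report?"
with open(lake / "events.jsonl") as f:
    readings = [json.loads(line) for line in f]
traffic = [e["vehicles_per_min"] for e in readings if e.get("source") == "traffic_sensor"]
print("traffic readings:", traffic)
```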

The other characteristic of big data that we need to be concerned with is how fast the data changes. Sensor data might stream all day long, whereas sales results might be processed as a batch at the end of the day. If the goal were to correlate sales with sensor data in real time, the velocity of the sales data would increase. Data for a single product category changes more slowly than the cumulative transactions across all categories, so there is an increase not only in the amount of data (the volume) but also in how quickly it changes when sampled in real time – the velocity of the data.  Considering that sensors can throw off continuous streams of data (perhaps monitoring not just vehicle traffic but pedestrian flows) and that there can be hundreds or thousands of them in a target region, the velocity continues to increase.
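
One way to picture the difference in velocity is to contrast a once-a-day batch total with a rolling window over a continuous stream. The sketch below fakes a pedestrian sensor stream and aggregates it minute by minute; all numbers are made up.

```python
from collections import deque
from datetime import datetime, timedelta
import random

# Illustrative only: a simulated sensor stream aggregated over a rolling
# one-minute window, versus a single end-of-day style batch total.

def sensor_stream(start, readings_per_minute=60, minutes=5):
    t = start
    for _ in range(readings_per_minute * minutes):
        yield t, random.randint(0, 3)             # pedestrians detected in this tick
        t += timedelta(seconds=60 / readings_per_minute)

start = datetime(2017, 4, 1, 9, 0, 0)
window = deque()
batch_total = 0

for ts, count in sensor_stream(start):
    batch_total += count                          # what a nightly batch job would sum up
    window.append((ts, count))
    while window and ts - window[0][0] > timedelta(minutes=1):
        window.popleft()                          # keep only the last minute of readings
    rolling = sum(c for _, c in window)           # available in (near) real time

print("end-of-day style total:", batch_total)
print("last rolling one-minute count:", rolling)
```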

Enter Omnichannel

An omnichannel campaign objective might be to understand the impact of traffic and weather on in-store and mobile promotions and customer segments.  The data might include sensor data from stores measuring pedestrian traffic, clickstream data on the retailer’s web site, and mobile data from third parties correlated with anonymized demographic data.   This data is coming in very quickly, with new data points generated every second. We now have physical and behavioral data combined with traffic, weather, pedestrian traffic, and mobile phone data, and we have that across dozens of demographic segments (the granularity of this data is scary – e.g., down to “trend-setting soccer moms without college degrees interested in crafts”) – and now we add in our spring promotional campaigns.

What do we make of such a data stew? (Ah, a new buzzword – data stew.)  This is the point at which the data has to be processed in some way.  Big data may be messy, but contrary to popular belief, we do have to do something with it to make sense of the patterns.  A set of hypotheses must be developed in order to select particular information for analysis.  If we are looking for conversations about our products, there needs to be a definition of the product and the variants in how people will describe it – including misspellings.  If we want to know positive or negative sentiment, the system has to determine whether there are variations specific to product characteristics that people will call out in a positive or negative way.
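
As a toy illustration of “a definition of the product and the variants in how people will describe it, including misspellings,” fuzzy string matching plus a small sentiment lexicon is enough to tag simple mentions. The product names and word lists are invented; real pipelines would use proper entity resolution and trained sentiment models.

```python
from difflib import get_close_matches

# Toy example: match product mentions despite misspellings and tag a crude sentiment.
# Real systems would use entity resolution and trained sentiment models instead.

product_variants = ["trailblazer backpack", "trailblazer pack", "trail blazer backpack"]
positive_words = {"love", "great", "perfect"}
negative_words = {"broke", "terrible", "returned"}

def tag_mention(text):
    words = text.lower().split()
    # crude fuzzy check: does any 2-3 word window look like a known product variant?
    windows = [" ".join(words[i:i + n]) for n in (2, 3) for i in range(len(words) - n + 1)]
    mentions_product = any(get_close_matches(w, product_variants, cutoff=0.8) for w in windows)
    if not mentions_product:
        return None
    if positive_words & set(words):
        return "positive"
    if negative_words & set(words):
        return "negative"
    return "neutral"

print(tag_mention("love my new trailblazzer backpack"))               # positive, despite the typo
print(tag_mention("the zipper broke on the trail blazer backpack"))   # negative
```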

Since different systems will define sensor data parameters differently, those also need to be reconciled.  The ways we process the data are part of a family of algorithms called machine learning.  Machine learning looks for patterns, classifies data and content, and makes predictions based on new data sets. But we need to know what questions to ask or what to look for.  In this way, big data is no different from BI and traditional analytics. It is possible to ask for anomalies, look for outliers, or find patterns – but which outliers and patterns?  The conditions of low foot traffic on sunny days?  The makeup of pedestrians on cloudy days when we are running our clearance sales?  The characteristics of people who use promotions sent to iPhones and who dislike competitor products? The rate and conditions of product failures in harsh conditions?  The frequency of products purchased as part of high-margin solutions, correlated with customer segment and promotion strategy?  The questions and possibilities are endless.
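
Even “looking for outliers” presupposes a question. A minimal sketch: given daily foot-traffic counts, flag the days that deviate strongly from the norm – but only the analyst can decide whether low traffic on sunny days or low traffic during clearance sales is the pattern worth chasing. The counts below are made up.

```python
from statistics import mean, stdev

# Made-up daily foot-traffic counts; "which days are anomalies?" still has to
# be framed by an analyst (weather? promotions? day of week?).

foot_traffic = {
    "2017-03-01": 412, "2017-03-02": 398, "2017-03-03": 430, "2017-03-04": 405,
    "2017-03-05": 388, "2017-03-06": 120, "2017-03-07": 415,   # 03-06 looks odd
}

counts = list(foot_traffic.values())
mu, sigma = mean(counts), stdev(counts)

# Flag days more than two standard deviations from the mean.
outliers = {day: c for day, c in foot_traffic.items() if abs(c - mu) > 2 * sigma}
print("flagged days:", outliers)
```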

Our Lake can get Swampy

In order to make the data usable, it cannot stay in the lake forever – unless the data is organized and extracted, the data lake will become a data swamp. Just as with BI and data warehousing, we eventually need reference data – the standard names of products, markets, customer types, promotion types, demographics, classes of outliers, classifications of data types and conditions, security and privacy constraints, and types of learning algorithms and processing models.  Data and metadata need to be catalogued and defined with the correct history, lineage, ownership, usage rights, source information, and quality and accuracy assessments.
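
What “catalogued and defined” can mean in practice is something as simple as a structured record per data set. The fields below mirror the list above (lineage, ownership, usage rights, quality) and are purely illustrative, not tied to any particular catalog product.

```python
from dataclasses import dataclass, field

# Illustrative catalog entry for one data set in the lake.

@dataclass
class CatalogEntry:
    name: str
    business_terms: list          # standard names the business uses for this data
    source: str                   # where the data comes from
    lineage: str                  # how it was derived or transformed
    owner: str
    usage_rights: str             # privacy / licensing constraints
    quality_score: float          # output of a data-quality assessment
    history: list = field(default_factory=list)

pedestrian_sensors = CatalogEntry(
    name="store_pedestrian_counts",
    business_terms=["foot traffic", "store visits"],
    source="in-store sensors, vendor feed",
    lineage="raw sensor events -> 1-minute aggregates -> daily totals",
    owner="retail analytics team",
    usage_rights="internal use only; no personally identifiable information",
    quality_score=0.92,
    history=["2017-02: onboarded", "2017-03: sensor firmware update"],
)
print(pedestrian_sensors.name, pedestrian_sensors.owner)
```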

We need to organize the data in catalogues with meaningful business terminology.  We need to harness the power of big data and machine learning with organizing principles that are the foundation of all analysis.  

This is not a trivial task.  In some cases, it is left to technical teams and in others to business teams, but they may not collaborate effectively or have a view of the larger picture.  A contextual enterprise architecture is the scaffolding and framework for knowledge in the organization. Increasingly, that knowledge is gleaned from diverse sources and hidden in streams that continually flow through the organization. 

Big data needs to be smarter, and contextualizing it adds the intelligence.   Business users are looking for insights.  Insights come from understanding patterns in data and seeing causal links and connections.  We can start to predict the behaviors of customers and employees and the performance of products and promotions in the field by finding the cause-and-effect relationships between the things we have control of and the things we want to influence.  The starting point is consistency of language, concepts, and terminology – the knowledge architecture – supported and maintained by governance and ownership.

Earley Information Science Team
