5 Misconceptions About Data and AI Projects

This article by Seth Earley was originally published on MDM.COM.

Machine learning and AI programs run on data. The quality and reliability of that data are critical ingredients in your formula for leveraging AI. The old “garbage in/garbage out” saying still applies, no matter how advanced the algorithm.

Many misconceptions about AI have undermined the success of these projects. For AI projects, having the correct “training data” is critical to a positive outcome. Many projects go over budget or are not completed on time due to an underestimation of the time needed to train the algorithm or an inability to access the correct data.

Here are five misconceptions about data and AI projects:

1. The AI will fix the data

At the height of AI hype, many vendors of AI technology claimed that their algorithms could ingest incomplete or poor-quality data and were smart enough to find patterns and make predictions anyway. This is simply not the case. It is true that some algorithms can help with data quality, but those use cases are highly specific and still require the right “reference data” that the system can use to train and to find or correct issues with operational data.
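
As a minimal sketch of what “reference data” can mean in practice, the Python example below checks operational records against a small curated list and either normalizes a value or flags it for review. The field names, the state-code list, and the `check_records` helper are hypothetical illustrations, not taken from any particular product.

```python
# Hypothetical reference data: curated valid values and their known aliases.
REFERENCE_STATES = {"CA": "CA", "CALIFORNIA": "CA", "NY": "NY", "NEW YORK": "NY"}


def check_records(records):
    """Split records into (cleaned, flagged) using the reference data above."""
    cleaned, flagged = [], []
    for record in records:
        raw = (record.get("state") or "").strip().upper()
        if raw in REFERENCE_STATES:
            # Known value or alias: normalize to the canonical code.
            cleaned.append({**record, "state": REFERENCE_STATES[raw]})
        else:
            # Missing or unrecognized value: leave it for human review.
            flagged.append(record)
    return cleaned, flagged


# Hypothetical operational data with typical quality problems.
operational = [
    {"id": 1, "state": "california"},
    {"id": 2, "state": "N.Y."},
    {"id": 3, "state": None},
]
cleaned, flagged = check_records(operational)
print("cleaned:", cleaned)
print("flagged:", flagged)
```

Without the curated reference list, nothing in the code could tell that one value is a valid alias and the others need attention. The reference data, not the algorithm, does that work.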

2. Point the AI to “all of the data” and it will find the correct solution

Context is as important for AI as it is for people. Just as people need to orient themselves when looking for answers (you don’t look for iPhone solutions in a car repair manual), the data source for AI requires curation and context. If we are building a question-answering system for a consumer, it does not make sense to ingest complex engineering documents. When IBM trained Watson to play Jeopardy!, ingesting some data sources reduced performance. More data was not necessarily helpful. The program required carefully selected data.

3. Cognitive AI (chatbots and intelligent virtual assistants) can be deployed out of the box

There are some very limited use cases where a chatbot can be turned on out of the box. However, chatbots and IVAs need the same training that a human needs. You would never drop a new hire into a support role without training. The AI needs the same. Any meaningful functionality will be powered by your knowledge and data sources, and those sources require the correct format and structure to be retrieved by a cognitive assistant. Chatbots are a channel to knowledge, content and information.

4. AI data issues can be solved by IT

In many projects, IT is left to address data problems that arise from business processes and business decisions. Imagine that salespeople will not enter data into a CRM system. That is not something that IT can solve, since it is a business process issue. Nor can the problem simply be outsourced to a low-cost offshore provider. Data needs to be owned by the business and support business goals. IT is the enabler but cannot own business data.

5. AI will eliminate the need for data governance

Data governance is more important than ever. What data is owned by the organization? What can be done with it? What are the data sources and how is it being consumed or translated by other systems and processes? How well is data being leveraged to produce value for the enterprise and the customer? How can data issues be addressed and remediated? The data infrastructure of the organization is essential. Investments need to be prioritized and results measured. Strong data governance helps get the organization’s data house in order.

The future belongs to organizations that can best merge their processes, business value and customer relationships with advanced AI capabilities. Data is critical and in fact is more important than the algorithm. Getting your data house in order needs to be a priority, with board-level attention and funding commensurate with the scale of the enterprise and its data challenges. That will be a formula for success.

Seth Earley
Seth Earley is the Founder & CEO of Earley Information Science and the author of the award-winning book The AI-Powered Enterprise: Harness the Power of Ontologies to Make Your Business Smarter, Faster, and More Profitable. He is an expert with 20+ years of experience in Knowledge Strategy, Data and Information Architecture, Search-based Applications and Information Findability solutions. He has worked with a diverse roster of Fortune 1000 companies, helping them achieve higher levels of operating performance.
