The enterprise AI market has no shortage of enthusiasm. Vendors across the spectrum, from early-stage startups to established software platforms, are competing to position their offerings as the next leap forward in intelligent automation. Financial services firms are deploying AI-driven advisory tools. Conversational interfaces are handling customer service interactions and supporting sales teams at scale. The momentum is real and the investment is substantial.
What tends to get far less attention in these conversations is what sits underneath all of it. Every AI application, regardless of the sophistication of its algorithms, depends on information that has been organized, structured, and curated to a standard that allows the system to do its job. Many vendors are reluctant to emphasize this dependency. Some claim their models operate directly on unstructured sources, interpreting user intent and generating responses without requiring any predefined architecture or human preparation work. That may hold in narrow, constrained scenarios. In practice, the vast majority of enterprise AI applications require a substantial human investment in knowledge engineering before the technology delivers reliable results.
The case of DigitalGenius, which attracted significant attention at an industry conference in 2015, illustrates this clearly. The platform uses deep learning and neural nets to handle customer interactions. But before any of that processing occurs, incoming queries are classified into categories: product information requests, account inquiries, action requests, comparison questions, recommendation questions, and so on. That classification step is information architecture. It is the foundation on which all subsequent processing depends. From there, the system routes queries to product information systems, external databases, and APIs, each of which must itself be well-organized to return a useful response. If the underlying data is not structured and maintained at an adequate quality level, the AI layer has nothing meaningful to surface. The intelligence is only as capable as the information it can access.
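The classify-then-route pattern described above can be sketched in a few lines. This is a minimal illustration, not DigitalGenius's actual implementation: the category names echo the ones listed, but the keywords and handler systems are invented for the example.

```python
# Minimal sketch of the classify-then-route pattern. Category keywords
# and handler names are illustrative assumptions, not vendor internals.

CATEGORY_KEYWORDS = {
    "product_information": ["spec", "feature", "model"],
    "account_inquiry": ["balance", "account", "invoice"],
    "action_request": ["cancel", "update", "reset"],
    "comparison": ["versus", "compare", "difference"],
    "recommendation": ["recommend", "suggest", "best"],
}

def classify(query: str) -> str:
    """Assign an incoming query to a category; fall back to a default."""
    text = query.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(k in text for k in keywords):
            return category
    return "general_inquiry"

def route(query: str) -> str:
    """Dispatch the classified query to the back-end system that owns it."""
    handlers = {
        "product_information": lambda q: f"[product catalog] {q}",
        "account_inquiry": lambda q: f"[account system] {q}",
    }
    handler = handlers.get(classify(query), lambda q: f"[default queue] {q}")
    return handler(query)

print(route("What are the specs of model X200?"))  # routed to the product catalog
print(route("Check my account balance"))           # routed to the account system
```

The point of the sketch is that the classification scheme itself, the information architecture, has to exist before any downstream processing can happen; swapping the keyword matcher for a learned classifier changes the mechanism, not the dependency.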
Delivering the Right Content to the Right Customer at the Right Moment
Organizations today are continuously investing in ways to improve how they engage customers through digital channels. The goals are consistent across industries: more relevant content presentation, more precise search results, more effective promotional offers, better self-service capabilities, and stronger overall product experiences across every touchpoint.
What all of these objectives share is a common dependency on data. Every personalized recommendation, every refined search result, and every contextually timed offer requires the system to interpret signals that customers generate through their interactions with the organization. Those signals include purchase history, real-time browsing behavior, support contacts, content consumption, stated preferences, demographic and firmographic data, and a wide range of behavioral indicators captured through marketing automation platforms.
A search query is, at its core, a recommendation request. The phrase a user types is the signal, and the result set is the recommendation. The more the system knows about who is asking and what they actually need, the better it can tailor the response. For any query related to products, that capability depends entirely on whether the product data underlying the system is clean, complete, and properly structured.
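The idea that a result set is a recommendation can be made concrete with a small re-ranking sketch. The scoring weights and profile fields here are assumptions chosen for illustration, not a prescribed design.

```python
# Illustrative sketch: re-ranking search results using user signals.
# The boost weight and profile structure are assumptions for the example.

def rerank(results: list, profile: dict) -> list:
    """Boost results whose category matches the user's known interests."""
    def score(item):
        boost = 0.2 if item["category"] in profile["interests"] else 0.0
        return item["relevance"] + boost
    return sorted(results, key=score, reverse=True)

results = [
    {"title": "Hydraulic pump datasheet", "category": "hydraulics", "relevance": 0.70},
    {"title": "Electrical wiring guide", "category": "electrical", "relevance": 0.75},
]
profile = {"interests": {"hydraulics"}}

print(rerank(results, profile)[0]["title"])
# "Hydraulic pump datasheet": the profile boost (0.90) outranks raw relevance (0.75)
```

Even this toy version depends on two structured inputs: a result set with clean category attributes, and a user profile populated from the signals described above.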
Building the Product and Content Relationships That Enable Personalization
Personalizing the customer experience at scale is not simply a matter of deploying the right algorithm. It requires that product data be accurately organized, that content workflows be integrated with product onboarding processes, and that the system can draw meaningful connections among products, content types, and the signals that indicate user intent.
Those connections are grounded in an understanding of what the user is trying to accomplish. A customer researching a potential purchase may need technical specifications, how-to guides, reference diagrams, comparison data, or usage instructions. The relationship between that content and the relevant products is not self-evident to a machine. It has to be modeled explicitly, based on knowledge of the task the user is engaged in and what information would help them complete it.
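Explicit modeling of this kind can be as simple as a relationship table linking products, content types, and tasks. The products, content types, and tasks below are hypothetical; the point is that the associations are authored, not inferred by the machine.

```python
# Sketch: explicitly modeled product-content-task relationships.
# All identifiers here are invented for illustration.

RELATIONSHIPS = [
    # (product_id, content_type, task)
    ("PUMP-100", "technical_specification", "evaluate_purchase"),
    ("PUMP-100", "how_to_guide", "install"),
    ("PUMP-100", "reference_diagram", "repair"),
    ("VALVE-20", "comparison_sheet", "evaluate_purchase"),
]

def content_for(product_id: str, task: str) -> list:
    """Return the content types that support a user's task for a product."""
    return [ctype for pid, ctype, t in RELATIONSHIPS
            if pid == product_id and t == task]

print(content_for("PUMP-100", "install"))  # ['how_to_guide']
```

In production this table would live in a content or product information management system, but the modeling decision, which content serves which task for which product, is human work either way.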
AI, broadly speaking, is a class of technologies designed to handle the kinds of reasoning and pattern recognition that have historically required human cognition. Every AI program processes information, and the better that information has been structured in advance, the more accurately and reliably the program performs. The organized body of content that a system draws on is called a corpus. The practice of structuring that corpus for retrieval and reasoning is called knowledge engineering, and the resulting structures are called knowledge representations.
Ontologies: The Architecture That Enables Reasoning
Knowledge representations encompass taxonomies, controlled vocabularies, thesaurus structures, and the full network of relationships among terms and concepts. Taken together, these elements constitute an ontology. An ontology models a domain of knowledge and defines how information within that domain can be accessed and interpreted in specific contexts.
Ontologies can also capture practical, logical knowledge about objects, processes, materials, actions, and events, along with the relationships among them. This allows a system to reason beyond the explicit contents of its corpus. If a user's question does not map directly to a stored answer, the system can infer a response from the facts and relationships encoded in the ontology. The practical result is a system that handles variation in how users phrase requests and manages use cases that were not fully anticipated during development. The system can reason, in a structured and bounded sense, rather than simply match strings.
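The bounded reasoning described here can be illustrated with a toy triple store and a transitive lookup. Real ontology tooling uses standards such as RDF and OWL; the facts and relation names below are invented to show only the inference step.

```python
# Sketch of bounded reasoning over encoded relationships, using plain
# triples. The facts and relation names are illustrative assumptions.

FACTS = {
    ("hydraulic_pump", "is_a", "pump"),
    ("pump", "is_a", "machine"),
    ("hydraulic_pump", "uses", "hydraulic_fluid"),
}

def is_a(entity: str, category: str) -> bool:
    """Follow is_a links transitively: an answer not stored explicitly
    can still be inferred from the relationships in the ontology."""
    if (entity, "is_a", category) in FACTS:
        return True
    parents = [o for s, p, o in FACTS if s == entity and p == "is_a"]
    return any(is_a(parent, category) for parent in parents)

print(is_a("hydraulic_pump", "machine"))
# True, even though that fact is never stated: it is inferred via "pump"
```

This is the difference between string matching and structured reasoning in miniature: the answer to "is a hydraulic pump a machine?" exists nowhere in the corpus, yet the system can derive it.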
Producing this kind of experience requires that customer data be accurate, properly structured, and integrated across systems and processes. It also requires the system to understand the relationships among users, the tasks they are performing, the products involved, and the content those tasks require, all assembled dynamically in real time. Harmonizing these structures across back-end platforms and front-end interfaces produces what can be called an enterprise ontology. It is the architecture that makes a consistent, personalized experience across channels operationally possible.
Using Content to Surface Product Relationships
The connection between content and products is bidirectional. Products need to be associated with relevant content and user context, but content can also be analyzed to identify what products a given situation requires. In an industrial maintenance scenario, for example, a user working on a hydraulic system needs specific parts and tools. Adaptive pattern-recognition software applied to technical reference manuals can extract the relevant components and correlate them with a company's product catalog. A search for hydraulic repair information can then return a dynamically assembled product page, built from the relationships encoded in the underlying architecture, rather than a static result.
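The extraction-and-correlation step in the hydraulic maintenance example can be sketched as matching mined terms against a catalog. The catalog entries and manual text below are invented; real pattern-recognition tooling would handle synonyms, morphology, and fuzzier matches.

```python
# Sketch: correlating terms found in a technical manual with a product
# catalog. Catalog entries and manual text are invented for illustration.

CATALOG = {
    "hydraulic hose": "SKU-1001",
    "pressure gauge": "SKU-1002",
    "torque wrench": "SKU-2001",
}

def match_products(manual_text: str) -> dict:
    """Return catalog products mentioned in a piece of reference content."""
    text = manual_text.lower()
    return {name: sku for name, sku in CATALOG.items() if name in text}

manual = "Replace the hydraulic hose and verify the pressure gauge readings."
print(match_products(manual))
# {'hydraulic hose': 'SKU-1001', 'pressure gauge': 'SKU-1002'}
```

The dynamically assembled product page described above is then just a rendering of these matches, which is only possible because the catalog side carries clean names and identifiers to match against.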
This kind of capability sounds complex, and in some respects it is. But the building blocks for it are becoming more accessible as the tooling around knowledge engineering and ontology management continues to mature.
When You Understand It, It Stops Being AI
The definition of what counts as artificial intelligence has always shifted as the technology has advanced. A colleague once noted that something qualifies as artificial intelligence right up until the point that you understand how it works. That observation has borne out repeatedly over the history of the field.
An MIT AI course captured the same idea in more formal terms: technologies that were once treated as genuine AI, including compilers and speech recognition, eventually became well enough understood that they were reclassified as standard engineering. Almost by definition, once a technology works reliably and predictably, it graduates out of the AI category. Autonomous vehicles were once considered technically infeasible because of the volume of real-time data they would require. Reliable speech recognition demanded extensive speaker-specific training. The word processor is, in a historical sense, an AI application that has been so thoroughly normalized that the classification would seem absurd today.
This pattern matters because it is easy to treat current AI capabilities as somehow categorically different from the engineering disciplines that support them. They are not. AI is applied engineering, and it rests on foundations that require the same kind of careful, disciplined work as any other complex system.
Complexity in Service of Simplicity
From a user's perspective, well-implemented AI feels effortless. A conversational interface answers a question naturally. A recommendation surfaces the right product without visible effort. A search returns results that match what the user actually needed. The experience of simplicity is real. What is hidden is the engineering required to produce it.
That engineering requires foundational structures that can be reused across processes, departments, and applications. Those structures typically begin as siloed, standalone implementations. Their full value is only realized when brought together within a holistic framework of machine-intelligence-enabled infrastructure. AI will reshape the business landscape, but doing so requires investment in product and content architecture, customer data, analytics, and the harmonization of tools across the customer engagement ecosystem.
Structured Data Is the Price of Admission
AI is frequently proposed as the solution to enterprise challenges around information overload and customer engagement. But before those capabilities can be leveraged, organizations need the input data that machine learning algorithms require. Clean, well-structured, managed data is not a nice-to-have. It is a precondition.
Much of the data that AI systems process tends to be less structured than financial or transactional records. Learning algorithms can extract meaning from ambiguous queries and make sense of unstructured inputs to a degree. Users phrase questions inconsistently, ask broad questions, and do not always have a clear picture of what they are looking for. This is precisely why skilled salespeople engage prospects in conversations about overall needs rather than simply asking what they want. AI inserted into that process is most effective when users can articulate what they need and when a reasonably clear answer exists. The algorithm handles variations in phrasing, interprets intent, and processes contextual signals. But even when AI systems are applied to entirely unstructured information, structure is still required at the data layer.
A common misconception about large, schema-less data sources is that because no predefined structure exists, none is needed. In reality, data still requires attribute definitions, normalization, and cleansing before machine learning and pattern-identification algorithms can operate on it reliably. As organizations pursue AI and machine learning capabilities, the foundational priority should be developing an enterprise ontology that represents all of the knowledge any deployed AI system would need to process, analyze, or act on.
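What attribute definition, normalization, and cleansing look like in practice can be shown with a small example. The field names, units, and rules are assumptions for illustration; the point is that "schema-less" records still need to be mapped onto one defined schema before algorithms can compare them.

```python
# Sketch of the normalization and cleansing step that schema-less data
# still requires. Field names and unit rules are illustrative assumptions.

RAW_RECORDS = [
    {"Voltage": "110 V", "weight": "2.5kg"},
    {"voltage": "110v", "Weight": " 2.5 kg "},
]

def normalize(record: dict) -> dict:
    """Map inconsistent keys and values onto one defined attribute schema."""
    out = {}
    for key, value in record.items():
        key = key.strip().lower()
        value = value.strip().lower().replace(" ", "")
        if key == "voltage":
            out["voltage_v"] = float(value.rstrip("v"))
        elif key == "weight":
            out["weight_kg"] = float(value.rstrip("kg"))
    return out

cleaned = [normalize(r) for r in RAW_RECORDS]
print(cleaned[0] == cleaned[1])
# True: two differently formatted source records become the same clean record
```

Without this step, a learning algorithm would treat "110 V" and "110v" as distinct values, and any pattern built on the attribute would be noise.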
Some vendors will argue that their algorithms can handle whatever inputs they receive. In practice, this holds only when ontologies are self-contained within the tool itself, and even then, gaps will exist between what a broadly designed tool can contain and the specialized vocabulary and contextual knowledge requirements of a specific enterprise. Closing those gaps is significant work. Skipping it means missing an essential step.
Much of what is marketed as AI represents an extension of well-established approaches to information management challenges, all of which require clean, foundational data structures as a starting point. The distinction between conventional information management and practical AI lies in understanding where these technologies add genuine value and where the limits of current capabilities fall.
Identifying the Right Use Cases for AI
Distinguishing AI use cases from standard information management problems requires examining the data sources involved, the nature of the task the user faces, and the systems that will be part of the solution. An AI approach demands a greater level of investment, executive-level sponsorship, program-level governance, and enterprise-wide influence than a typical information management initiative. It also requires a longer time horizon.
While there are opportunities to deploy AI in limited, focused applications, treating it as a transformative class of technology means incorporating it into an overall digital transformation strategy. In some organizations, the scale of commitment required is comparable to an enterprise resource planning implementation, with corresponding levels of funding and leadership support. No organization will make that commitment to an unproven set of technologies outright, but funding needs to be allocated deliberately to extend proven approaches with emerging AI capabilities.
The roadmap for AI transformation involves continuous evaluation of payback and return on investment, with a focus on near-term wins pursued in parallel with longer-term objectives. Most organizations currently attempt to address AI-relevant problems with departmental-level solutions, standalone tools, and insufficient funding. Progress is possible within those constraints, but it represents an extension of business as usual rather than transformation. Genuinely transformative applications require an enterprise view of the organization's knowledge landscape and the implementation of new governance structures, performance metrics, and data quality programs. Governance enables decision-making. Metrics monitor the effectiveness of those decisions. Data quality fuels the AI engine.
The table below presents example applications for AI technology.

Identifying Data Sources That Support AI
Training data can come from a wide range of organizational sources, with more highly curated sources generally producing better results. For example:

- Call center recordings and chat logs can be mined for content relationships and answers to common questions.
- Streaming sensor data can be correlated with historical maintenance records.
- Search logs can be analyzed for recurring use cases and user problems.
- Customer account data and purchase history can be processed to identify buyer similarities and predict responses to offers.
- Email response metrics combined with offer content can surface buyer segments.
- Product catalogs and data sheets provide attributes and attribute values.
- Public reference materials can yield procedures, tool lists, and product associations.
- Audio tracks from video content can be transcribed and mined for product relationships.
- User website behaviors can be correlated with offers and dynamic content.
- Sentiment analysis, user-generated content, social graph data, and other external sources can be combined to yield knowledge and user-intent signals.

The appropriate data sources will vary by application, use case, and objective.
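Mining search logs for recurring use cases, one of the sources listed above, is straightforward to sketch. The log entries and threshold are invented for the example.

```python
# Sketch: mining search logs for recurring user problems. The log
# entries and the frequency threshold are illustrative assumptions.

from collections import Counter

SEARCH_LOG = [
    "reset password", "hydraulic pump specs", "reset password",
    "return policy", "reset password", "hydraulic pump specs",
]

def recurring_queries(log: list, min_count: int = 2) -> list:
    """Surface queries frequent enough to warrant curated answers."""
    counts = Counter(q.lower().strip() for q in log)
    return [(q, n) for q, n in counts.most_common() if n >= min_count]

print(recurring_queries(SEARCH_LOG))
# [('reset password', 3), ('hydraulic pump specs', 2)]
```

Each recurring query is a candidate for a curated answer or an ontology entry, which is how raw behavioral data feeds back into the knowledge engineering work described earlier.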
The table below describes examples of AI tools with representative applications, limitations, considerations, and data sources. It is not an exhaustive inventory, and individual tool categories frequently overlap in practice. The intent is to articulate the key considerations when evaluating one approach against another.

Governance, Curation, and Scalable Processes
AI and cognitive computing programs are governed in much the same way as other information and technology initiatives. They require executive sponsorship, defined charters, clear roles and responsibilities, decision-making protocols, escalation processes, and explicit linkage to business objectives and operational processes. These initiatives sit within the broader context of digital transformation and connect directly to customer life cycles and internal value chains.
Because the goal is always to influence a process outcome, all AI and cognitive computing programs should be tied to metrics at multiple levels of detail, from content and data quality through process effectiveness and satisfaction of business imperatives, and ultimately to the organization's competitive and market strategy. Program milestones and funding stages should be defined with clear success criteria and measurable outcomes at each phase.
AI will continue to affect every dimension of organizational and personal life, often in ways that are not immediately visible. Improved application usability, more precise information retrieval, and more capable virtual assistants will increasingly become the standard interface between people and technology. Humans generate knowledge. Machines process, store, and act on it. AI is, at its core, applied human knowledge. Organizations that invest in capturing and curating that knowledge, and in building the foundational data structures that give it form, will be positioned to advance their AI capabilities in a sustained and meaningful way. Without those components, the algorithms have nothing to run on.
References
- J. Vögeli, "UBS Turns to Artificial Intelligence to Advise Clients," Bloomberg, 7 Dec. 2014
- C. Green, "Is Artificial Intelligence the Future of Customer Service?" MyCustomer, 3 Dec. 2015
- E. Dwoskin, "Can Artificial Intelligence Sell Shoes?" Wall Street Journal, 17 Nov. 2015
- R. Miller, "DigitalGenius Brings Artificial Intelligence to Customer Service via SMS," TechCrunch, 5 May 2015
- S. Earley, "Lessons from Alexa: Artificial Intelligence and Machine Learning Use Cases," Earley Information Science, 24 Mar. 2016
- J. Brownlee, "How to Prepare Data for Machine Learning," Machine Learning Mastery, 25 Dec. 2013
This article was originally published in IT Pro, published by the IEEE Computer Society, and has been revised for Earley.com.
