Earley AI Podcast - Episode 13: Taxonomy, Knowledge Graphs, and the Excuse Case with Stephanie Lemieux

Why Information Architecture Is Still the Foundation AI Cannot Replace - and How to Find the Right Entry Point

Guest: Stephanie Lemieux, President and Principal Consultant, Dovecot Studio

Hosts: Seth Earley, CEO at Earley Information Science

             Chris Featherstone, Sr. Director of AI/Data Product/Program Management at Salesforce 

Published on: March 31, 2022

In this episode, Seth Earley and Chris Featherstone speak with Stephanie Lemieux, President of Dovecot Studio and chair of the Taxonomy Boot Camp conference, who began her career in information architecture just as e-commerce was taking off - working at what was then Earley and Associates on projects for Best Buy, Ford Foundation, Motorola, and Hasbro. Stephanie discusses why organizations still struggle with the foundational problems of taxonomy and information architecture despite a decade of growing awareness, why throwing machine learning at messy data produces consistently poor results, how knowledge graph projects routinely stall because organizations do not realize what infrastructure must exist before the technology can work, and how to find the right "excuse case" - a targeted, high-visibility problem like search that justifies the investment while quietly building the enterprise-wide architecture that everything else depends on.

Key Takeaways:

  • Organizations increasingly understand the value of information architecture and taxonomy, but machine learning has introduced a new wave of confusion - many believe AI can substitute for structured metadata and controlled vocabularies rather than depend on them as prerequisite infrastructure.
  • Knowledge graph proof-of-concept projects fail in two predictable ways: organizations cannot articulate what problem the graph should solve, or they hire a vendor and discover too late that data normalization, taxonomy, and ontology must be built before the technology can function.
  • The "excuse case" is a practical entry strategy - use a specific, high-visibility pain point like broken search to justify the project budget, but design the solution as enterprise-wide foundational architecture that will serve many downstream systems and use cases.
  • Taxonomy and information architecture are not one-time deliverables but living systems requiring ongoing care, training, retraining, and governance - organizations that treat them as point solutions consistently find that pilot-quality data preparation does not survive the transition to production.
  • User expectations have been permanently reset by Google and mobile devices - people no longer expect to manually configure interest profiles or navigate taxonomy trees, they expect systems to learn their behavior, which means customer data models must be as deliberately architected as content and product data models.
  • Specialized boutique consultancies and large global system integrators play complementary roles - smaller firms provide the deep metadata, usability, and knowledge modeling expertise that large firms often lack, and frequently lead the foundational architecture work that makes large-scale rollouts successful.
  • Entry paths into the taxonomy and information architecture field have expanded significantly through information science programs, digital transformation degrees, and self-directed learning via online certifications and open source tools - library science backgrounds provide a strong foundation but are no longer the only pathway.

Insightful Quotes:

"There's a confusion about what the scope of these new tools are and what kinds of use cases they can solve well - and that even if you do implement sophisticated AI tools, they still depend on a core information architecture behind them. You can't throw AI at millions of pieces of content and not have any structured metadata or controlled vocabulary for it to sink its teeth into." - Stephanie Lemieux

"The excuse case can be a point solution, but the solution itself has to be something more enterprise-wide and more foundational - a service layer or back-end architecture layer that spans your whole martech ecosystem, your whole product data lifecycle, your whole customer journey." - Stephanie Lemieux

"You can't automate a mess. You can't automate what you don't understand. People need to come back and say: what is that fundamental problem, and what is that fundamental process? The AI cannot do that itself." - Seth Earley

Tune in to hear Stephanie Lemieux explain why a gaming company client has been building its knowledge graph and ontology for five or six years and still considers it a work in progress, why the client she describes as a "random document generator" perfectly illustrates why everyone says search is broken, and why she started having airport dreams after two years of pandemic-era remote work - and what that says about where in-person collaboration still cannot be replaced.

Podcast Transcript: Taxonomy, Knowledge Graphs, the Excuse Case, and Why Information Architecture Is Still Non-Negotiable

Transcript introduction

This transcript captures a conversation between Seth Earley, Chris Featherstone, and Stephanie Lemieux about the nuts and bolts of taxonomy, information architecture, and knowledge graphs in 2022. Stephanie draws on nearly 15 years of practice - from early e-commerce projects with Seth to current enterprise taxonomy strategy and knowledge graph engagements - to explain why foundational data architecture remains the hardest and most important thing organizations consistently underinvest in, and why the rise of machine learning and AI has made getting that foundation right more urgent, not less.

Transcript

Seth Earley: Welcome to today's podcast. I'm Seth Earley.

Chris Featherstone: And I'm Chris Featherstone. Good to be with you. I know you're en route somewhere, so it's a little bit challenging today.

Seth Earley: Before we get started, I'd like to give a shout out to our sponsors - CMSWire, the Marketing AI Institute, and of course my organization, Earley Information Science. Our guest today is a woman after my own heart - she's an expert in taxonomy and unstructured content, and she is President and Principal Consultant of Dovecot Studio. She consults on content modeling, information architecture, search, and more. In her spare time - other than wondering why she has no spare time - she serves as chair of the Taxonomy Boot Camp conference, held each November in Washington DC. Please welcome Stephanie Lemieux.

Chris Featherstone: Stephanie, since you've known Seth so long, I want to know some background dirt on him.

Seth Earley: We have a mutually assured destruction clause - she tells you dirt on me, I get to tell her dirt too. And there's no dirt on Stephanie.

Stephanie Lemieux: I met Seth pretty much the second I was done with my master's degree - even just before I was done. I was in library school trying to figure out what to do with my life, and that particular library school had just transitioned away from being pure library science to focus on this new-at-the-time concept of information architecture and taxonomy. I had kind of fallen into that and found it interesting, and I ran into Seth through a community of practice he was running called the Taxonomy Community of Practice. I reached out and said I was really interested in the content and wanted to write up summaries of key things coming out of the group. I guess that impressed him because he immediately offered that maybe I could help with his webinar, and that snowballed pretty quickly into us working together. I joined what was then called Earley and Associates right out of school - almost 15 years ago, if not more.

Seth Earley: Technology dog years. Stephanie worked on some great projects - Best Buy was one where we had incredible success. Ford Foundation, Motorola, Hasbro - there's a bunch. Very gratifying work.

Stephanie Lemieux: What I'm most grateful for is that you took me under your wing at the moment when e-commerce was really booming. What is e-commerce without taxonomy and information architecture? We would be in these big meetings with large retailers trying to convince their executives that you need to care about product categories, how you make data available on your site, and user terminology. It was a hard sell back then, but now it's a breeze - or at least, easier.

Seth Earley: I want to push back a little on that - it still depends on the organization, and believe it or not, it is still often a difficult sell. We've had people in charge of e-commerce ask why they need information architecture, astoundingly enough. Companies nod their heads and understand it to some degree, but often they're not making the investments they should. There are still a lot of foundational problems and silos.

Stephanie Lemieux: Definitely, and the complexity question has changed. Back in the early days of e-commerce, you had to worry about the website and getting product data on it, but you didn't have nearly as many systems connecting as you do today. As things have migrated to omni-channel, with all these data syndication services and lots of different manufacturers producing their own data, it is a much more complicated question than it was 15 years ago.

Seth Earley: That's a really good point. The martech ecosystem has gotten extremely complex. There are something like 15,000 tools out there at last count, and many organizations will have 100 to 150 technologies. They are very rarely in alignment, harmonized, or normalized. In order to get a 360-degree view of the customer, in order to have a seamless customer experience, you really do have to get all those things aligned. What have you been seeing in the marketplace in that regard?

Stephanie Lemieux: The understanding is definitely increasing - I'm having to do less and less ROI pitching about the underlying value of good data and information architecture. But what I am seeing that's shifting is machine learning and AI jumping into the fray and muddying the waters. Organizations are saying yes, we absolutely need information architecture, but let's let the machine do it. Rather than dedicating people, consulting budget, or system development budget to solve these problems, they want to throw machine learning at it and see what comes out the other side.

Seth Earley: And how has that been working out for them?

Stephanie Lemieux: Really challenged.

Chris Featherstone: There's that fallacy that if I just throw machine learning at it, it will solve all of it. But people don't understand that in a training scenario, you still have to get the data into some common canonical or normalized format for the machine to even know what to look for. And we have so much unstructured data now, so much rich media that needs to be turned into structured data.

Stephanie Lemieux: Clients are getting more savvy about the question of integration and data orchestration, but there's still a lot of demystifying and education required upfront. I'm right in the middle of a project right now - enterprise taxonomy strategy for a content-heavy organization that also offers data as a product. Practically every person I'm interviewing, whether they're talking about content publishing, digital asset management, or ad targeting, is convinced about the value of taxonomy and information architecture. But in the same breath they'll say they also want to get some machine learning in there to be more agile. There's a confusion about what the scope of these new tools are, what use cases they solve well, and that even if you do implement sophisticated tools, they still depend on a core information architecture. You can't throw AI at millions of pieces of content without having structured metadata or a controlled vocabulary for it to sink its teeth into.

Seth Earley: Right. We need to tell the AI what's important - our product names, our service names, our processes, our content, our regions, our customer attributes. Taxonomy is foundational to that reference architecture. When we build multiple taxonomies and then create the relationships between them, it really is an ontology. Those associative relationships become part of the ontology. And that area is getting deep technical attention at some levels, but people are still grappling with the use cases. What have you been seeing with graph data and knowledge graphs?

Stephanie Lemieux: Knowledge graph is definitely the most popular buzzword in the information management space, at least in my world. And everyone wants to do a proof of concept. The two key things I'm seeing: they want to do a proof of concept but they have no idea what the proof of concept should do. They know they want to get into knowledge graphs but don't know what the graph can do, what they should point it at, or what problem they're really trying to solve. The other piece is that if they do have a clear use case and the budget to move forward, they hire a vendor and then discover they have all this infrastructure they have to build behind it for it to be successful. Data normalization, taxonomy, ontology building, a lot of framework building - you can't just point the technology at something and have magic happen.

Seth Earley: What's interesting is that many things we used to call faceted search are now showing up as knowledge graph applications. Things we did but didn't call a knowledge graph - we called them information architecture, ontology, content models. The use cases are the same: semantic search, graph search, powering virtual assistants. But people still sometimes think the vendor tool does it all. I keep going back to the same point: you can't automate a mess, you can't automate what you don't understand.

Stephanie Lemieux: One thing I find nice about the focus on the shiny new toy of graphs and AI is that it is bringing light back onto the metadata that we've been accumulating over many years. You can't have messy data underneath a knowledge graph. I'm doing a knowledge graph proof of concept right now where the idea was to connect staff data from Active Directory with content in a document management system and build an expertise locator. Except that half the data in Active Directory is incomplete or incorrect, and people have not wanted to tag in the document management system for years. The knowledge graph is not going to fill in or correct that data. We've had to do normalization, bulk tagging - there's no way around dealing with that metadata debt at some point.
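To make the "metadata debt" point concrete, here is a minimal Python sketch of the normalization step Stephanie describes, assuming a hypothetical department vocabulary and hypothetical staff records (the names, fields, and values are illustrative, not taken from her project). Raw values are mapped to preferred labels, and anything that cannot be mapped goes to a human review queue rather than being silently loaded into the graph.

```python
# Hypothetical controlled vocabulary: preferred label -> known variant spellings.
DEPARTMENT_VOCAB = {
    "Finance": {"finance", "fin", "accounting & finance"},
    "Legal": {"legal", "legal dept", "law"},
    "Engineering": {"engineering", "eng", "r&d engineering"},
}

# Invert once so every lower-cased variant (and the preferred label itself)
# points to its preferred label.
VARIANT_TO_PREFERRED = {
    variant: preferred
    for preferred, variants in DEPARTMENT_VOCAB.items()
    for variant in variants | {preferred.lower()}
}


def normalize_department(raw):
    """Return the preferred label for a raw value, or None if it cannot be mapped."""
    if raw is None or not raw.strip():
        return None
    return VARIANT_TO_PREFERRED.get(raw.strip().lower())


def audit_staff_records(records):
    """Split records into normalized ones and ones that need human review."""
    clean, needs_review = [], []
    for record in records:
        department = normalize_department(record.get("department"))
        if department is None:
            # Missing or unrecognized values are queued, not guessed.
            needs_review.append(record)
        else:
            clean.append({**record, "department": department})
    return clean, needs_review
```

For example, `audit_staff_records([{"name": "A. Chen", "department": " FIN "}])` returns that record with `department` set to `"Finance"` and an empty review list. A graph built only from the clean list inherits values that exist in the controlled vocabulary, while the review list keeps the remaining debt visible instead of hiding it.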

Chris Featherstone: When you go into these organizations, which roles do you find accelerate these projects, which are the naysayers, which need to be educated?

Stephanie Lemieux: Interestingly, a lot of the interest in knowledge graphs is coming from the business itself - sometimes from centralized knowledge management or information management shared service roles that are trying to enable digital transformation and have been mandated to solve these problems at scale rather than for a specific unit. What I see less often is interest arising where I would expect it: in enterprise architecture. Enterprise architects are definitely allies in these projects, but they don't often initiate them. It's more from the business side, in collaboration with folks in data governance or enterprise architecture.

Seth Earley: So when organizations think about knowledge graphs and pilots, where do you recommend they start?

Stephanie Lemieux: Most of the time I see two flavors of knowledge graph interest. One is from a knowledge management perspective - wanting to support knowledge workers doing more with large swaths of unstructured content, connecting it with data, building an expertise locator, creating new connections, reusing and gaining more value from existing content. The other is search enhancement - supporting more natural language questions and answers, providing more robust search results that span different collections and data sources. Those are probably the two biggest areas I see for proofs of concept.

Seth Earley: I'm also seeing data catalogs - understanding data ownership, rights to data, data quality, who touches it, the value chain. I agree knowledge management is a big one, and it does harken back to the earlier days of the field. We did faceted search, but now we have the ability to traverse the knowledge graph. The way I think of it: the ontology is the scaffolding of knowledge - the taxonomies, the relationships between them. When we build a knowledge graph using an ontology, we get access points and retrieval points for the data. That gives us flexibility to traverse from one system to another, one expert to another, from many different angles.
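Seth's description of an ontology as scaffolding, with a knowledge graph providing access points to traverse "one expert to another," can be sketched in a few lines of Python. This is a toy, assuming hypothetical node ids and predicates (`hasTopic`, `authored`, `memberOf`); production graphs would typically use RDF and a triple store queried with SPARQL rather than an in-memory dictionary.

```python
from collections import defaultdict, deque

# Hypothetical triples (subject, predicate, object). The predicates stand in for
# ontology relationships that connect separate taxonomies: people, documents, topics.
TRIPLES = [
    ("doc:pump-manual", "hasTopic", "topic:hydraulics"),
    ("doc:valve-guide", "hasTopic", "topic:hydraulics"),
    ("person:ana", "authored", "doc:pump-manual"),
    ("person:raj", "authored", "doc:valve-guide"),
    ("person:raj", "memberOf", "team:field-service"),
]


def build_index(triples):
    """Index every edge in both directions so traversal can start from any node."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[subject].append((predicate, obj))
        index[obj].append(("inverse:" + predicate, subject))
    return index


def find_connected(index, start, prefix, max_hops=3):
    """Breadth-first search from start; return reachable node ids that begin with prefix."""
    seen = {start}
    found = []
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if node != start and node.startswith(prefix):
            found.append(node)
        if hops == max_hops:
            continue
        for _predicate, neighbor in index.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, hops + 1))
    return sorted(found)
```

Starting from `topic:hydraulics` and keeping only `person:` nodes returns `["person:ana", "person:raj"]`: the search walks from the topic to the documents tagged with it, then to their authors. Starting from `person:ana` with `max_hops=4` reaches `person:raj` through the shared topic, the kind of expert-to-expert path Seth describes.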

Stephanie Lemieux: And that's a continuation of the service-oriented architecture we were seeing so much of ten years ago. Data fabric and headless CMS are these same ideas rewrapped with different names - ways of aggregating content and applying information architecture to it from disparate sources to serve up data from a centralized service layer to support many applications.

Seth Earley: Headless CMS, for folks who may not be familiar, is separating content from the downstream application - the "create once, publish everywhere" model. One of our large customers with 100,000 employees and a lot of technical content uses that to serve up 4 million knowledge transactions per day. That content gets surfaced in self-service, call centers, field service, marketing, bots, virtual assistants, troubleshooting tools, and even embedded within equipment so the equipment can diagnose itself, open a trouble ticket, order the part it needs, and tell the service person how to fix it when they arrive. Organizations that are not paying attention to information architecture, not looking at headless CMS, not understanding their knowledge, not dealing with their silos - they are going to be caught flat-footed when virtual assistants become highly functional. The only way to get there is through the data.

Stephanie Lemieux: Absolutely. Some organizations are further ahead on this journey than others. Those that are just starting will be able to learn from the ones that have already been at it for a few years and fallen into all the traps. We have a client - a gaming company - working on graphs and ontologies. It's a really large and complex system. You have different studios developing games, each with their own silos; the gaming platforms themselves; product data; gamer data from interactions with the games; and super omni-channel content provision spanning their own platforms and content within the games themselves. Trying to connect all those data points, connect a gamer with the next best game or the next best quest - these are fascinating data and content problems. But this has been a multi-year journey, at least five or six years working on components of the overall architecture. It's still challenging. That's not something you spin up in six months.

Seth Earley: How do you ensure that an individual is getting the right information when there's that much content at that volume? It has to do with leveraging signals - what are your thoughts on solving that problem?

Stephanie Lemieux: Depending on the context, you have user information you can draw on - if they're logged in you know their profile, what products they own, what they've called about in the past, their behavior data, what they've been looking at recently. You can take cues and signals from activity, profile data, location data, and many other potential sources. But one thing that has changed in the last 15 years is user expectations. I no longer expect, as a user of any content platform, to manually tell you what I'm interested in. I have been trained very well by Google and my mobile device that it should already know what I like, what I read most often. My device has learned things about me through my interactions with it, and that trains me to expect the same from other vendors, brands, and work tools. We can't ask people to log in and choose from 17 interest categories out of a taxonomy of 500. That's not viable anymore.

Seth Earley: And that means we have to be as intentional about our customer data models as we are about our content models and product data models. The attributes should be orthogonal vocabularies describing different dimensions of that customer. If I'm a CIO in the Midwest in the market for a router, role, region, and product interest should be separate facets, not pre-coordinated into a single category. That's how we align customer attributes with product attributes and content attributes. It's interesting, and I think we're just starting to see organizations become aware of it.
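A minimal Python sketch of Seth's point about orthogonal facets, assuming hypothetical vocabularies for role, region, and product interest. Customer profiles and content items are described with the same independent facets, and matching happens facet by facet at query time rather than through a single pre-coordinated category such as "Midwest CIO router buyer."

```python
from dataclasses import dataclass, field

# Hypothetical orthogonal facets: each one is its own small controlled vocabulary.
ROLES = {"CIO", "Network Engineer"}
REGIONS = {"Midwest", "Northeast"}
PRODUCT_INTERESTS = {"Router", "Switch"}


@dataclass(frozen=True)
class Customer:
    role: str
    region: str
    interest: str

    def __post_init__(self):
        # Each facet is validated independently against its own vocabulary.
        if (self.role not in ROLES
                or self.region not in REGIONS
                or self.interest not in PRODUCT_INTERESTS):
            raise ValueError("value outside the controlled vocabulary")


@dataclass
class ContentItem:
    title: str
    # Content is tagged with the same facets; an empty set means "applies to any value".
    roles: set = field(default_factory=set)
    regions: set = field(default_factory=set)
    interests: set = field(default_factory=set)


def facet_matches(tagged_values, value):
    """A facet matches if the item is untagged for it or tagged with the customer's value."""
    return not tagged_values or value in tagged_values


def relevant_content(customer, items):
    """Combine the independent facets at query time instead of pre-coordinating them."""
    return [
        item.title
        for item in items
        if facet_matches(item.roles, customer.role)
        and facet_matches(item.regions, customer.region)
        and facet_matches(item.interests, customer.interest)
    ]
```

With two values per facet, pre-coordinating every combination would need 2 × 2 × 2 = 8 categories, and each new value multiplies that count; the separate facets need only 2 + 2 + 2 = 6 values and still express every combination.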

Chris Featherstone: How do you help organizations understand that this is never done - that it's not a point solution but a full journey requiring ongoing care and feeding?

Stephanie Lemieux: I don't really offer point solutions - I do the infrastructure component. I get called in for something that looks like a point solution and it turns into: okay, you wanted a taxonomy or metadata schema for this DAM or this headless CMS or whatever, but what you really need is solid enterprise-level information architecture that spans your whole martech ecosystem, your whole product data lifecycle, your whole customer journey. The piece I'm working on most of the time is that foundational layer. The use case - the excuse to start these metadata and information architecture projects - can be a point solution, but the solution itself has to be something more enterprise-wide and foundational: a service layer or back-end architecture layer.

Seth Earley: Everybody has problems with search. Everybody says search is broken. That's one of those excuse cases, isn't it?

Stephanie Lemieux: Search is definitely the main use case I see as an entry point - whether that's external customer-facing search for content, data, or products, or internal knowledge worker search. That's been the primary excuse case for most of my clients.

Seth Earley: Is there much difference between the value a large global integrator brings to these projects and the value a company like yours or mine brings?

Stephanie Lemieux: I'm sure my answer is extremely biased. But more niche and focused consulting firms like yours and mine put a lot of attention on knowledge modeling, understanding expertise, understanding knowledge workers and their needs, user experience, and how people think and work. What I appreciate about larger organizations like IBM and Deloitte is very strong technical expertise and the sheer person-power they can bring to large deployments. I often work in tandem with those organizations on larger projects - we zoom in on metadata, taxonomy, and usability problems that feed into their larger technical solutions. Often we come in ahead of those large projects because an organization has correctly pinpointed that they should get their metadata act together before starting a big global DAM rollout. We do the focused foundational work and then hand it into the larger technical project.

Chris Featherstone: What would you recommend to someone who wants to get into this field? And what's your perspective on women in tech specifically?

Stephanie Lemieux: I have an interesting path into this because I came from library school, which was a very female-dominated space 15 years ago. As it was transitioning to information science, those schools had to do a lot of introspection about the role of the information specialist in the new era. Those schools have modernized significantly, and there are now new types of programs - information science focused, digital transformation programs, asset management programs, disciplines that didn't exist 15 years ago. If you're not interested in going back to school, there are so many self-learning opportunities. Seth's book is a great way to teach yourself strategy at a more senior level. If you're interested in knowledge modeling or RDF and the more technical capabilities, there are tons of certifications and training courses available online, and a lot of open source technology you can work with on the side to build your capabilities.

Seth Earley: What are you looking forward to in 2022?

Stephanie Lemieux: I actually miss airports. I haven't been in one since the very beginning of 2020. I have airport dreams. But the pandemic has changed my relationship with my work in some good ways - I've had a number of really long-term projects where I've been able to feel almost like part of the staff, which has been really nice. I hope to continue some of that vibe even as things open back up. I've definitely learned to appreciate some of the changes to how work has been conceptualized.

Seth Earley: We've really come to the end of our time. It's been a real pleasure talking with you today, Stephanie. I miss our work together and I have such fond memories of having you as part of the Earley team. Before we close, Stephanie, anything coming up you want to share?

Stephanie Lemieux: For those in this space - the Taxonomy Boot Camp call for proposals has just gone up. We are hoping to have an in-person conference this year in November in DC. If you're interested in taxonomies and graphs, we do a ton of that content. Hope to see people there.

Seth Earley: We'll look forward to it. Great to catch up. We'll have to do it more often.

Chris Featherstone: Thank you for all the great wisdom. Appreciate it - have a great day.

Stephanie Lemieux: Have a great day. Bye!

Meet the Author
Earley Information Science Team

We're passionate about managing data, content, and organizational knowledge. For 25 years, we've supported business outcomes by making information findable, usable, and valuable.