Earley AI Podcast - Episode 3: Advancements in Enterprise Search with Massood Zarrabian

Why Enterprise Search Can Finally Deliver a Google-Like Experience - at a Fraction of the Former Cost

Guest: Massood Zarrabian, CEO, BA Insight

Hosts: Seth Earley, CEO at Earley Information Science; Chris Featherstone, Sr. Director of AI/Data Product/Program Management at Salesforce

Published on: October 4, 2021

In this episode, Seth Earley and Chris Featherstone speak with Massood Zarrabian, CEO of BA Insight - winner of the KMWorld 2021 Readers' Choice Award for Best Enterprise Search. Massood's path to this role is one of the more unusual in enterprise software. He came to MIT from Iran on a civil engineering scholarship, intending to return and help build infrastructure for a developing country, but stayed in the US after falling in love with the freedoms he found there. When the civil engineering job market collapsed he pivoted to business school, landed a co-op at a computer vision company, and eventually built a career at the intersection of information retrieval and AI. In this conversation he explains why the Google-like enterprise search experience that once required $20 million and a team of engineers can now be delivered for hundreds of thousands of dollars, how organizations should think about indexing strategy, why tagging remains a human problem that AI can only partly solve, and why total cost of ownership - not implementation cost - is the question organizations need to ask.

Key Takeaways:

  • The cognitive services that used to give Google an insurmountable advantage over enterprise search - NLP, content processing, entity extraction, intent identification - have become increasingly commoditized and are now accessible to any organization, making a Google-like search experience genuinely achievable at dramatically lower cost than a decade ago.
  • Unstructured data created by knowledge workers is simultaneously the most expensive information any organization produces and the least accessible - organizations routinely spend $250 million on ERP implementations and then balk at $1.5 million to make their knowledge and content findable, a disproportion that represents a fundamental misunderstanding of where organizational value actually lives.
  • The central index is the key architectural shift that makes enterprise search tractable: rather than sending experts system-by-system to understand what each repository contains, you connect all systems to a unified index that lets a single expert analyze and improve everything from one place - the same logic that made data warehouses transformative for structured data.
  • A hybrid indexing strategy is necessary in practice because not all systems can participate equally: modern systems like Elasticsearch and SharePoint can federate indexes directly, while legacy systems require a centralized index; the user experience should be identical regardless of which approach is used behind the scenes.
  • Tagging will always require human intervention because value is entirely in the eye of the beholder - the administrator's judgment about which content is "high value" is irrelevant to the third-shift worker who needs a specific technical document right now and will waste four hours without it.
  • The reason enterprise search developed a bad reputation has almost nothing to do with technology and almost everything to do with total cost of ownership - an implementation that needs six full-time people to maintain, when Salesforce for 1,000 salespeople runs on half a person, is simply unsustainable regardless of how good the underlying search quality is.
  • Organizations should start small and grow: a departmental enterprise search solution with a handful of connectors can now be stood up for $50,000 to $75,000, experienced with low risk, and expanded if it delivers value - the era of mandatory multi-million-dollar platform commitments before any results are visible is over.

 

Insightful Quotes:

"If you're sitting here in 1990 and talking about the same thing, people who had spent $250 million on an ERP system would be trying to understand why they needed a data warehouse - but that started in the 70s and it took 20 to 30 years to get there. The problem with enterprise search is patience: getting people to understand what it can do. We now have people who come in having already figured a bunch of things out and who are trying to find somebody who can play both the role of advisor and technology partner to take them to the next level." - Massood Zarrabian

"The Google search experience is getting to the place where it is very, very possible to do at much lower cost. Things that used to have to be hand-crafted, hand-coded, hand-integrated - all of those UI functions and classification structures and semantic structures that you had to build by hand are now either available as a platform service, or built into the infrastructure and tools that are available today." - Massood Zarrabian

"Stop thinking in terms of million-dollar, five-million-dollar platforms with five people building something custom for you. That is the old approach. You could start as little as $50,000 to $75,000 a year and have a very good solution for a department, experience it, and if you like it, you grow. The world has changed." - Massood Zarrabian

Tune in to hear how Seth's team audited content quality by putting "aardvark" as the first option in a tagging dropdown and then counting how many documents were tagged with it, and to hear Massood Zarrabian describe how BA Insight tracks not just every click but what users do after the click, why he built a user-facing complaint button that sends administrators the exact URL of search results that didn't make sense, and why he believes the perceived degree of difficulty in enterprise search - rather than the actual technology - is the single biggest threat to the market's future.

Contact Massood:
mzarrabian@bainsight.com

https://www.linkedin.com/in/massoodzarrabian/


Links:
BA Insight website
The AI Powered Enterprise

Thanks to our sponsors:
Earley Information Science
CMSWire
Marketing AI Institute

Podcast Transcript: Advancements in Enterprise Search - Making Google-Like Search a Reality for the Enterprise

Transcript introduction

This transcript captures a conversation between Seth Earley, Chris Featherstone, and Massood Zarrabian about why enterprise search has historically underdelivered, how the technology and economics have shifted to make a Google-like experience genuinely achievable, and what organizations need to do - architecturally, operationally, and financially - to make it work and sustain it over time.

Transcript

Seth Earley: Welcome to our podcast today. Our guest is someone who has spent his career helping to improve how organizations and individuals retrieve information, while finding joy in positively impacting both the bottom lines of organizations and human lives. He's currently the CEO of BA Insight, an innovator in the AI-driven enterprise search market. Please welcome information retrieval pioneer Massood Zarrabian.

Massood Zarrabian: Hi everybody. Thank you very much for your time, and thank you to the audience that's going to be listening.

Seth Earley: Terrific. And of course I'm here with my co-host Chris Featherstone.

Chris Featherstone: Massood, it's good to have you. We're excited to have you on the podcast today. You have a very interesting background - you went to MIT, and you have a degree in civil engineering. I'd love to understand your original motivation. What did you believe you were going to change in the world?

Massood Zarrabian: My motivation to go to MIT was to become an expert and go back to Iran, which was at the time a developing country, and change the world from there. That was the motivation for coming here, studying here, becoming a civil engineer - we had a lot of things to build in that country. But when I came here I stayed. I went back a couple of times, but somehow this country bit me. I liked the freedom. I liked all the things that a human being has here that I didn't have there. So once I graduated I changed my mind and decided to stay. Unfortunately, at the time the civil engineering market was really bad. My journey changed - I went to business school, found a co-op job at a company in computer vision, got into software, and the rest is history.

Chris Featherstone: You can take the engineering out of the person but you can't take the engineering mindset out. So what's your key motivation now?

Massood Zarrabian: I don't think it's changed through my career. I like helping people grow. I'm lucky enough to have gone from company to company and had people decide to join me at the second company - proud of that loyalty. I'm proud of having people who grew and went on to become CIOs and CTOs even though we've never worked together again. I'm proud of making companies grow. I always advocate that it's actually not people first - it's company first. Because if a company doesn't do well, people will not survive. If a company is not profitable, none of us will have jobs. That motivation is balanced with a deep belief that technology can do a lot of things to improve lives and simplify things. The vision is the end result. The Google search bar changed everything. The result is even better than the technology behind it.

Seth Earley: You have an interest in theater and film and humanities - how does that manifest, and how does it influence what you're doing in this space?

Massood Zarrabian: When you go to MIT you have to take humanities courses, and I actually got to like them - took more than most people. My interest in movies grew from there. I especially like old movies. Casablanca is my real top pick - Humphrey Bogart, and partly because my wife's name is Elsa. But my interest in movies grew over time because I started seeing a correlation between how good movies are built and how good companies are built. Movies have great directors and producers. Movies have different actors in different roles. Companies are like that - they have directors and producers, CMOs and CEOs. A great movie or play is a combination of how those people collaborate, produce, and deliver. And I think a great company is the same thing: it's how that group gets it together.

Seth Earley: What's interesting about a movie is you get all these people together who have often never worked together, and then they have to produce a zero-defect product by the end of the shoot.

Massood Zarrabian: Exactly. And then you have critics who have never made a movie telling you what's wrong with yours.

Seth Earley: You mentioned Google, and that's what we really want to talk about today. When people ask why enterprise search can't be like Google, what's your response?

Massood Zarrabian: I think there are a bunch of foundational issues. Growing up in the enterprise surrounded by search, we started with FAST and SharePoint 2010 and 2013, and trying to make those things Google-like was really hard. That caused people to think enterprise search becoming like Google was fundamentally impossible. I agree it was really hard at the beginning of the last decade. But things changed. Things became easier. People are still thinking it's as hard as it was, but doing things around content services has become much easier. Natural language processing has become much less expensive. All the cognitive services that everybody has are getting more and more commoditized. Those are the things Google used to have that were not available to the enterprise - and they are now.

Technology can now do things that used to take really knowledgeable large teams. I'm not saying expertise isn't needed - the expertise has to be even higher now - but the number of people involved is much smaller. The strategy becomes easier because you don't have all those impediments.

Seth Earley: Jeff Reed, one of your prior CTOs, said to me: you could do a lot of this stuff five years ago, but it would cost you $20 million. What's changed is that the things that used to have to be hand-crafted and hand-coded and hand-integrated are now either available as platform services or built into the infrastructure and tools available today. Can you speak more to that?

Massood Zarrabian: Exactly. In the old days, if you had a bunch of systems with content where you don't have knowledge about what that content is - because it's all legacy and you let users tag it and they tagged it the wrong way - that's what Seth calls dark data. You don't know what's there. So what do you do? Hire 100 people and have them open each file one by one? That's the world we used to live in.

What's changed is that a lot of that can be automated. This doesn't mean you don't need some expertise to review and approve. But the work that was mechanical, tedious, and repetitive can now be done automatically. You get quickly to 80% of the way there, and then you improve it. And once you've done it, it's continuous - you don't have to start over every year. You do some work every year because the world is changing, but what you built stays in place. And there are learning models getting created so the system learns from end-user behavior and clicks. It's becoming much more viable to do a Google-kind of experience in an enterprise.

Chris Featherstone: One thing to highlight: we're really talking about unstructured data here. Files and formats that are locked down, information that somebody may have on a desktop that's been pushed to a central repository. It's one thing to have a data structure that's well understood and normalized. It's quite another to take a repository of unstructured data and say: make it searchable.

Massood Zarrabian: That observation is exactly right. Structured is much easier to address. Unstructured becomes more difficult, and as time has gone by we've gone from just having Word documents and PDFs to having Jira this and GitHub that and Confluence something else. Someone told me this morning that they tried to force people to discipline themselves to put content in the right systems - they actually have people monitoring it - and they said it's impossible. You are nuts if you think you're going to get users to put content in the right place.

Chris Featherstone: The content created by knowledge workers is probably the most expensive data any organization has, and yet the least available to the organization. That defines the crux of what we're talking about.

Seth Earley: I remember working with a large global manufacturer in aerospace. They spent $250 million on their ERP implementation. Then when it came time to look at their content and knowledge, there was a project of about $1.5 million and they said: whoa, that's way too much money. You spend $250 million there and you want to spend less than half a percent of that on the knowledge and content? Organizations that have a history of treating data - transactions and accounting - as important would never find someone in the accounting organization saying "it's just too much work to have those numbers add up." You wouldn't find that. Because people know how valuable it is. The problem with unstructured information is a lack of true understanding of the core value within knowledge and content.

Massood Zarrabian: Let me come back to this in terms of time. If we were sitting here in 1990 talking about the same thing, someone who just spent $250 million on an ERP system - you would be trying to convince them that a data warehouse was a good idea. But that started in the 70s and it took 20 to 30 years to get there.

The problem with enterprise search - and it's part of something much bigger, which is self-service - is that it started poorly. I joined a company in 1999 and we had one of the first self-service knowledge management systems for support. We used a search engine called Verity. We were trying to sell people on the idea: you don't need 200 people on the call - you can actually do self-service. Fifteen years later, that has become acceptable for customer self-service. But for internal enterprise search, the beginning of our market was in 2010 with portals, and portals were solutions that kind of failed. From there we went to SharePoint and FAST. From there, a system that was the next level. Part of our issue is the patience required to get people to understand what we do.

Now I'm very optimistic. The pandemic actually exacerbated the problem in a useful way - a lot of people realized what a Google-like search bar can do. Things are starting to mature. Over the next five years we're going to move away from spending so much time convincing people this is the right solution, toward a market where it is much more broadly accepted.

Chris Featherstone: I'd love to get your take on the evolution of the persona of who's actually getting this information. Before, you needed an army of developers with JavaScript skills. The business doesn't care how the sausage is made. They just need access.

Massood Zarrabian: When I joined BA Insight, what amazed me was how many people with coding experience we needed just to configure and make our software work. Someone would install our software, you'd need someone who could do JavaScript, and they'd spend weeks with that customer - and you'd end up with what people called a software solution that was really 20 to 30 to 40 percent custom code. Like a building where every pipe had to be customized.

We had an idea: let's not try to solve all the world's problems - solve 90% of what really matters. And it's okay if you can't do things the one way we figured out. We started building components we call modules that can play together, and we said let's not force a single big platform. Our solution now supports different search engines, different cognitive services from different vendors. We have 90 connectors - it's plug and play. And we've created a back end that a business analyst can configure, as opposed to an IT person who knows Python. An IT person who knows Python is rare and has a lot more important things to do.

The strategy should be for anybody doing any kind of enterprise search: think about software, but software with advisors who are smart and can help you. Don't think about it as engineers who code and customize a solution for you, because that's going to eventually backfire. It becomes too costly and unsustainable.

Seth Earley: What is the last mile? We have the infrastructure, we have tools that can process content and make sense of user intent - where's the value-add that still needs to happen on an organization-by-organization basis?

Massood Zarrabian: One critical piece is figuring out that whatever solution you have can evolve as the world evolves. If every time something changes you have to lift and shift, you're going to be in big pain. I look back at what happened when Google decided to get rid of the Google Search Appliance. I cannot imagine what dependent customers went through to move away from that.

The use-case view is critical: understand your content, tag your content, make it findable. On one side have experts who can advise you; on the other side understand the technology can move faster than you, so keep an eye on it and take advantage of it.

The barrier for organizations with thousands of pieces of information in thousands of different systems is that they think they have to go to each system and figure it out system by system. The metaphor I use: imagine no internet and you want to buy something but you don't know what's in any store. What are you going to do - go to the shopping mall and go to every store one by one? The answer is: connect your systems to a central index, and use the index to figure it out. Then the expert has one place to go with information from all the systems. You don't need a separate expert for each system - you centralize. It's like what we did years ago with data warehouses. We don't go to the source; we put everything in one place and figure out what to do with it there.

Seth Earley: The content and data is still decentralized, still in different systems. But the index becomes the piece that knows where the bodies are buried.

Massood Zarrabian: Exactly. Because if you do that, the experts have a consolidation of a bunch of systems they don't have to monitor anymore. They're not going system by system. They now have something that really uses their expertise - not figuring out what you've got, but knowing what you've got and telling you what you need to do next.
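The central-index idea in this exchange can be illustrated with a very small sketch. It is not BA Insight's implementation: the documents, ids, and the `build_index` and `search` helpers are all hypothetical, and the index is a plain keyword lookup. The point is only that content stays in its source systems while one index answers "what do we have, and where is it?"

```python
from collections import defaultdict

# Hypothetical documents, as if pulled from three systems by connectors.
DOCUMENTS = [
    {"id": "sp-1", "source": "SharePoint", "text": "Vacation policy for all employees"},
    {"id": "sf-7", "source": "Salesforce", "text": "Case 7: customer reports login failure"},
    {"id": "cf-3", "source": "Confluence", "text": "Login service runbook and escalation policy"},
]

def build_index(documents):
    """Map each lowercase term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc in documents:
        for term in doc["text"].lower().split():
            index[term.strip(":,.")].add(doc["id"])
    return index

def search(index, query):
    """Return the sorted ids of documents containing every query term."""
    terms = query.lower().split()
    if not terms:
        return []
    hits = set.intersection(*(index.get(term, set()) for term in terms))
    return sorted(hits)

INDEX = build_index(DOCUMENTS)
print(search(INDEX, "login"))   # ['cf-3', 'sf-7']
print(search(INDEX, "policy"))  # ['cf-3', 'sp-1']
```

A single query for "login" surfaces content from both Salesforce and Confluence without anyone visiting either system, which is the consolidation the speakers describe.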

Chris Featherstone: Let's pivot to building, maintaining, and training indexes. It can make or break any search environment. And it's not just one index - it can be multiples depending on the outcome, whether it's departmental or thematic.

Massood Zarrabian: A single index for everything - imagine the size of it, and imagine keeping up with the rate of change. The web is built on what we eventually figured out after the first attempts in the early 2000s where we tried one enterprise index and it was really bad. The reason it was bad is that on the web, everybody building systems does it in a way where you can federate. But if you go back and try to federate Lotus Notes in the same way, it's not going to work - when Lotus Notes was created, that idea didn't exist.

Our view is that a singular index long term is not going to work because the index becomes too big and the rate of change is too fast - when I search, I'm not going to get fresh results. You have to distinguish between modern systems that let you do what the web does - multiple search engines, multiple indexes, like Amazon and Expedia using different ones but delivering unified results - and legacy systems that cannot. A single index will work for things that can't participate in federation; the distributed approach will work for things that can.

We have a hybrid approach: take advantage of both. It reduces cost, decreases time to freshness, eliminates latency. If I'm searching SharePoint and Elasticsearch at the same time, they each perform well. But if I try to do the same thing through a slow legacy system, I slow everything down. So we do both in parallel and make sure the shared experience for the user is identical regardless of which path the query takes.
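A rough sketch of the hybrid approach described here, under stated assumptions: the three back-end functions are hypothetical stand-ins (real connectors would call each system's search API), and the scores are made up for illustration. Modern sources are queried live, legacy content is served from a pre-built central index, all of them run in parallel, and the caller sees one merged list.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for real search back ends.
def search_sharepoint(query):
    return [{"title": "HR handbook", "source": "SharePoint", "score": 0.9}]

def search_elasticsearch(query):
    return [{"title": "Build log", "source": "Elasticsearch", "score": 0.7}]

def search_central_index(query):
    # Legacy content that cannot be federated was crawled into this index earlier.
    return [{"title": "Legacy memo", "source": "Lotus Notes", "score": 0.8}]

BACKENDS = [search_sharepoint, search_elasticsearch, search_central_index]

def hybrid_search(query, backends=BACKENDS):
    """Query every back end in parallel and merge hits into one ranked list."""
    with ThreadPoolExecutor(max_workers=len(backends)) as pool:
        result_lists = list(pool.map(lambda backend: backend(query), backends))
    merged = [hit for hits in result_lists for hit in hits]
    return sorted(merged, key=lambda hit: hit["score"], reverse=True)

results = hybrid_search("benefits")
print([hit["source"] for hit in results])
# ['SharePoint', 'Lotus Notes', 'Elasticsearch']
```

In practice, relevance scores from different engines are not directly comparable, so a real merge step would need some form of score normalization or re-ranking; the plain sort here is a simplification.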

Seth Earley: Retaining context of the source is important too. The CRM might use slightly different organizing principles than another repository. A single unified pane of glass, but with understanding of those different information sources preserved.

Massood Zarrabian: That's a great point, because if you think about natural language queries and interpreting intent - when someone asks an HR question, we already know HR is sitting in SharePoint. When someone asks about a case, we already know cases are in Salesforce. So we can search specifically for that person's question in the right system, from the right source, and put the right answers on top. Combining source awareness with NLP can get you the right answer 90 to 95 percent of the time.
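The source-aware routing described above might look something like the following sketch. The keyword sets and the `route_query` helper are hypothetical, and a production system would presumably use an NLP intent classifier rather than a keyword lookup; the sketch only shows the routing decision itself.

```python
# Hypothetical mapping from source system to words that signal its content.
ROUTES = {
    "SharePoint": {"hr", "vacation", "benefits", "payroll"},
    "Salesforce": {"case", "ticket", "customer"},
}

def route_query(query, routes=ROUTES, default="all"):
    """Pick the source whose keywords appear in the query, else search everything."""
    words = set(query.lower().replace("?", " ").split())
    for source, keywords in routes.items():
        if words & keywords:
            return source
    return default

print(route_query("How many vacation days do I get?"))  # SharePoint
print(route_query("Show me case 1234"))                 # Salesforce
print(route_query("quarterly roadmap"))                 # all
```

Falling back to "all" keeps the behavior safe when intent is unclear: the query simply goes to every source instead of being sent to the wrong one.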

We've also integrated completely with Amazon Kendra. For things that fit a question-and-answer model, Kendra is excellent - let it crawl, index, and when you ask a question, the answers extracted are more than 90 percent accurate with no human intervention. Not everything fits a Q&A model, but for the things that do, Kendra combined with BA Insight's broader index means users get a seamless experience regardless of what type of content or query they have.

Seth Earley: A lot of this you can plug in and get some value - indexing, clustering, entity extraction. But much of what we're trying to do with machine learning and AI is make up for past sins: bad content management, poor tagging, inconsistent architectures. What is the role of a reference architecture and information architecture going forward?

Massood Zarrabian: The problem we've had with tagging is not going to go away - it's going to get worse. We have had customers who tried to get users to tag things, and when they looked at the results, everybody had tagged things with the first or second item in the dropdown. So the user-tagging approach simply doesn't work. But the flip side - blindly using AI to help you and then five years from now turning it on and seeing what you've got - is also not the answer.

To me, the value of information architecture is helping combine the knowledge and expertise you have with what you need to do, so that what you come up with is a sustainable, solid plan to go forward. And once you've done that investment, my advice is: every year, bring people back to review what's going on, review how to improve it, and continue incremental improvement. That's the model. Combining the experience of your team with six months to a year of analytics, you're going to have the best of both. That's what Google does. Google has a lot of people doing this, but they also have billions of searches to learn from. You want to build toward the same continuous feedback loop.

Seth Earley: We actually tested tagging quality at one organization by putting "aardvark" as the first value in a tagging dropdown - something completely ridiculous and obviously inappropriate - and then checked how many documents had been tagged with it. There were a lot.
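The decoy-tag audit Seth describes reduces to a simple count. The sketch below assumes the tags have already been exported as a list of strings, one per document; the sample data and the `decoy_tag_rate` helper are hypothetical.

```python
from collections import Counter

def decoy_tag_rate(tags, decoy="aardvark"):
    """Return the fraction of documents tagged with the decoy value."""
    if not tags:
        return 0.0
    counts = Counter(tag.lower() for tag in tags)
    return counts[decoy] / len(tags)

# Hypothetical export: one tag per document.
sample_tags = ["Aardvark", "contract", "aardvark", "policy"]
print(decoy_tag_rate(sample_tags))  # 0.5
```

A high rate suggests users are picking the first dropdown value without reading it, which means the rest of the tag data deserves the same suspicion.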

Massood Zarrabian: Very good. Coming back to the question about high-value content requiring more human intervention for tagging - I actually disagree. To me, there is no administrator-defined high-value content. If I'm on third shift and I'm trying to find something and some administrator has decided it's not high-value, and I can't find it, I'm going to waste four hours. So I think tagging always needs some human intervention - but not because administrators get to decide what matters. Because value is in the eye of the beholder.

Chris Featherstone: The beautiful thing about AI and machine learning is that it breaks the predefined norms we've lived in so long. It gives us the ability to say: let's take into consideration more data elements for a more holistic search - sentiment analysis, historical context - rather than just throwing up a result set. Based on my role, my persona, my attribution, give me a more contextual answer.

Seth Earley: What else do organizations need to have in place to make these things work effectively at an enterprise level? No magic pixie dust, AI is not a silver bullet, we still need governance and metrics and continuous improvement.

Massood Zarrabian: My biggest worry about the future of our market is the perceived degree of difficulty. What I worry about is that three years from now, people will see enterprise search as something that needs six full-time people to manage - when by comparison, Salesforce with 1,000 salespeople runs on half a person. You need to think long term. You need to look at total cost of ownership. Because if the total cost of ownership is too high, it will fail. I am convinced the reason enterprise search got a bad name is because of that issue alone - not technology quality, but sustainability. How do we get to the place where people look at this not as what they purchased and implemented, but what they purchased, kept for five years, and what that total cost was across those five years? If they do that, they will commit to it appropriately. If they don't, a year later it dies because they don't have the budget for the people.
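The five-year view can be made concrete with a small calculation. The staffing levels (six people versus half a person) come from the conversation and the $75,000 annual figure is the upper end of the departmental range quoted in this episode. The $150,000 fully loaded cost per person is purely an assumption for illustration, as is the `five_year_tco` helper.

```python
def five_year_tco(implementation, annual_license, staff_fte, cost_per_fte, years=5):
    """Total cost of ownership: one-time cost plus recurring license and staff costs."""
    return implementation + years * (annual_license + staff_fte * cost_per_fte)

COST_PER_FTE = 150_000  # assumed fully loaded annual cost per person (hypothetical)

heavy = five_year_tco(0, 75_000, 6, COST_PER_FTE)    # six people to maintain
light = five_year_tco(0, 75_000, 0.5, COST_PER_FTE)  # half a person to maintain

print(heavy)  # 4875000
print(light)  # 750000.0
```

Under these assumptions the software cost is identical in both cases, and staffing alone makes the five-year total more than six times higher, which is why the purchase price is a poor proxy for sustainability.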

Seth Earley: It brings up the point of value. Cost is irrelevant if we don't have the value. Why is value so difficult to establish here?

Massood Zarrabian: I don't know if it's difficult with visionary people, or with people who didn't go through the past seven to eight years of the market's evolution. I think the answer is education - getting people to understand what used to be has changed. And importantly: enterprise search implementations are no longer multimillion-dollar implementations. They're multi-hundred-thousand-dollar implementations. And that investment compared to the value it brings is almost nothing for most large organizations. You could start as little as $50,000 to $75,000 a year and have a very good solution for a department with some connectors. Experience it. If you don't like it, walk away. If you like it, you grow. Stop thinking million-dollar, five-million-dollar platforms with five people building something custom for you. That is the old approach.

Seth Earley: Thank you Massood. This has been incredibly informative and we appreciate your time.

Massood Zarrabian: Thank you very much for your time, and you two have a great weekend.

Chris Featherstone: Thanks for all the good information. It's a pleasure to have you on and to see all the great stuff you're doing as a company and personally. Thanks everyone.

Meet the Author
Earley Information Science Team

We're passionate about managing data, content, and organizational knowledge. For 25 years, we've supported business outcomes by making information findable, usable, and valuable.