Beyond Training the Model: How RAG, Semantic Search, and Metadata Transform Enterprise AI from Hallucination to Accuracy
Guest: Moritz Müller, Head of Product Management for Search and Generative AI at Squirro
Host: Seth Earley, CEO at Earley Information Science
Chris Featherstone, Sr. Director of AI/Data Product/Program Management at Salesforce
Published on: March 11, 2024
Seth Earley sits down with Moritz Müller, a distinguished figure with a rich background in consulting and a leader in artificial intelligence applications. Before carving out his niche at Squirro, Moritz Müller honed his skills at a prestigious consulting firm in Switzerland and spearheaded an ambitious venture by setting up the company's office in Singapore.
As the head of product management at Squirro, Moritz brings a wealth of experience from digital transformation programs and a deep understanding of AI technologies across various industries. His insights into the burgeoning world of retrieval augmented generation (RAG) and large language models (LLMs) are second to none, offering listeners an in-depth look at the future of information access and management.
Moritz brings his expertise full-circle by stressing the importance of metadata, vector similarity searches, and the need for ongoing maintenance of knowledge bases to ensure that emerging technologies truly enhance our search capabilities and knowledge utilization.
Key Takeaways
- Organizations mistakenly ask how to train models on their data when the real challenge lies in combining search capabilities with LLM generation rather than model training itself.
- Large language models achieve 90%+ accuracy when provided correct context through effective information retrieval, making search quality more critical than LLM tuning for enterprise success.
- Retrieval augmented generation overcomes token limits by conducting semantic search first to identify relevant document pages, then passing only those 10-20 pages to the LLM for answer generation.
- Enterprise personalization requires honoring access control lists to ensure users only receive answers based on documents they're entitled to see across departmental boundaries.
- Metadata search combined with semantic search dramatically improves accuracy, as demonstrated by experiments showing 53% accuracy without metadata enrichment jumping to 83% with proper metadata tagging.
- The RAG market's projected growth from $13 billion in 2023 to $183 billion in 2027 reflects the technology's revolutionary impact on enterprise knowledge management.
- Organizations must maintain basic data hygiene with document versioning, approval workflows, and decay factors for outdated content to prevent LLMs from surfacing incorrect historical information.
Insightful Quotes:
"The challenge that we will face in the next years is much more on making the information retrieval work well, not improving and tuning the LLM." - Moritz Müller
"If you really give the LLM the right context in these 10 pages of text, they are really good in giving you an answer, so the accuracy is well above 90%." - Moritz Müller
"There's no AI without IE—you need to make sure that in a big organization you have at least a minimum level of data hygiene and knowledge management." - Moritz Müller (paraphrasing Seth Earley's principle)
Tune in to discover how retrieval augmented generation transforms enterprise knowledge access by combining sophisticated semantic search with large language model capabilities—and why data quality and metadata remain the critical success factors.
Links:
LinkedIn: https://www.linkedin.com/in/moritzbmueller/
Website: https://squirro.com/
Ways to Tune In:
Earley AI Podcast: https://www.earley.com/earley-ai-podcast-home
Apple Podcast: https://podcasts.apple.com/podcast/id1586654770
Spotify: https://open.spotify.com/show/5nkcZvVYjHHj6wtBABqLbE?si=73cd5d5fc89f4781
iHeart Radio: https://www.iheart.com/podcast/269-earley-ai-podcast-87108370/
Stitcher: https://www.stitcher.com/show/earley-ai-podcast
Amazon Music: https://music.amazon.com/podcasts/18524b67-09cf-433f-82db-07b6213ad3ba/earley-ai-podcast
Buzzsprout: https://earleyai.buzzsprout.com/
Podcast Transcript: Retrieval Augmented Generation, Enterprise Search, and the Future of Knowledge Management
Transcript introduction
This transcript captures a comprehensive conversation between Seth Earley and Moritz Müller exploring the technical foundations of retrieval augmented generation, why information retrieval quality matters more than LLM training for enterprise applications, how metadata and semantic search combine to achieve high accuracy, the critical importance of access control for enterprise personalization, and why basic data hygiene remains essential for successful AI implementation.
Transcript
Seth: Welcome to the Earley AI Podcast. I'm Seth Earley, and I'm really excited to introduce our guest today to discuss how emerging technologies like retrieval augmented generation will significantly impact organizations by improving information access. We're also going to talk about the risks around the hype over large language models, and what that hype poses for an organization if you don't adequately manage and differentiate by leveraging your proprietary data, your intellectual property. We're going to be talking about balancing the use of large language models with oversight for utilizing those proprietary data sources and knowledge sources. And then, how utilizing metadata for context is crucial to personalized experiences and improving information access. And we'll talk about the evolving role of AI in repetitive task automation that will reshape career paths, and about managing autonomous agents. Our guest today has been working on digital transformation programs for many years and heads product management for search and generative AI technologies.
He has been working with corporate clients in financial services, manufacturing, and the public sector in APAC. He has a PhD in geophysics, experience in oil and gas exploration, and multiple years of experience with IT project implementation. He brings a wealth of experience in enterprise search and product leadership at Squirro, his current company, and his technical expertise drives an understanding of the technology at a very fundamental level. Moritz Müller, welcome to the show.
Moritz Müller: Thanks a lot. It's a pleasure to be here, and I'm looking forward to speaking with you about what we do at Squirro, and how we leverage, especially lately, retrieval augmented generation technology to deliver state-of-the-art enterprise search.

Seth: And we don't want this to be a commercial about Squirro, but we will talk about Squirro, because Squirro seems to be a really amazing technology, and I'm very impressed by it. I just want to let people know we're not doing a commercial here; it's really about the knowledge up front. But first I'd like to start with: what are the typical fallacies, the typical misunderstandings and myths that are out there that you run into on a day-to-day basis with executives, with colleagues, with customers, and so on?
Moritz Müller: That's a very good question. The standard question that I get when we talk about GenAI, combining LLMs with this retrieval augmented generation, is: how do I train the model on my data? That was probably the first question we got when ChatGPT came up last year, and it's a question we still get. I understand why people ask it, but I do not think it really reflects the challenges of using GenAI, and especially retrieval augmented generation, for your enterprise and organization. The challenge is much, much more: how can you combine the search over your own data with the respective GenAI capabilities of a large language model? That's one misunderstanding. The other is that people often misunderstand the limitations of LLMs. For instance, a lot of people ask: is my data secure? They're very worried that their data is going out to an API and someone is doing something with it. If you look at this, and specifically if you look at the example of ChatGPT:
security-wise, I really think this is 100% safe; there's absolutely no question about this for me. We are so used to using the online resources of Microsoft, like Office 365 and SharePoint Online, and the ChatGPT models that we use in the private cloud run on exactly the same kind of systems. So there is absolutely no risk there. The bigger challenge is: do you actually have a large language model available, and not only ChatGPT but potentially something else, to use in your environment? That is the much bigger challenge.
Seth: Right, right. And just for folks who may be less familiar with the terminology: when you say training the LLM, define the different ways of training, and then talk about retrieval augmented generation for a moment. How is that paradigm different from simply asking a question of an LLM? So the first question is training: what does training mean, and what are the different ways of thinking about training? Because some people think that means you have to train the LLM itself, whereas others think about tuning, and tuning is based on the data. It's a similar concept, but it's very confusing. Can you clarify that for folks?
Moritz Müller: Yeah, that's a very good question, and I often start with that. For me, the whole talk about training comes from our historic experience. Whenever we talked about machine learning models or NLP models, natural language processing models, these models typically needed to be trained on your specific data, for doing data classification or whatever tasks they were supposed to be doing with your data. For that, you trained the model. A machine learning model for a chatbot, for instance, was trained specifically to give good answers for that use case.
With the emergence of large language models, and I don't want to dive into the details of how complicated it is to actually train one, you can break down the training of ChatGPT, for instance, into five different stages, but it's super complex, and there is so much data being used for the training that the limited amount of training you could do on your own data would not fundamentally change the output of the LLM. So while training was a very valid question in the past, the moment we look at the sheer dimensions of these large language models, the whole conversation shifts to: how can we tune it, as you said? How can we configure it so that it gives us the best results specifically for the use case we are addressing? And the way to do this is twofold. One is, as you mentioned, we can make the whole retrieval piece better: how can we retrieve the information from an underlying search system, for instance, to provide the right contextual information? That's one challenge. The other challenge, and you've probably heard about this as well, is the whole prompting challenge: how can we make sure we configure the prompting in the right way? When we talk about prompting, it is all about giving the right instructions to the large language model, to tell the model what it is that we want to achieve.
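To make the prompting idea concrete, here is a minimal Python sketch of how instructions and retrieved context might be assembled into a single prompt. The `build_prompt` helper, the template wording, and the sample guideline text are illustrative assumptions, not Squirro's actual implementation.

```python
def build_prompt(question: str, context_pages: list[str]) -> str:
    """Assemble instructions + retrieved context + question into one prompt.

    The instruction wording is illustrative; real systems tune this heavily.
    """
    context = "\n\n".join(
        f"[Document {i + 1}]\n{page}" for i, page in enumerate(context_pages)
    )
    return (
        "You are an assistant answering questions for employees.\n"
        "Answer ONLY from the context below. If the answer is not in the "
        "context, say 'I don't know.'\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

# Example usage with a hypothetical procurement guideline page:
pages = ["Purchase orders up to CHF 50,000 may be issued directly; "
         "larger amounts require a tender."]
print(build_prompt("Up to which sum can I issue a purchase order directly?", pages))
```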
Seth: Yes, that's very interesting. And our next webinar actually is on the alignment of prompt engineering and knowledge engineering, so we're going to be covering that soon, and I'd love to pick your brain more on that, whether we do it today or a little bit later. I know it gets very complicated, but can you give a high-level overview of those five stages of training that you mentioned? Because I have done some research and reading about large language models and feedback mechanisms, and just how unbelievably complex and sophisticated it is. It really is amazing: the self-referential, continual feedback, adjusting weights, adjusting the model as it's processing, is just absolutely phenomenal. Do you want to give a high-level understanding of what those elements are? It's okay if we dive a little bit into the weeds; people who don't want to hear that can just fast forward.
Moritz Müller: Yes, but I also need to be very honest with you: I myself am not an expert at training LLMs. I think I can break it down and give you a high-level understanding, but for all the details you really need to speak to an expert; it wouldn't be fair if I were speaking for that. If you look at ChatGPT, the way it works is in different stages, and again, five is not necessarily the final number. You train an initial model where you throw in a lot of content, a so-called base model in the first version. On a high level, the amount of data is much, much larger compared with the training of a traditional machine learning model: you train the model on all the text you can get hold of, and you need a lot of GPU power to do this. Someone told me that when Facebook trained their first proper LLM, they spent 60 million dollars or so just on buying the GPU units to train that initial model. It's a crazy investment, which already tells you it's utterly impossible for an ordinary organization to just do this. That first step is, I think, called pre-training. The second step is so-called supervised fine-tuning, where you try to make sure the general understanding is there: you use certain prompts to teach the model certain outcomes. That then gets refined further in terms of a reward model, where you basically ask the model questions, and it's a manual process, though you can also use some technologies to do this, and give feedback: this was right, or this was not right.
And the more of this training you can do, the better it gets. That is also the stage where you can later do the fine-tuning: you have a reward model as the last layer on top of your trained LLM, and you do more reinforcement learning for specific use cases or kinds of questions. That is the additional layer you can set on top. But overall this whole process is super complex. And again, me not being an LLM expert, but from what I heard about ChatGPT, I think they were also surprised themselves that the ChatGPT model worked so well, and it took them quite a while to understand which of these steps, and which combination of the stack, actually led to it being so successful and giving such good results.

Seth: Yeah, no, that's a great overview. And one of the things people forget is that there is a lot of manual training, a lot of human-in-the-loop feedback. There are armies and armies of people making judgments about these models. And then when they start red teaming, trying to break the model or get it to do something bad, that's even more human-in-the-loop. So that's so interesting about training. Now, when you think about fine-tuning: we talked about training, and there are a lot of different ways to think about training, but I think more about fine-tuning, and fine-tuning is really how you're going to access your data with the LLM. That comes into retrieval augmented generation. Do you want to give a quick definition of RAG, and then what has to be done to the data to make it readily available for the LLM using RAG?
Moritz Müller: That's a very good question. If you look at LLMs as such, it is very impressive. If you go to ChatGPT, and I think we've all tried it ourselves, you ask ChatGPT a generic or general question. I even used it myself to, for instance, write some small routines in Python, or to get inspiration, because my Python days passed by a few years ago, but I still try from time to time, and it's impressive what the model can do.
The moment it becomes specific, however, it is very challenging to source the right information. And the moment we talk about enterprise RAG in that sense, the task of the GenAI technology is to give an answer that is specific to your organization or your use case, and the general knowledge the model is trained on likely does not contain the piece of information that is required to give you the answer. In order to overcome this, when we talk about retrieval augmented generation, we pair a so-called information retrieval technology with the capabilities of a GenAI large language model to give us an answer on a limited amount of data. What I mean by this is: you can provide a so-called prompt, or context, to a large language model. You can basically say: this is my question. For instance, we work with some organizations here that have certain procurement guidelines, and the question is: up to which sum can I directly give out a contract or a purchase order to someone, and from when do I need to do a tender?
But this information is specific to the organization. If you just ask the LLM, it will not know exactly what you're asking for. What you can do, though, is provide context within this so-called token limit, this prompting context. At the moment that is roughly 10 to 20 pages of text; that's the way to look at it. So you could provide 10 to 20 pages of context and say: these are my 10 pages with my procurement guidelines, this is my question, please answer the question based on the guidelines that I provide. And that is what the LLM is able to do, and it is very good at it. It will basically extract the knowledge in these 10 pages and give you that specific answer. Just this capability of the LLM to give you answers is very powerful. However, for now and in the near future we have this token limit, so you need to make sure you put the right information into these 10 to 20 pages of text that you pass in the context.
Now, if you want to use this capability on top of, for instance, your knowledge base, let's say you have a lot of documents on a SharePoint or in any other kind of Google Drive system in your organization, 10 to 20 pages is probably not enough, right? You are very limited in terms of that. The way to overcome this is by doing a search first. That's why we use the information retrieval step: we first conduct a search with your question and try to find the relevant documents, and not only the relevant documents, we actually try to find the relevant pages inside your documents that likely contain the answer to your question.
We then put together the 10 to 20 pages of context: we take the top 10 results of our search engine, for each we take one page of text, we put that together as 10 to 20 pages of context, and we pass it to the large language model. The LLM is then able to give us an answer based on all the relevant pages we have found across your whole knowledge base. Wrapping this up: this is basically the combination of semantic search on one side, where you can search through millions of documents to find the right pages, and then you pass those pages on to the LLM for the generation piece, to summarize a nice answer for you based on these relevant pages.
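The retrieval-then-generation flow Moritz describes can be sketched in a few lines of Python. The word-overlap `score` function below is a deliberately crude stand-in for real vector similarity, and the `llm` callable is a placeholder for whatever model you use; everything here is illustrative, not production code.

```python
from collections import Counter

def score(query: str, page: str) -> float:
    """Toy relevance score via word overlap; a real system would use
    vector similarity over embeddings instead."""
    q, p = Counter(query.lower().split()), Counter(page.lower().split())
    return float(sum((q & p).values()))

def retrieve_then_generate(question: str, knowledge_base: list[str],
                           llm, top_k: int = 10) -> str:
    """The RAG pattern: search first, then pass only the top-k pages to
    the LLM so the context fits inside the token limit."""
    ranked = sorted(knowledge_base, key=lambda page: score(question, page),
                    reverse=True)
    context = ranked[:top_k]          # e.g. roughly 10-20 pages of text
    prompt = ("Answer from this context only:\n\n"
              + "\n---\n".join(context)
              + f"\n\nQuestion: {question}")
    return llm(prompt)                # `llm` is any callable taking a prompt

# `llm` here is a stub; in practice this would call a hosted or
# on-premise model.
answer = retrieve_then_generate(
    "What is the lunch reimbursement limit?",
    ["Lunch reimbursement is capped at CHF 30 per day.",
     "Travel must be booked through the internal portal."],
    llm=lambda prompt: f"(model answer based on {prompt.count('---') + 1} pages)",
)
print(answer)
```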
Seth: That's great, that's great. I've spoken recently with some organizations who've tried this, right? They just pointed it at all of their policy documents, and I'm not sure what the repositories were, but they got an answer about lunch, what the reimbursement policy was for lunch and what the limit was, and the LLM returned $180,000, and it was also for Singapore of all places. Then they tried it again, and no matter how they tried to define the area, it still came back with $180,000. Now, clearly, something went wrong. When you start thinking about the repository, the quality, the data, why would you get an erroneous answer like that? Can you troubleshoot that a little bit? Can you talk about what makes good retrieval, when you start thinking about semantic search, parametric search, metadata search, retrieval in general? What makes good retrieval, and then what do you have to think about with the data and the content?
Moritz Müller: Yes, happy to answer that. Before I give you an answer to this question, let me make a general statement. I personally believe that all big organizations in the world will use this technology on top of their internal documents and knowledge bases, because it's simply so powerful and can save so much money; it's an absolute no-brainer. But the challenge that we will face in the next years is much more on making the information retrieval work well, not on improving and tuning the LLM. That is something I'm trying to explain to people: if you really give the LLM the right context in these 10 pages of text, and 10 pages for me is just a symbolic figure, for bigger models this can be more, the model is really good at giving you an answer. The accuracy is well above 90% from what I've seen.
Now, the challenge, as you mentioned with the reimbursement policy, is: if you just point the information retrieval at all of your documents, how do you find the most relevant document with the answer? Ideally you point it at the most recent document. What we are seeing in a lot of organizations is that they have a new iteration of a document each year, and every previous iteration is still stored in the knowledge base. So you want a certain time weighting factor in your information retrieval. That's one of the things. On top of this we also use a lot of other metadata attributes: for instance, everything that is older than one year gets a massive decay in the search function, and everything that is not labeled as a knowledge base document, and you can add those labels, or that is not pre-approved by someone, is treated as less relevant in the search. Now imagine you work in a company that has been around for 50 years. There will be tons of documents talking about reimbursements, maybe different versions, people saving their own copies. So the challenge is really: can you initially limit the amount of data that you search through to the knowledge base that is pre-approved? Once you do this, you make sure that you get a much better context to pass to the LLM to actually give you the answer.
And that is our way of making retrieval augmented generation successful in any kind of organization.
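A hedged sketch of the decay-and-boost idea Moritz describes: stale documents are down-weighted and pre-approved knowledge-base documents are boosted before ranking. The specific constants (a one-year cutoff, a 0.5 decay, a 2x boost) are made-up illustrations, not Squirro's actual tuning.

```python
from datetime import date
from typing import Optional

def rank_score(base_relevance: float, doc_date: date,
               pre_approved: bool, today: Optional[date] = None) -> float:
    """Adjust a raw relevance score by document age and approval status."""
    today = today or date.today()
    age_years = (today - doc_date).days / 365.0
    decay = 0.5 if age_years > 1.0 else 1.0   # heavy decay past one year
    boost = 2.0 if pre_approved else 1.0      # favour approved documents
    return base_relevance * decay * boost

# The 2022 policy loses to the approved 2024 version despite equal relevance:
old = rank_score(1.0, date(2022, 3, 1), pre_approved=False, today=date(2024, 3, 1))
new = rank_score(1.0, date(2024, 1, 15), pre_approved=True, today=date(2024, 3, 1))
print(old, new)  # 0.5 vs 2.0
```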
Seth: That's great. And you know, it's funny, it still comes back to the old problem. This is where people are misunderstanding the whole class of technology. It comes back to the old problem of context and information hygiene and knowledge curation, right? Because you can't point it at crap and get something back that's going to be useful. And you can't just point it at everything, because then there's no context. So context is hugely important. And this is where, you're familiar with the phrase I coined years ago, there's no AI without IA. This is kind of the exemplar of that. Why don't you explain that concept in this context? You really did just explain it, but it's a good way to summarize it. Do you want to provide your interpretation?
Moritz Müller: My basic interpretation of this is: you need to make sure that in a big organization you have at least a minimum level of data hygiene and knowledge management. Let me put it maybe like this: it starts with every folder, as stupid as it sounds. Every folder name that is called "policy" or "HR policy" is already relevant metadata information for the search engine that we use. Now, I'm not asking to go back to the old days where everything was in folders. It is much more the question: how can we make sure this metadata is stored on the respective document? And you will find that big organizations nowadays have knowledge management departments. I've seen organizations here in Singapore, the Monetary Authority of Singapore, that have a huge number of people, 50 people, working just on knowledge management.
Now, I'm not saying you need this to be successful. But if you can make sure that the main knowledge base of data is maintained in a certain way, that you remove or archive outdated information, or that you tag documents that are pre-approved, saying this is our guide, this gets boosted in the search, then with this basic data hygiene, as I would call it, you make sure not only that the search is better, but also that, if you use retrieval augmented generation, the chatbot you give to your employees or your users finds you the best answer that you can get.
Seth: Yeah. And so the whole idea of information architecture is structuring that knowledge and that content in a way that allows you to retrieve it. When you think about what is so incredible about LLMs: they interpret the question, the query, in a very sophisticated way, and then they interpret and process the results in a very sophisticated way to make them more conversational. And in between is that retrieval part, right? That is using your knowledge base, your database, your content. And when you just try to use a large language model by itself to answer a question, what does it tend to do if it doesn't have the answer? It'll make it up, right? Unless you are very specific in your instructions to say: only use the information from this data source, and if you don't have that information, say "I don't know."
And then turn the temperature down to zero so it doesn't get too creative; one is very creative, zero is not creative. We've seen this work very, very well, and it comes back again to thinking about content. In the old days people would say: make it like Google, just make it like Google. And I would say: have you spent as much time and money and resources on fine-tuning your content as the organizations do who want to get good rankings in Google? Then it would be like Google, right? Because it is about that curation, it is about that architecture, it is about that terminology. So it comes back to some of the old things. This is not a magic box, but it is an amazing tool.
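The two controls Seth just named, a source-only instruction and temperature zero, might look like this in a chat-style request. The parameter names mirror common LLM chat APIs but are assumptions here, not any specific vendor's schema.

```python
# Minimal sketch of the "don't make things up" configuration described above.
request = {
    "temperature": 0.0,   # 0 = deterministic/least creative, 1 = most creative
    "messages": [
        {"role": "system",
         "content": ("Only use the information from the provided data source. "
                     "If you do not have that information, say 'I don't know.'")},
        {"role": "user",
         "content": "What is the lunch reimbursement limit? Context: ..."},
    ],
}
print(request["temperature"])
```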
Just like when you think about a chatbot, you have to interpret intents from utterances, and utterances like "I forgot my password," "my username doesn't work," "I'm locked out of my computer," "my computer is mad at me" all come down to the same intent: change password. Well, large language models do that on steroids, right? They take very sophisticated questions, and then they can either expand those prompts or reconcile those prompts, and then they ask a question of the data source using the parameters. This is why prompt engineering, I think, is so important, because prompt engineering has to contain elements of that architecture. You have to train people how to ask the right questions and use the right concepts. And then you're retrieving, and you do talk about this, you're retrieving from that data source using semantic search, or some kind of parametric or metadata search, or other signals. And then you're taking that result and making it more human-understandable, more conversational. That's the beauty of this, that's the power of this. It's the in-between part that we still have to get right. But the beauty of this thing is you don't have to formulate every response.
What I did like about the early days of chatbots is that we looked at use cases and the specific content you needed for those use cases. I have an example, I just found it on my phone, where I tried to find out how to activate a credit card, right? I lost the little sticker on the front, and I'm going back and forth with their chatbot, and it's giving me pages and pages of information. No matter how many ways I asked the question, it would give me policies about this; all I wanted was the phone number to call. Now, if somebody had created that use case, they could have made that content very specific. So this is where, again, we have to focus on those use cases, we have to focus on personas.
We have to focus on the additional signals, right, that tell us what an individual wants. You know, it's funny, I listen to Hard Fork. Have you ever listened to that podcast, Hard Fork? They were talking about the fact that Bard came out and was being very woke, very politically correct, and it wasn't generating pictures of white people and all this stuff, right? It tried to overcorrect, to not be biased toward everything being a white person, a white male, or whatever. And essentially what they said was:
what persona do you want Bard to answer the question as, right? Or was it Bard? No, it's Gemini, I'm sorry, it's Gemini. What persona, what perspective do you want it to take? What is the user's perspective? What is the user's orientation, or mindset, or preference? And that comes from additional signaling, additional metadata about the user. Do you want to talk a little bit about that whole idea, the idea of really doing personalization based on the user? We all walk around with a metadata cloud around us, right? And that metadata cloud can tell the world what we want and who we are, and we change our roles. We've talked about high-fidelity journey models that represent user intents in metadata terms. Do you want to talk about that, and how that really leads to personalization? How does that fit in with all of this?
Moritz Müller: Absolutely, absolutely. The first part of personalization, and I think this is important to mention in the enterprise and organization context, is that in any organization you have so-called ACLs, access control lists. You're probably all familiar with this: on a SharePoint, Google Drive, or whatever, you only have access to certain folders. There is certain information, say in the HR department, information about salaries or contracts, that is not for everyone's eyes. Now we need to make sure, when we use retrieval augmented generation, that we only provide an answer based on the documents you are allowed to see. So how do we do this? We honor all the so-called entitlements that exist. That means we know who you are: when you log in, we know you are working in this department and you have access to those files. And then, whenever we do the search for information, we apply these ACLs and make sure you only get to see documents you are entitled to see. That, for me, is one crucial part of personalization in the enterprise context. Now, adding to this, you of course also want an answer that is more personalized. A second layer of personalization can be introduced with so-called knowledge graphs or knowledge systems.
This goes along with the whole knowledge management I mentioned, the taxonomy handling: you understand who this person is and what their role is. We have had an example where, for instance, in a bank, an ESG investor, a CIO, and a normal banker will get different kinds of results back, depending on the profile they have when they search for certain information. We use the profile information that we have, so we understand who this person is and which department they are in, and because of the knowledge graph we have a similar kind of tagging as metadata on the data.
And then we boost the results differently depending on which individual you are. That's one of the key aspects of personalization. Another piece, and I believe there is a lot of development coming in this, is that you start having LLMs ask questions back, very much like if you ask an expert about something: you ask him a question, and he will probably ask you a question back to focus the conversation much more on the specific domain you are interested in. Let's say you want to buy a house. You ask someone: I want to buy a house, where should I buy? First they will ask you: what are your preferences? Are you looking for a big or a small house?
There you go. That is the whole personalization aspect, and I believe we get there by applying additional filters on the data. At the end of the day, these are just filters; it's a question of how we apply those filters. If you take a look at this and abstract the whole process: if you apply a filter during your search, you cut down your initial document set. So you combine a semantic search, which is a vector-based search, with this metadata filter, and combined they give you a ranking of the search results that hopefully, and in the ideal case we know it works, gives you a better search ranking for the individual person. That is how we achieve a better degree of personalization.
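Here is a minimal sketch of the two personalization layers just described: a hard ACL filter that removes documents the user is not entitled to see, followed by a soft profile-based boost on what remains. The data model and scoring are illustrative assumptions, not Squirro's implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Doc:
    text: str
    allowed_groups: set[str]                  # the document's ACL
    tags: set[str] = field(default_factory=set)

def personalized_search(query_hits: list[tuple[Doc, float]],
                        user_groups: set[str],
                        user_interests: set[str]) -> list[Doc]:
    """1) hard ACL filter: never return documents the user cannot see;
    2) soft boost: re-rank by overlap with the user's profile tags."""
    visible = [(d, s) for d, s in query_hits
               if d.allowed_groups & user_groups]          # entitlement check
    boosted = sorted(visible,
                     key=lambda ds: ds[1] + len(ds[0].tags & user_interests),
                     reverse=True)
    return [d for d, _ in boosted]

hr_doc = Doc("Salary bands 2024", {"hr"}, {"compensation"})
esg_doc = Doc("ESG screening policy", {"all"}, {"esg"})
hits = [(hr_doc, 0.9), (esg_doc, 0.8)]
# An ESG investor outside HR only ever sees the ESG document:
print(personalized_search(hits, user_groups={"all"}, user_interests={"esg"}))
```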
Seth: Right. And I call that their digital body language, right? What are all the signals people throw off as they interact with different tools and systems? Those systems are all storing metadata about campaign responses and interests and level of technical proficiency and equipment that they own, just like a salesperson or a good service person would ask those questions. I do think search is a conversation, especially in the old days of metadata-based search. With faceted search, when you put in a query that's ambiguous, if you went to the Grainger site and typed in "tools": imagine if you walked up to a counter at a hardware store and said "tools." They'd say, okay, what kind of tools do you want, right? And that's what it does with the facets. It says: what kind of tools do you want here? Power tools, hand tools, etc. And we're doing the same thing when we're asking those additional clarifying questions. That's a really important piece, because I think chat-based applications are going to increasingly rely
on that prompt engineering to be able to identify the entities and the metadata that are going to make that answer more relevant, just as you're explaining. So this is great; it's exactly how I've thought of this in the past. And this shows you how knowledge graphs come into play, and how you can use a customer identity graph or employee identity graph that includes things like what they're provisioned for, what they have rights to, and so on.
So this is great.

Moritz Müller: Maybe, Seth, just one comment on what we are seeing at the moment in our daily work. We're building in a second layer of logic. When you ask a question, we run a first NLP model that tries to decide which information retrieval processing line the query should go down. If you ask a text question about procurement policies, it retrieves a document. When you ask for a stock price, it points you to a structured database that draws you a chart, so that you do not get text as an answer, but actually get a graph, a visualization, that has the information. The future will definitely see these retrieval augmented generation technologies become more and more complex and connect many different systems inside an organization.
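A toy version of that routing layer might look like the following. A real system would use a trained NLP classifier rather than keyword rules, so treat this purely as an illustration of the control flow; the cue list and pipeline names are assumptions.

```python
def route(query: str) -> str:
    """Decide which retrieval pipeline a question should go down."""
    structured_cues = ("stock price", "revenue", "chart", "how many")
    if any(cue in query.lower() for cue in structured_cues):
        return "structured-db"   # query a database, return a chart
    return "document-rag"        # semantic search + LLM answer

print(route("What is our procurement policy?"))   # document-rag
print(route("Show me the stock price trend"))     # structured-db
```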
Seth: That's great. So, you know, I do want to give you a chance. I said this isn't a commercial about Squirro, but I do want to give you a chance to talk about Squirro, because it is such an amazing tool, and we are partnering with Squirro and starting to leverage it and want to introduce it to our customers. So talk a little bit about how you see Squirro in the context of other environments and other platforms and other tools. What are the real differentiators that you have? I think one of them is that you're able to use different language models, is that correct? And then tell me some of the other ones. It's a very flexible, powerful tool, and you have the ability to do all this configuration, especially of those intents and the signals and the different types of roles, and so on. So talk a little bit more about Squirro as a search engine and about the integration with LLMs.
Moritz Müller: Yeah, I'm happy to do this. If you look at us as a company, we have been around for quite a while now, over a decade, and we really started off as a search company. Squirro was originally considered a search engine platform. In the meantime we are in the Gartner Magic Quadrant for so-called insight engines; that's again an evolution of the terminology, and insight engine is the magic term. What the term insight engine means for me is that you use, at the end of the day, a lot of analytics technologies and you bring them together, because you have so much data in an organization and you want to make sure your employees, the smart brains in your organization, find the relevant information for them as fast as possible. That is what this is all about.
Now, for us at Squirro, as I said, we have been doing semantic search for quite a while, so we really know how to do search. And if someone asks me, everyone out there talks about using ChatGPT, about using LLMs, what is special about Squirro? I think the key answer, and it potentially needs a technical person to understand it, is that most of the solutions out there are very much limited by this token limit at the moment: they can prepare 10 to 20 pages of text, and they allow you to chat with those 10 to 20 pages. The moment it comes to your knowledge base, where you have more than just one document of 20 pages, or even an annual report, and we have ESG questions on annual reports where you have 100 pages of text, these tools will not help you, because they first need to do the so-called information retrieval.
And I think that puts us in quite a unique spot, because we can nowadays ship, out of the box, a platform that does all the so-called tokenization and embedding generation that you need for the vector models for semantic search, and we give you a native integration of any LLM that you want to use. And why is that so important? ChatGPT is here to stay. ChatGPT has a lot of advantages; the big one is that it will probably, for quite a while, be one of the cheapest LLM solutions out there. That is very clear. But there will be other LLMs, and when you run a retrieval augmented generation framework, as we call it, you want a bit of flexibility. With Squirro you can switch out the LLM for any other LLM that's out there; we are completely agnostic to the large language model that we use. We nowadays even run it on premise: we use, for instance, a model from Mistral, I think it is called Mixtral, a smaller model that we can run fully on premise. I'm not saying this comes for free, it comes with a significant amount of cost associated with it, but we can do it fully on premise.
For the work that we do, for instance with our big banking clients, some of these clients are heavily regulated, and they really need us to look beyond ChatGPT, because compliance is still struggling with approving it for some of the sensitive information. And I think, really because we have this decade of knowledge of how to do the information retrieval, how to do the semantic search, that is the reason why we are in such a prime spot when you want to do this on top of your own knowledge base. And maybe just as a last point there: if I talk to clients here, everyone talks about Microsoft Copilot. Microsoft Copilot is a great tool. But if you want to do retrieval augmented generation for question answering on your SharePoint nowadays, for instance, Microsoft is not giving you an out-of-the-box product at the moment. They will tell you: here are the 20 building blocks, you need to put this together, so go and try it yourself. That is the key advantage with us: you can basically buy it off the shelf. You say, I want to do question answering on all of my documents, with enterprise-grade security and entitlement handling, and we are able to provide that out of the box.
Seth: That's great. That's a huge advantage. Now, when you think about metadata: we did an experiment with a life sciences company where we built a knowledge architecture, componentized the content, and ingested it into an LLM, both without the knowledge architecture and with the knowledge architecture, to compare. And we found that when we had those enriched embeddings, we were able to get much, much better results: from 53% accuracy without the enrichment and the metadata to 83% with it. So when you think about Squirro and how Squirro is doing this: you are doing a vector-based search, yes? And then you're ingesting that content with the additional signals of whatever metadata is on those documents?
Moritz Müller: That's great, yes, that's it. But maybe just to look at this: semantic search alone will only find you the relevant paragraph inside a document. Let's say you ask a question about how to change the battery in a car, and you have all the manuals for cars loaded. Then the semantic search will find you, from all the manuals, likely the paragraphs that talk about how to change the battery in a car. Now, if you have a Ford and you want the result for a Mercedes, that alone will not help you. The way to improve this is by combining, as you stated, the semantic search with this so-called metadata search. That means when we store a document, we say: this document is related to this specific car model, this Mercedes, this BMW, or whatever, and we store that metadata on the respective items when we index them.
Now, when you ask the question, you basically ask: how do I change the battery in my Mercedes? Then we use technologies that understand: oh, you want this car model. So we first run a filter, like a parametric search, with that respective metadata, to narrow down the amount of data you are searching, and in a second step you run the semantic search to find the respective paragraphs in the fitting document. That is how we combine the so-called metadata search with the semantic search to make sure you get the most relevant output.
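The car-manual example maps onto a simple two-step retrieval: a hard metadata filter first, then semantic ranking within the filtered set. Word overlap again stands in for real vector similarity, and the index structure is an assumption for illustration.

```python
def metadata_then_semantic(question: str, car_model: str,
                           index: list[dict]) -> list[str]:
    """Step 1: filter by metadata (which model?).
    Step 2: rank the remaining paragraphs semantically."""
    filtered = [d for d in index if d["model"] == car_model]   # metadata filter
    q_words = set(question.lower().split())
    ranked = sorted(filtered,                                   # semantic rank
                    key=lambda d: len(q_words & set(d["text"].lower().split())),
                    reverse=True)
    return [d["text"] for d in ranked]

manuals = [
    {"model": "Mercedes", "text": "To change the battery, open the hood ..."},
    {"model": "Ford",     "text": "Battery replacement: disconnect the ..."},
]
print(metadata_then_semantic("how to change the battery", "Mercedes", manuals))
```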
Seth: But you could also say that the metadata with the document would be additional signals for the LLM. You're doing it in a stepwise fashion: you're filtering first and then using the LLM. But could it conceivably be done the other way, where you say: I'm just going to ingest this, and I'm going to enrich those embeddings with additional metadata, and then use that with a vector similarity search? I know that's a very technical question.
Moritz Müller: You could, of course, also do that. We do this to an extent: typically, when we index the individual paragraphs of a document, we always add the document title as part of the embedding.
Seth: And the same you can do with metadata.
Moritz Müller: And I'm just not convinced that it solves the whole problem, because, as I mentioned before, in RAG the challenge is the so-called token limit that you have for the context. If you now find a lot of embeddings that talk about these kinds of things, you still need to limit it to those 10 pages of text. Whether you get the right result with the pure semantic search approach or by combining semantic search with the metadata approach, at the end of the day it does not matter, as long as you can make sure you point to the right content.
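For the alternative Seth raises, enriching what gets embedded rather than filtering first, a minimal sketch is to prepend the title and selected metadata to each paragraph before it goes to the embedder. The formatting convention and field names here are assumptions, not a documented practice.

```python
def embedding_input(paragraph: str, title: str, metadata: dict) -> str:
    """Build the metadata-enriched text that would be embedded, so the
    resulting vector carries document-level context."""
    tags = ", ".join(f"{k}: {v}" for k, v in metadata.items())
    return f"{title}\n[{tags}]\n{paragraph}"

text = embedding_input(
    "Replace the battery every two years.",
    title="Model X Service Manual",
    metadata={"product": "Model X", "error_code": "92"},
)
print(text)  # this string, not the bare paragraph, would go to the embedder
```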
Seth: Understood, totally, totally get it. And we're also thinking of the fact that you have to componentize content, because when you bring it into an LLM you break it up into arbitrary frames, and you have overlap between those frames for context, but that doesn't always give you the context. The example I use is: if you have an error code for a particular product, and you just send a service tech "error code 92," well, okay, what do you want? What modem? What's the configuration? Etc. That contextual metadata would have to be on that piece of content. So in that case, yes, you could do it by filtering first, doing a parametric search,
a retrieval, and then you could use the LLM. So I could see how both of those ways would work. We're almost out of time here, but I wanted to ask a couple of questions to learn a little bit more about who you are. You are located in Singapore. Do you want to talk a little bit about how you got to what you're doing? I know you worked in oil and gas exploration, and I know you have a PhD. How did all of that impact what you're doing now? We've got about five to eight minutes.

Moritz Müller: Happy to speak about this. I worked in oil and gas after my bachelor's and master's, and I really enjoyed it. It's exciting; oil and gas is, technology-wise, one of the most advanced
fields that we have as humanity. It's really amazing, and if you look at it nowadays, a large part of our economy is still driven by it. But then I decided: look, I want to do a PhD. I went back to university, I appreciated it, and I learned a lot about the techniques. Then, I think it was a phase where the oil price was down, I decided I didn't want to go back into that cyclical business. At the time, in Zurich, I started to go into consulting work, and I ended up with an AI company; that's how it often goes. That was interesting and exciting. So I first worked for a consulting company in Switzerland, and then at some point I switched over to Squirro, again very exciting for me. And when we had our first project in Singapore,
I came here to lead that project, because it was a lighthouse customer for us, a very big project. Then, at the very beginning of 2020, I committed myself to opening up an office in Singapore, and I flew in one week before the COVID lockdown. That was it right then, for one and a half years.

Seth: Wow!
Moritz Müller: But I don't have any regrets. I really like the place. It is fantastic to see Singapore, and also how technology here is evolving. I think there's a lot of potential in this area.

Seth: I've never been to Singapore. I would love to visit. Do you have a family out there?
Moritz Müller: No, I don't have a family here, but I'm pretty sure they would also enjoy it. Everyone that comes and visits me only complains that it is basically warm every single day. But I really like it. I do miss the Swiss winters, I have to admit.
Seth: What's fun for you?
Moritz Müller: I love to do cycling here in Singapore. Road cycling is quite a big community. But you have to go out early, to escape and beat the sun, and also to beat the traffic.

Seth: Right, right. So what time do you go out?

Moritz Müller: We normally go very early, at 6 in the morning on the weekend. It's an early start to the weekend, and then you typically end it with a coffee and a good local breakfast at 9 in the morning, and then you have had your workout when many other people are just about to start the day.

Seth: Yeah, that's wonderful. That's a great way to start your day. I'm not a big morning workout person, and I had to take some time off because of some medical stuff that I had to deal with. But I do know that when I make myself work out in the morning, it's fantastic and it's a great way to start the day.
Well, listen, this has been really wonderful. I'd like to ask one final question of folks: when you think about your life and what you've done, if you could go back and talk to yourself when you got out of college, is there some piece of advice you would have given yourself based on what you know now?
Moritz Müller: I guess one of the things I would have done earlier is come to Asia; the first time I came to Asia I was already in my thirties. I definitely would have loved to come here earlier, and also to learn Chinese. That's one of the things I would have told myself: learn Chinese, because in this region Chinese drives a lot of things. Even though the business language is English, on the relationship side it helps to open a lot of doors, not only in China but in other regions around the world. And the other thing I would tell everyone,
and really also myself: just follow what you want to do. Just try out technology, and you see it with the hype of ChatGPT right now; there's always going to be something new with technology. The moment you are interested in something, follow it, and think about how you can potentially make a business or a career out of it. That is something I would recommend to everyone. Be passionate about it, right? As long as you're passionate about what you do, you really enjoy it, and the moment I would lose my passion, I would really have to reconsider what I'm doing.
Seth: That's wonderful. And we'll put your contact information, your LinkedIn information, in the show notes. Do you have a simple LinkedIn handle? What is your link?
Moritz Müller: You will find me; it's Moritz Müller, and I think you should find me on LinkedIn, it should be in there. If you want to reach out to discuss retrieval augmented generation and how it works in an enterprise setup, please feel free to reach out. I really think this is a huge topic for the next five years; it will revolutionize the whole way we handle data.
Seth: And you know, all the consultants and the analysts, everyone, is saying that retrieval augmented generation is going to have the biggest impact of any AI initiative in large organizations, in any organization. And the market for RAG was, what, 13 billion dollars in 2023, and is predicted to be 183 billion dollars in 2027. So yes, we're at the beginning of that hockey stick, and it is an exciting, exciting time. Well, Moritz, thank you so much for being here. It's really been a pleasure. I really appreciated your time.
Moritz Müller: Thank you, sir.
Seth: And thank you to our audience, and we will see you next time on the next Earley AI Podcast. Again, thanks, everyone, and thank you, Moritz.
