Why the Gap Between an AI Translation Demo and Enterprise Production Is Wider Than Most Organizations Realize
Guest: Olga Beregovaya, VP of AI at Smartling
Host: Seth Earley, CEO at Earley Information Science
Published on: June 17, 2026
In this episode, Seth Earley speaks with Olga Beregovaya, VP of AI at Smartling, who brings 25 years of experience across every major evolution in natural language processing - from rules-based systems through statistical models, neural translation, and now LLMs. They explore why plugging into a commercial model at token-level pricing is not a translation strategy, how brand voice fractures at 300,000 employees, why information architecture is just as essential for language pipelines as it is for retrieval, and what it actually takes to deliver consistent, on-brand, multilingual content at enterprise scale. Olga shares candid and specific insights on language complexity, the human-in-the-loop imperative, and why the organizations that are finally succeeding with AI have stopped treating it as art for art's sake.
Key Takeaways:
- The price of a commercial model's tokens is not the cost of enterprise AI translation - data integrity, pipeline architecture, linguistic assets, and human review are the real cost drivers.
- Brand voice fractures the moment every employee can generate content autonomously - a Fortune 10 company discovered it had 300,000 voices overnight after deploying a co-pilot tool.
- Information architecture is as essential for language pipelines as it is for retrieval - nested HTML tags, tokenization failures, and unstructured content break translation before the model ever sees the text.
- LLMs unlocked context that neural machine translation never had - resolving pronouns, disambiguating terminology, and working at document level instead of sentence by sentence.
- The assumption that AI translation works equally across all languages is one of the most dangerous misconceptions in the space - morphological complexity, writing systems, and training data representation vary enormously.
- Human review is not optional even in fully automated pipelines - it is how models learn, how ground truth is established, and how brand consistency is maintained over time.
- The organizations now succeeding with AI translation have moved from implement-and-fail to measured deployment - defining use cases, respecting prerequisites, and matching tooling to actual requirements.
Insightful Quotes:
"Yes, you can totally consume your million tokens at a super low price point, but what exactly are you buying for this money? Everybody can totally produce a translation or generate copy, but is it going to represent your brand? That's a different question." - Olga Beregovaya
"He installed a co-pilot tool and said, it's great, except my company has 300,000 employees and now my company has 300,000 voices. That's not necessarily what I was prepared for in different countries." - Olga Beregovaya
"If you want your models to evolve, and if you want your models to learn, you obviously need somewhere for these models to learn from - and this is where human review comes in. It is always twofold: guaranteeing the quality to your customers, and helping your models evolve." - Olga Beregovaya
Tune in to discover why AI translation at enterprise scale requires far more than a model and an API key - and what the organizations getting it right have built that their competitors have not.
Links
LinkedIn: https://www.linkedin.com/in/olga-beregovaya-04b5/
Website: https://www.smartling.com
Ways to Tune In:
Earley AI Podcast: https://www.earley.com/earley-ai-podcast-home
Apple Podcast: https://podcasts.apple.com/podcast/id1586654770
Spotify: https://open.spotify.com/show/5nkcZvVYjHHj6wtBABqLbE
iHeart Radio: https://www.iheart.com/podcast/269-earley-ai-podcast-87108370/
Stitcher: https://www.stitcher.com/show/earley-ai-podcast
Amazon Music: https://music.amazon.com/podcasts/18524b67-09cf-433f-82db-07b6213ad3ba/earley-ai-podcast
Buzzsprout: https://earleyai.buzzsprout.com/
Podcast Transcript: AI Translation, Global Content Strategy, and Why Language Is Harder Than It Looks
Transcript introduction
This transcript captures a conversation between Seth Earley and Olga Beregovaya about why AI translation at enterprise scale is one of the most technically and linguistically demanding applications in the field - and why most organizations dramatically underestimate what it requires. They cover the misconceptions that lead companies to over-trust token pricing, the information architecture prerequisites that determine whether content can even reach a model cleanly, what LLMs actually unlocked that neural machine translation could not do, how language complexity varies across 7,000 languages, when human review is non-negotiable, and how ground truth works when even human reviewers are inconsistent.
Transcript
Seth Earley: Welcome to the Earley AI Podcast. I'm your host, Seth Earley, and in each episode, we explore how artificial intelligence is changing the way we look at data, business problems, business strategy, and business operations. Today, we are going to talk about AI translation and global content strategy - a space where the stakes are high, the complexity is underestimated, and the gap between a working demo and a production system at enterprise scale is wider than people realize. There are 7,000 plus languages in the world, and the assumptions that work for Romance languages fall apart quickly everywhere else.
Joining me today is Olga Beregovaya, VP of AI at Smartling, an AI-first translation and global content platform. Olga brings 25 years of experience in natural language processing and linguistic machine learning, spanning every major evolution in the field - from rules-based systems through to statistical models, neural translation, and now LLMs. She served as CEO of a machine translation company, led language technology at Autodesk for over a decade, and now leads AI at Smartling. Olga, welcome to the show.
Olga Beregovaya: Thanks so much for having me.
Seth Earley: When we talk to executives and technology leaders about AI translation, what do they most consistently get wrong?
Olga Beregovaya: The thing people most consistently get wrong is looking at the price sheet for commercial provider tokens and saying, why can't I just pay seven dollars per million tokens, plug it into my ecosystem, and call it a day? A lot of people get mesmerized by the price and do not realize the complexities behind it. That is actually one of the most frequent conversations we have with C-suite executives: yes, you can totally consume your million tokens at a super low price point, but what exactly are you buying for this money? The simplicity and accessibility, the feeling that everything is available right now and you could do it yourself within a day - that is probably the biggest misconception.
Seth Earley: When is the simple approach actually okay? When does it work?
Olga Beregovaya: My message is: separate branded content from non-branded content. If it is content that represents your brand, if it is outbound content and you are sensitive to your brand tone and voice in the market, then a generic Copilot-style approach is not a good idea. If it can carry any brand damage, or liabilities - think CROs, any kind of documents in regulated industries like life sciences or pharma - technically you can do it, but why would you if it can potentially bring about a lawsuit?
The converse: for pedestrian content - and I might need to steal that term from you - it is perfectly fine. In our industry we call it user-generated content. Internal comms, internal emails, user reviews in less specialized industries, opinion portals, retail. Go for it. Plug it in and you are golden.
Seth Earley: Is there anything else executives consistently misunderstand?
Olga Beregovaya: The second one is brand voice at scale. I was talking to a senior executive from a Fortune 10 company. They installed a co-pilot tool, and he said: it is great, except my company has 300,000 employees, and now my company has 300,000 voices. That is not necessarily what I was prepared for in different countries. Everybody can produce a translation or generate copy, but is it going to consistently represent the brand? That is a different question entirely.
The third misconception is data integrity. People totally underestimate its importance when it comes to in-platform data they are going to feed into RAG or an agentic workflow. Data integrity, both engineering-wise and linguistics-wise, and data interoperability are critical. Large language models do not play well with special characters. They do not like emojis. Inline tagging - good old WYSIWYG HTML - can completely stall your entire translation process. Content in, content out is the assumption. But the number of events that happen in a pipeline to process a single chunk of text - people just do not realize how much infrastructure sits between the raw content and a clean model output.
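As an illustration of one of the pipeline events that sit between raw content and the model, the sketch below masks inline HTML tags with numbered placeholders before translation and restores them afterward. This is a common general technique, not a description of Smartling's pipeline; `mask_tags`, `unmask_tags`, and the `{0}` placeholder style are assumptions made for the example, and the regular expression only covers simple inline tags.

```python
import re

# Matches simple inline HTML tags such as <b>, </b>, <a href="...">, <br/>.
# Assumption: the content has no comments, scripts, or "<" used as plain text.
TAG_RE = re.compile(r"</?[A-Za-z][^<>]*>")


def mask_tags(text):
    """Replace each inline tag with a numbered placeholder.

    Returns the masked text and the list of original tags, in order.
    """
    tags = []

    def _swap(match):
        tags.append(match.group(0))
        return f"{{{len(tags) - 1}}}"

    return TAG_RE.sub(_swap, text), tags


def unmask_tags(text, tags):
    """Restore the original tags in translated text.

    Raises ValueError if the translation dropped or duplicated a placeholder,
    so a damaged segment fails loudly instead of shipping broken markup.
    """
    for i, tag in enumerate(tags):
        token = f"{{{i}}}"
        if text.count(token) != 1:
            raise ValueError(f"placeholder {token} missing or duplicated")
        text = text.replace(token, tag)
    return text
```

The placeholders may legitimately change order in the translated sentence; the check only insists that each one survives exactly once.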
Seth Earley: What changed when LLMs arrived, compared to what neural machine translation could do before?
Olga Beregovaya: The 2017 transformer paper, "Attention Is All You Need," was probably the first breakthrough. That is where neural machine translation - deep learning-based machine translation - took over. But the constraint was that neural machine translation generally operated within sentence boundaries. Every API call would send one chunk of text - literally start of sentence, end of sentence.
The two major breakthroughs with LLMs were the scale of parameters and training data, which unlocked access to different domains and different languages, and the extended context window. Suddenly, resolving pronouns, resolving homonyms, resolving ambiguity in terminology - all of it became manageable. As long as you can operate at chunk level, or even document level, the quality difference is significant.
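The difference between sentence-bounded requests and context-aware requests can be pictured with a small sketch. It is illustrative only: `with_context` and the `window` parameter are hypothetical names, and a real system might send a whole document rather than a sliding window of preceding sentences.

```python
def with_context(sentences, window=2):
    """Pair each sentence with up to `window` preceding sentences.

    With window=0 this degenerates to the sentence-by-sentence setup,
    where a pronoun such as "She" or "It" arrives with nothing to resolve it.
    """
    return [
        {"context": sentences[max(0, i - window):i], "segment": sentence}
        for i, sentence in enumerate(sentences)
    ]


document = [
    "Anna opened the report.",
    "She found an error.",
    "It was on page two.",
]
requests = with_context(document, window=1)
```

Here the second request carries "Anna opened the report." as context, which is what lets a model pick the right gendered form for "She" in a target language that needs it.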
Seth Earley: Where does AI translation break down across different languages?
Olga Beregovaya: There are several breaking points. The first is training data representation - most foundational models are predominantly English-centric, and at least half of those 7,000 languages are underserved in the universe of NLP and AI translation.
The second is tokenization. If you cannot tokenize your text properly for a given language, that breaks the process before you even reach the model. Right-to-left languages, ideographic languages - they each present their own tokenization challenges.
The third is morphological complexity. Finno-Ugric languages like Hungarian have around 16 grammatical cases. German compounds can stack stem on stem on stem until you have 40 characters in a single word that the model then has to parse out and understand. Japanese has four distinct writing systems that you need to pivot between.
Romance languages - Latin American Spanish in particular - are where models perform best. Large population, well-represented training data, relatively manageable morphology. That is why demos almost always use Romance languages. The further you move from that, the harder it gets.
Seth Earley: What does good information architecture look like as a prerequisite for translation pipelines?
Olga Beregovaya: It is equal parts an engineering issue and a linguistics issue. If you cannot parse the input properly and reduce it to a format the model can handle - say, JSON or XLIFF, which is the industry standard XML format - that is the first breaking point.
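For readers who have not seen XLIFF, here is a minimal sketch of pulling translatable segments out of an XLIFF 1.2 file with Python's standard library. The sample document, the unit ids, and the `extract_segments` helper are invented for illustration; production files usually carry far more metadata and inline markup than this.

```python
import xml.etree.ElementTree as ET

# Namespace used by XLIFF 1.2 documents.
NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

# Hypothetical sample file with two translation units.
SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="ui.json" source-language="en" target-language="de" datatype="plaintext">
    <body>
      <trans-unit id="save"><source>Save</source></trans-unit>
      <trans-unit id="greet"><source>Hello, <g id="1">world</g>!</source></trans-unit>
    </body>
  </file>
</xliff>"""


def extract_segments(xliff_text):
    """Return {unit id: plain source text} for every trans-unit.

    Inline elements such as <g> are flattened to their text here; a real
    pipeline would need to preserve them so they can be restored later.
    """
    root = ET.fromstring(xliff_text)
    segments = {}
    for unit in root.iterfind(".//x:trans-unit", NS):
        source = unit.find("x:source", NS)
        if source is not None:
            segments[unit.get("id")] = "".join(source.itertext())
    return segments
```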
Take subtitles as an example. Subtitling looks like a simple use case. But an SRT file is not just a sentence - it is chunks within a sentence, reflecting human intonation and pauses. You are not dealing with a segment, you are dealing with completely random chunks. You need to stitch them back together for the model to understand context, then re-chunk them with correct timestamps for the translated output.
And then there are string length constraints. You can put every constraint you want in your prompt - be formal, be concise - and the model still needs 40 more characters to express the same idea in German than it did in English. UI translation breaks when your target language consistently runs longer than your source.
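Because a prompt cannot guarantee length, a check after translation is one simple safeguard. The sketch below flags translated UI strings that exceed a per-key character budget; the keys, limits, and `find_overflows` name are hypothetical, and `len` counts code points rather than rendered width, so it is only a first approximation.

```python
def find_overflows(strings, limits):
    """Return (key, length, limit) for each translated string over its limit.

    Keys without a configured limit are skipped rather than guessed at.
    """
    return [
        (key, len(text), limits[key])
        for key, text in strings.items()
        if key in limits and len(text) > limits[key]
    ]


# Hypothetical UI strings: budgets sized for English, checked against German.
limits = {"menu.settings": 10, "button.save": 10}
german = {"menu.settings": "Einstellungen", "button.save": "Speichern"}
overflows = find_overflows(german, limits)
```

Flagged strings can then go to a human or to a retry with a tighter instruction, rather than silently breaking the layout.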
Data governance, structure, and clear taxonomy are prerequisites. Retrieval is only as good as your linguistic assets are. If you are feeding it garbage, you are not getting anywhere. We work with our customers on data governance, data curation, data structure, and data organization before anything else, because it is the foundation everything else runs on.
Seth Earley: Tell me about the hybrid approach - how do you decide when to use frontier models, smaller language models, fine-tuning, RAG, and rules-based layers?
Olga Beregovaya: Fine-tuning requires data, it is expensive, and it can be time-consuming. For retrieval, you need curated, structured linguistic assets - translation memories, term bases, style guides - that can be retrieved to ground the output in your brand's specific voice and terminology. That is where we land most of the time: RAG-based processes augmented with customer-specific linguistic assets, producing higher quality than what you can get from a frontier model alone.
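A rough sketch of what grounding a request in linguistic assets can look like: fuzzy matches from a translation memory and matching term base entries are retrieved and placed in the prompt. Everything here is an assumption made for illustration, including the function names, the prompt wording, the 0.6 threshold, and the use of `difflib` character similarity in place of whatever retrieval method a production system uses.

```python
from difflib import SequenceMatcher


def fuzzy_matches(source, translation_memory, threshold=0.6, top_k=2):
    """Return the best (score, source, target) TM entries above the threshold."""
    scored = [
        (SequenceMatcher(None, source.lower(), src.lower()).ratio(), src, tgt)
        for src, tgt in translation_memory
    ]
    scored = [entry for entry in scored if entry[0] >= threshold]
    return sorted(scored, key=lambda entry: entry[0], reverse=True)[:top_k]


def matching_terms(source, term_base):
    """Return term base entries whose source term appears in the text."""
    lowered = source.lower()
    return {term: target for term, target in term_base.items() if term.lower() in lowered}


def build_prompt(source, target_lang, translation_memory, term_base, style_note):
    """Assemble a translation prompt grounded in retrieved assets."""
    lines = [f"Translate into {target_lang}. Style: {style_note}"]
    terms = matching_terms(source, term_base)
    if terms:
        lines.append("Required terminology:")
        lines += [f"- {term} -> {target}" for term, target in terms.items()]
    matches = fuzzy_matches(source, translation_memory)
    if matches:
        lines.append("Similar approved translations:")
        lines += [f"- {src} => {tgt} ({score:.0%} match)" for score, src, tgt in matches]
    lines.append(f"Source: {source}")
    return "\n".join(lines)
```

The quality of the result depends entirely on what is in the translation memory and term base, which is the point made above about curated, structured assets.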
Then there is the question of when a small language model earns its place. Sometimes you can get the basics from a frontier model and augment with a second layer - a smaller, task-specific model that handles the domain-specific or brand-specific nuances. Each layer has to earn its place by improving quality or reducing cost in a measurable way.
The older-school tools still have a role. We still use Elasticsearch in different applications, keyword extraction, and rules-based filtering for specific use cases. The tribal dances you have to go through to get a large language model to specialize and stay within bounds - sometimes a rule is simpler and more reliable than a prompt.
Seth Earley: Where does human review remain essential in an AI-first workflow?
Olga Beregovaya: The biggest line item for major AI developers right now, after data centers and GPUs, is human-curated data sets. They have already scraped everything they could from the internet. The data does not evolve as fast as the models' hunger for high-quality curated data. Human review is how models learn, how ground truth is established, and how quality is maintained over time.
In our world: if you want to ship on-brand, high-quality content, you absolutely need a human review and human correction capability. We use a content sensitivity and risk tolerance matrix divided by markets - AI-positive markets where acceptance is high, and AI-negative markets where tolerance is lower. Based on that and the content type, you phase in human review accordingly. It can be light spot-checking for high-volume, lower-risk content, or full human review for regulated or brand-critical material.
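A matrix like the one described can be represented as a plain lookup table. The content types, market labels, and review levels below are hypothetical placeholders, not Smartling's actual categories; the only design choice worth noting is that unknown combinations fall back to the strictest level.

```python
# Hypothetical matrix: (content type, market stance toward AI) -> review level.
REVIEW_MATRIX = {
    ("user_generated", "ai_positive"): "light_spot_check",
    ("user_generated", "ai_negative"): "heavy_spot_check",
    ("marketing", "ai_positive"): "heavy_spot_check",
    ("marketing", "ai_negative"): "full_review",
    ("regulated", "ai_positive"): "full_review",
    ("regulated", "ai_negative"): "full_review",
}


def review_level(content_type, market_stance):
    """Look up the review level; anything unrecognized gets full review."""
    return REVIEW_MATRIX.get((content_type, market_stance), "full_review")
```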
There is also an interesting paradox in quality judgment. Getting agreement between a human reviewer and a model is actually half the problem. The real problem is the subjectivity of human reviewers themselves. A human annotator can be in a good mood on Monday and a bad mood on Wednesday, and judge the same output completely differently. And models are not immune either - we have a feedback loop technique where the model flags a mistake, you correct it, and the model then says no, actually the original was correct. Getting consistent, objective ground truth from human review requires its own methodology, not just a process.
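One standard way to quantify how consistent two sets of judgments are is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. It is offered here as a general-purpose measure, not as the methodology used in the conversation; the `cohens_kappa` helper and the sample labels are made up for the example. The same calculation works for two reviewers, for a reviewer and a model, or for one reviewer on Monday versus Wednesday.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length lists of categorical labels.

    1.0 means perfect agreement, 0.0 means no better than chance.
    """
    if not labels_a or len(labels_a) != len(labels_b):
        raise ValueError("label lists must be non-empty and the same length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both raters pick the same label independently.
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1:
        return 1.0  # Both raters used a single identical label throughout.
    return (observed - expected) / (1 - expected)


# Hypothetical judgments of the same four outputs on two different days.
monday = ["ok", "ok", "bad", "bad"]
wednesday = ["ok", "bad", "bad", "bad"]
kappa = cohens_kappa(monday, wednesday)
```

In this toy case the raw agreement is 75 percent, but half of that would be expected by chance, so kappa comes out at 0.5.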
Seth Earley: What advice would you leave executives with as they think about this space?
Olga Beregovaya: Two or three years ago, everybody was in love with LLMs and what they could do, and 85 percent of implementations failed. Now we are finally in a place of measured deployments. Measure seven times, cut once - understand that you have a real use case before you invest. Do not underestimate the complexities. Your most brilliant engineers may not speak all the languages, and they may not know all the intricacies of the pipeline. Know your ROI, define your use case, and do not pour time and money into something that domain experts can do for you faster and better. If it is not broken, do not touch it. If it is broken, build the right foundation before you try to fix it with AI.
Seth Earley: Olga, thank you so much for joining us and sharing 25 years of expertise in this field. It was a fascinating discussion and you opened my eyes to a lot of details I had not fully appreciated.
Olga Beregovaya: Thanks so much for having me.
Seth Earley: And to our listeners, thank you for tuning in to the Earley AI Podcast. Be sure to subscribe for more conversations about how AI is shaping the future of business. We will see you next time.
