Earley AI Podcast - Episode 86: Open Source, Observability, and AI-Driven Engineering with Tom Wilkie

Written by Earley Information Science Team | Apr 17, 2026 2:55:14 PM

How Grafana Labs Built a Competitive Edge Through Openness, Agentic AI, and Engineering Culture

Guest: Tom Wilkie, VP of Product at Grafana Labs

Host: Seth Earley, CEO at Earley Information Science

Published on: April 17, 2026

In this episode, Seth Earley speaks with Tom Wilkie, VP of Product at Grafana Labs, a leading observability platform serving 25 million users across 50 global regions. They explore how Grafana's open source "big tent" philosophy creates unexpected competitive advantages in the AI era, why agentic AI is transforming how engineers respond to production incidents, and how the build-versus-buy debate is shifting with AI-assisted development. Tom shares candid insights on engineering culture, remote-first work, and why junior engineers may be more valuable than ever.

Key Takeaways:

Grafana Labs' open source strategy gave AI foundation models deep familiarity with their software, creating a powerful and unexpected competitive advantage.
Agentic AI is transforming observability by automating root cause analysis of production incidents, reducing engineering response time significantly.
Adaptive telemetry technology automatically identifies unused data, enabling organizations to cut observability costs dramatically without sacrificing coverage.
The build-versus-buy debate is shifting, but the real hidden cost is long-term maintenance - not the initial development effort.
Emergent engineering standards outperform top-down mandates; leaders consistently overestimate how much centralized consolidation is actually needed.
Remote-first engineering works when companies deliberately engineer collaboration rather than relying on spontaneous hallway interactions that rarely happen anyway.
AI-powered LLMs may solve the remote junior engineer onboarding problem by providing a low-ego, always-available resource for learning and guidance.

Insightful Quotes:

"By having 25 million users worldwide, they're out there blogging, publishing examples, tweeting, publishing videos - generating so much content on the open web about how to use Grafana. These foundation models are trained on that data. They know how to use our software better than proprietary competition." - Tom Wilkie

"The cost of consolidation is often underestimated. And it's often dangerous to the culture, because as soon as you start telling engineers that have poured their heart and soul into this project to drop it - that's devastating to people." - Tom Wilkie

"Openness - whether it's open source, open standards, open culture - is not just a philosophy. It really is a competitive strategy. It lowers switching costs, builds trust, and in the area of AI, it turns out to be the best way to make sure your models know how to use your technology." - Seth Earley

Tune in to discover how Grafana Labs turned open source philosophy into a winning AI-era strategy - and what engineering leaders can learn about culture, observability, and building for the long term.

Links

LinkedIn: https://www.linkedin.com/in/tomwilkie/

Website: https://grafana.com

Ways to Tune In:

Earley AI Podcast: https://www.earley.com/earley-ai-podcast-home

Apple Podcast: https://podcasts.apple.com/podcast/id1586654770

Spotify: https://open.spotify.com/show/5nkcZvVYjHHj6wtBABqLbE

iHeart Radio: https://www.iheart.com/podcast/269-earley-ai-podcast-87108370/

Stitcher: https://www.stitcher.com/show/earley-ai-podcast

Amazon Music: https://music.amazon.com/podcasts/18524b67-09cf-433f-82db-07b6213ad3ba/earley-ai-podcast

Buzzsprout: https://earleyai.buzzsprout.com/

Podcast Transcript: Open Source, Observability, and AI-Driven Engineering

Transcript introduction

This transcript captures a conversation between Seth Earley and Tom Wilkie about how Grafana Labs built a global observability platform on open source principles, and why that openness is now a decisive competitive advantage in the AI era. They discuss agentic AI in production operations, the evolving build-versus-buy debate, engineering culture at scale, and the surprising case for junior software engineers in an AI-assisted world.

Transcript

Seth Earley: Well, good morning, good afternoon, good evening, depending upon your time zone. Welcome to the Earley AI Podcast. I'm Seth Earley, and today I'm joined by Tom Wilkie, who's VP of Product at Grafana Labs. For those who don't know Grafana, think of it this way. Every modern company is a software company, and every software company needs to understand what its software is doing. That's observability. And Grafana Labs has become one of the most important players in that space, with 25 million users. And their tools are used worldwide with over $400 million in revenue. But what makes this conversation really interesting for our audience is not just the technology, it's the philosophy. Grafana Labs has built its entire strategy around openness. Open source, open standards, open data, and the philosophy has some fascinating and unexpected consequences in the AI era. Tom, welcome to the show.

Tom Wilkie: Hey, thank you for having me.

Seth Earley: So, our audience includes technical practitioners and business leaders. For somebody who's not deep in the infrastructure world, set up a problem for us. What does observability actually mean, and why should business leaders care about it?

Tom Wilkie: Yeah, of course. So, if you're a modern business leader, you're probably familiar with the term, like, software is eating the world, right? And this is the idea that every business has now effectively become a tech business. So if your key competitiveness hinges on your ability to build and leverage technology, then you're gonna probably want to hire a bunch of software engineers. And those software engineers are gonna build some software that's gonna be running your business, and they're gonna need to understand how that software behaves, and they're gonna need to be able to iterate on and improve that software rapidly. That's observability. That's that set of technology and tools that helps software engineers respond to incidents, improve performance, and ship features quickly.

Seth Earley: So, observability has been around for a while, but of course, in the AI era, we have a lot more opacity of certain functions and internal mechanisms. These are statistical outputs, not deterministic outputs in many cases, right? So tell me how this new world of AI, machine learning, large language models, and generative AI impacts observability. What are the biggest issues and challenges?

Tom Wilkie: Yeah. So I look at it really from three angles. One is the whole problem of observability becomes potentially a little bit easier when you apply agentic techniques to it. Using agents to automatically root cause analyze incidents in your production infrastructure is now, and has been for a while, possible and really quite effective. If you've got consolidated observability into something that has a nice interface that agents can leverage, then suddenly it's very easy to tell what happened. It's very easy to automate a lot of the processes around release management.

The second big angle is, obviously, AI and LLMs have completely revolutionized how we build software. With technology like Claude Code being so capable of autonomously building software to a specification, it's quite a natural extension to give these desktop agents access to your observability system. The role of observability tools, really helping an engineer understand a problem, is shifting towards helping agents understand those problems. And it has this nice effect of combining and tightening the dev test loop, allowing engineers to iterate more quickly.

The third example is one of the patterns you see in the software industry - every time there's an architectural revolution, every time the way we build software changes, be it going from the mainframe to the cloud, from monoliths to microservices, to event-driven architecture - every time there's a new architectural style, there's a new wave of tooling to understand its behavior. And we're seeing exactly the same thing with agentic architectures, requiring a new set of observability tools to really understand their behavior.

Seth Earley: Absolutely, yeah. And we had just done a project for a client where we were helping with their software development lifecycle, and automated testing was a huge piece of this, but just effecting the culture change to get engineers to think differently about this, and to get leadership to realize that you don't need a massive team of offshore developers anymore. You can do these things with fewer people, with people that have more capabilities and experience, and let all that grunt programming be done by the tools. It's astounding. It's revolutionary. It's amazing.

But as you say, this new wave does require a new way of thinking. You mentioned that there are $30 billion plus players in the observability space. It's interesting that unlike something like CRMs, where a small number of vendors tend to dominate, observability hasn't consolidated. Why do you think that is?

Tom Wilkie: I love this about the observability market. Fundamentally, I think the reason this market can support so many competitors is because the switching costs are relatively low. For an observability system, you get a lot of value from the most recent data that you've put into it - you really only need the last few hours of data before you can start to notice trends and patterns. That fundamentally very low switching cost is one of the many reasons why this market really rewards the most innovative players. And then once you start layering on open standardization - a trend we're really encouraging at Grafana Labs - it's increasingly lowering those switching costs even further.

Seth Earley: So, you call your approach the "big tent" philosophy. Walk us through what that means in practice, and why it matters for enterprise buyers who are evaluating observability platforms.

Tom Wilkie: This was a really interesting aspect of the market. You've got so many players, all working on a very similar model of, send us your data, and we will keep that data for you, and we will give you tools that help you understand that telemetry. Grafana Labs, from day zero, started with a very different philosophy. We said, keep your data wherever it is, we don't care. What we'll do is build tools that connect to the data where it lives - in our competitors, in other observability stacks - and allow you to visualize it all in one place in Grafana. To this day, 12 years later, Grafana's still pretty much the only thing that does that.

But what open source also got us, in the age of AI, is models that started to know how to use our software. This was an aha moment for us. By having 25 million users worldwide, they're out there blogging, publishing examples, tweeting, publishing videos, just generating so much content on the open web about how to use Grafana, how to use Prometheus, how to use Loki. These foundation models from Anthropic, OpenAI, and Google are trained on that data. They know how to use our software better, actually, than they know how to use proprietary competition.

Seth Earley: That's so fascinating. Being open has created that competitive advantage when you'd think openness would not create a competitive advantage. It's very counterintuitive. You also mentioned three things that differentiate you - adaptive telemetry, big tent interoperability, and AI. Let's unpack adaptive telemetry, because the economics angle is fascinating.

Tom Wilkie: With interest rates going up, suddenly everyone tightened their belt, looking to consolidate and not grow their spend linearly with revenue. One of the big impacts was in the observability market - people asked, why am I paying so much to understand the behavior of my software? The response from most vendors was discounts, bundling. We wanted to do something different. We built technology that identifies the data you actually use on a day-to-day, week-to-week basis. For the data you don't use - and it's commonly held that you're sending a ton of data to an observability vendor that's going unused - we give you ways of saving money on that. We can aggregate away high cardinality dimensionality, we can sample it, all automatically. Almost as if by magic, you turn it on and you halve your bills with Grafana Cloud.

Another example: we launched a product called Bring Your Own Cloud, BYOC, where for our very largest customers, we'll run a full region of Grafana Cloud in their Amazon, Google, or Azure account. They pick up the hardware cost, and pay us a relatively flat licensing fee. For massive AI hyperscalers, this can be an order of magnitude more cost-effective at scale.
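Editor's note: the "aggregate away high cardinality" idea Tom describes for adaptive telemetry can be illustrated with a small sketch. This is not Grafana's implementation. The function name, the sample data, and the `used` label set are all hypothetical, and the sketch assumes counter-style metrics where summing across a dropped label is meaningful (gauges would need a different aggregation).

```python
from collections import defaultdict


def aggregate_unused_labels(samples, used_labels):
    """Sum samples after dropping every label not in `used_labels`.

    samples: list of (metric_name, labels_dict, value) tuples
    used_labels: label names that dashboards or alerts actually query
                 (hypothetical input; how usage is detected is out of scope)
    """
    totals = defaultdict(float)
    for name, labels, value in samples:
        # Keep only the labels that are actually used, in a stable order,
        # so series that differ only in unused labels collapse together.
        kept = tuple(sorted((k, v) for k, v in labels.items() if k in used_labels))
        totals[(name, kept)] += value
    return dict(totals)


# Illustrative data: three series that differ by pod and status.
samples = [
    ("http_requests_total", {"service": "checkout", "pod": "a1", "status": "200"}, 5.0),
    ("http_requests_total", {"service": "checkout", "pod": "b2", "status": "200"}, 7.0),
    ("http_requests_total", {"service": "checkout", "pod": "a1", "status": "500"}, 1.0),
]

# Assume nobody queries by pod, so that label can be aggregated away.
used = {"service", "status"}
reduced = aggregate_unused_labels(samples, used)

for (name, labels), value in sorted(reduced.items()):
    print(name, dict(labels), value)
```

In this toy case three stored series collapse to two because the `pod` label is never queried. The saving grows with the cardinality of whatever labels go unused.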

Seth Earley: Right, and it's counterintuitive, because you're building technology that reduces your revenue per customer. But the alignment with customer value is the kind of thing that builds longer-term trust and retention. How did you navigate that internally?

Tom Wilkie: I can remember when we proposed this idea, there were legitimately seven-figure deals that the sales team were working on in cycle that got halved when we launched this. Understandably, some of the sales team were quite upset. But in hindsight, they now realize that the majority of customers who choose Grafana Cloud choose it because of this reason. There might be some short-term pain, but the long-term result is they're not losing deals because of pricing.

Seth Earley: When you're out in the marketplace, what beliefs are you encountering that you have to correct? What are the misconceptions about this space?

Tom Wilkie: The interesting one for me is, we sell observability tools to DevOps practitioners - it's an inherently conservative space. These are people held accountable for the reliability of systems, and they take that very seriously. There was apprehension about AI. When we first started launching agentic capabilities in Grafana Cloud, the market was a bit scared, and we had to build that level of trust carefully. The nice thing about observability is most of it is read-only, so the ability to actually break production from an observability system is relatively low. That gave us lots of opportunities to show people the value-add - making it easier to root cause analyze production incidents, generate visualizations, and so on. We're now at the inflection point where practitioners are starting to ask to give agents production access.

Seth Earley: Let's talk about build versus buy. Engineers want to build the thing you're selling them, right? And many times they underestimate the level of effort or the complexity. What's the misconception there?

Tom Wilkie: This is the SaaSpocalypse - the idea that because it's significantly easier to build software now, all SaaS companies won't exist in the not-so-distant future. I'm not sure it's ever been particularly difficult to build software. There's always been some young, naive engineer who says, I can re-implement Salesforce in a weekend. And honestly, there's some truth to that. But the value of Salesforce isn't that it's particularly good software - it's the ecosystem, the integrations, the availability of people who know how to use it. And it's the accountability. Salesforce has a team on the hook for making sure it works.

We actually built a whole early business around a piece of open source software called Graphite. Very capable engineers would download it, build a service around it, and monitor their software. Very successful. Those engineers would lose interest and move on. The organization would be left holding the bag for unmaintained software. Grafana Labs would come in, offer to take over the maintenance, and that's how Grafana Cloud was born. That's the real risk of build versus buy.

Seth Earley: You said something interesting about how customers ask how you build product so quickly with a small team, and the answer traces back to engineering culture. Tell us about the bazaar model.

Tom Wilkie: This is not my idea at all. It comes from a very famous book called The Cathedral and the Bazaar, about how open source software won over commercial software in the infrastructure space. It promotes this idea that an empowered engineer can scratch their own itch, solve a problem they acutely feel, and often do that better than a team of engineers and product managers trying to solve a problem they don't actually experience themselves. That's how most of the Linux ecosystem was built.

That's how we try to run Grafana Labs internally. We have a group of incredibly empowered, autonomous engineers told to go and do what they think is the right thing to do. It works very well for us because we build tools for software engineers - with a team of software engineers. We are our own biggest user. And with the majority of our software being open source, we get a really tight feedback loop with our community that doesn't involve money. Our community uses our software, gives us feedback, and occasionally contributes changes. With 1,700-1,800 people, we still describe ourselves as many small teams - the Amazon two-pizza team principle - and once a small group can fully own a problem and keep all the context in their heads, they can be incredibly productive.

Seth Earley: That makes a lot of sense. And the analogy to complex adaptive systems is useful here. You're providing attractors and guardrails - standards that emerge from the work, and feedback loops that keep things from going off the rails. It's much more resilient than a top-down mandate. So, how does that translate to AI adoption across the organization? Organizations want to let a thousand flowers bloom, but also feel they need to impose top-down structure. You're making the case for emergent standards, not prescriptive ones?

Tom Wilkie: As long as you have a nice, open culture of sharing ideas and letting the best ideas float to the top naturally, and as long as you have a bit of patience, the bad ideas will generally get forgotten pretty quickly. At Grafana Labs, we use GitHub issues, we close issues that go stale very quickly, and we have the institutional ability to forget bad ideas. The best idea tends to win quite quickly, and everyone standardizes on the emergent standard. Sometimes two solutions reach critical mass, and you have to ask yourself, is that really a bad thing? Maybe you've got the engineering bandwidth to sustain two, and killing one does more damage than good.

Leaders tend to think more top-down consolidation is needed than is actually needed. The cost of consolidation is often underestimated. And it's often dangerous to the culture, because as soon as you start telling engineers who have poured their heart and soul into a project to drop it, that's devastating.

Seth Earley: You're a remote-first company with 1,600-plus employees. A lot of organizations struggle with that. What's the analogy to hallway collaboration in your remote world?

Tom Wilkie: My impression is people massively overestimate the happenstance in office environments. The water cooler dynamic doesn't exist to the extent people think. You have to engineer connection in a remote company - but I'd argue even in an in-office culture, you shouldn't rely on it happening spontaneously. And especially at any form of scale, you're not going to have 1,600 people in one office anyway. We have customers all over the world, we're always going to have salespeople all over the world - why not have engineers all over the world too?

We try to run engineering internally a lot like an open source project. You're encouraged to form strong asynchronous communication patterns, to write things down, share content asynchronously. It's a much more inclusive and durable way of working. Hybrid, though - I really can't see that working. You've got all the disadvantages of being remote, like having to force written communication, with all the disadvantages of being in-office, like cliques forming and people feeling like second-class citizens if they weren't in the room.

Seth Earley: Let's talk about the case for junior engineers. There's a narrative that AI will eliminate the need for junior software engineers, but you're taking a contrarian view. Make that case.

Tom Wilkie: As a remote-first company, we've always struggled to onboard junior people. When I first started professional software engineering, I was constantly bothering the person next to me. That tight feedback and low cost of interruption bootstrapped my career. You just can't replicate that in a remote organization - I've never found an effective way. As a consequence, Grafana Labs has very few junior software engineers, and the ones we do have came through our open source communities, proved themselves, and we hired them when they graduated.

But AI might actually solve this. Suddenly, as a remote-first junior engineer, you've got something you can interrupt all the time, which is an LLM. Stack Overflow was where you used to go to ask stupid questions, and the first answer would be "you're an idiot," and the second would be "you're asking the wrong question." With an LLM, you can ask a stupid question with very low ego, because no one's watching. Suddenly, it might be possible to onboard a junior engineer into an organization again with an agentic buddy. And the junior engineers coming out of university now are actually more capable with the AI tools, because they've been using them from an earlier age. I'm relatively optimistic this breed of tooling will help Grafana Labs onboard more junior engineers.

Seth Earley: So, if you could leave the listeners one or two things to think about when evaluating their observability strategy and figuring out how AI fits in operations, what do you want them to walk away with?

Tom Wilkie: The one thing I've learned about the AI ecosystem and market right now is how quickly it's changing. All of the assumptions I had - even when you and I first met, Seth - have been radically changed in the space of a few months. The one constant here is change. I did not realize how true that was. You have to constantly be learning, trying, experimenting, and staying in touch with the latest developments. I suspect half the stuff I've said today will be provably wrong in 6 months' time. And I'm kind of excited by that. You've got to be excited by that, or else it's incredibly scary.

Seth Earley: I think it's a little bit of both. For our listeners, the through line here is that openness - whether it's open source, open standards, open culture - is not just a philosophy, it really is a competitive strategy. It lowers switching costs, builds trust, and in the area of AI, it turns out to be the best way to make sure your models know how to use your technology. Your competitors who keep everything behind a firewall or paywall have less visibility, and your open approach means you're being referenced in generative search results where they are not. Tom, where can people find you?

Tom Wilkie: Just check out Grafana.com or any of our projects on GitHub. You'll see me at every one of our events and many open source conferences as well. But thank you very much for having me, Seth - I really enjoyed that.

Seth Earley: That's Tom Wilkie, VP of Product at Grafana Labs. Thank you for joining us, and thank you to everybody listening. We'll see you next time at the next Earley AI Podcast.
