
Earley AI Podcast – Episode 70: AI Infrastructure and Computing Innovation with Sid Sheth


 

Guest: Sid Sheth, CEO and Co-Founder of d-Matrix

Host: Seth Earley

Published on: July 11, 2025

 

This episode features a fascinating conversation with Sid Sheth, CEO and Co-Founder of d-Matrix. With a deep background in building advanced systems for high-performance workloads, Sid and his team are at the forefront of AI compute innovation—specifically focused on making AI inference more efficient, cost-effective, and scalable for enterprise use. Host Seth Earley dives into Sid’s journey, the architectural shifts in AI infrastructure, and what it means for organizations seeking to maximize their AI investments.

Key Takeaways:

  • The Evolution of AI Infrastructure: Sid breaks down how the traditional tech stack is being rebuilt to support the unique demands of AI, particularly shifting from general-purpose CPUs to specialized accelerators for inference.

  • Training vs. Inference: Using a human analogy, Sid explains the fundamental difference between model training (learning) and inference (applying knowledge), emphasizing why most enterprise value comes from efficient inference.

  • Purpose-built Accelerators: d-Matrix’s approach to creating inference-only accelerators means dramatically reducing overhead, latency, energy consumption, and cost compared to traditional GPU solutions.

  • Scalability & Efficiency: Learn how in-memory compute, chiplets, and innovative memory architectures enable d-Matrix to deliver up to 10x lower latency, and significant gains in energy and dollar efficiency for AI applications.

  • Market Trends: Sid reveals how, although today’s focus is largely on training compute, the next five to ten years will see inference dominate as organizations seek ROI from deployed AI.

  • Enterprise Strategy Advice: Sid urges tech leaders not to be conservative, but to embrace a heterogeneous and flexible infrastructure strategy to future-proof their AI investments.

  • Real-World Use Cases: Hear about d-Matrix’s work enabling low-latency agentic/reasoning models, which are critical for real-time and interactive AI workloads.

Insightful Quotes:

"Now is not the time to be conservative and get comfortable with choice. In the world of inference there isn't going to be one size fits all... The world of the future is heterogeneous, where you're going to have a compute fleet that is augmented with different types of compute to serve different needs." - Sid Sheth

"Training is like going to school—you do it once. Inference is like using what you learned every single day for the rest of your life. That's where the real compute demand is." - Sid Sheth

"The future of AI infrastructure is purpose-built accelerators that are optimized for specific workloads, not general-purpose solutions trying to do everything." - Sid Sheth

Tune in to discover how to rethink your AI infrastructure strategy and stay ahead in the rapidly evolving world of enterprise AI!

Links

LinkedIn: https://www.linkedin.com/in/sheth/

Website: https://www.d-matrix.ai


Ways to Tune In:
Earley AI Podcast: https://www.earley.com/earley-ai-podcast-home
Apple Podcast: https://podcasts.apple.com/podcast/id1586654770
Spotify: https://open.spotify.com/show/5nkcZvVYjHHj6wtBABqLbE?si=73cd5d5fc89f4781
iHeart Radio: https://www.iheart.com/podcast/269-earley-ai-podcast-87108370/
Stitcher: https://www.stitcher.com/show/earley-ai-podcast
Amazon Music: https://music.amazon.com/podcasts/18524b67-09cf-433f-82db-07b6213ad3ba/earley-ai-podcast
Buzzsprout: https://earleyai.buzzsprout.com/ 

 

Podcast Transcript: AI Infrastructure, Training vs. Inference, and Purpose-Built Accelerators

Transcript Introduction

This transcript captures a conversation between Seth Earley and Sid Sheth on the technical and business aspects of AI infrastructure. Topics include the fundamental differences between training and inference, why purpose-built accelerators are essential for enterprise AI, the evolution from CPUs to specialized AI chips, the economics of AI deployment, and strategic advice for organizations building their AI infrastructure.

Transcript

Seth Earley:
Welcome to the Earley AI Podcast. I'm your host, Seth Earley, and today I have a very special guest—Sid Sheth, CEO and Co-Founder of d-Matrix. Sid has an incredible background in building advanced systems for high-performance workloads, and his company is doing some really innovative work in AI compute infrastructure. Sid, welcome to the show!

Sid Sheth:
Thank you, Seth. Great to be here.

Seth Earley:
So Sid, let's start with the basics. For those who aren't deeply technical, can you explain what d-Matrix does and why it matters?

Sid Sheth:
Absolutely. So at d-Matrix, we're building specialized hardware accelerators specifically designed for AI inference. Now, let me break that down. When we talk about AI, there are really two main phases: training and inference. Training is like going to school—it's where the model learns from data. Inference is like using what you learned—it's where the model applies that knowledge to make predictions or generate responses. And here's the key: training happens once, but inference happens millions or billions of times. That's where the real compute demand is, and that's what we're focused on optimizing.
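
To make that distinction concrete, here is a minimal, generic sketch in Python (not d-Matrix's stack, just an illustration): the model is fit exactly once, and the fitted parameters are then reused for every prediction that follows, which is why the inference path dominates lifetime compute.

```python
import numpy as np

# "Training" happens once: fit a simple linear model to example data.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=1000)
weights, *_ = np.linalg.lstsq(X, y, rcond=None)  # learn the parameters one time

# "Inference" happens over and over: every new request reuses the same
# learned weights, so this path dominates lifetime compute cost.
def predict(features: np.ndarray) -> np.ndarray:
    return features @ weights

for i in range(10_000):            # stand-in for millions of daily requests
    request = rng.normal(size=(1, 3))
    _ = predict(request)
```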

Seth Earley:
That's a great analogy. So why do we need specialized hardware for inference? Can't we just use the same GPUs that we use for training?

Sid Sheth:
You can, and that's what many organizations are doing today. But it's incredibly inefficient. GPUs were originally designed for graphics rendering, and they've been adapted for AI training. They're powerful, but they're also general-purpose, which means they carry a lot of overhead. When you're doing inference—which has very different characteristics than training—you don't need all that overhead. You need low latency, high throughput, and energy efficiency. That's where purpose-built accelerators come in. By designing hardware specifically for inference, we can deliver up to 10x lower latency and significant improvements in energy and cost efficiency.

Seth Earley:
So it's like using the right tool for the job rather than trying to use a Swiss Army knife for everything.

Sid Sheth:
Exactly. And this becomes even more important as AI applications become more interactive and real-time. Think about conversational AI, autonomous systems, recommendation engines—all of these require fast, efficient inference. If your infrastructure can't deliver that, your AI applications won't meet user expectations.

Seth Earley:
Let's talk about the technical architecture a bit. What makes d-Matrix's approach different?

Sid Sheth:
There are a few key innovations. First, we use in-memory compute, which means we're doing calculations right where the data is stored, rather than constantly moving data back and forth between memory and processors. This eliminates a major bottleneck. Second, we use a chiplet architecture, which gives us much more flexibility in how we design and scale our systems. And third, we've completely rethought the memory architecture to optimize for the specific patterns of AI inference workloads.
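
As a rough, back-of-the-envelope illustration of the data-movement bottleneck Sid describes (the numbers below are generic assumptions, not d-Matrix specifications): a matrix-vector multiply, the core operation of single-request transformer inference, performs only about one floating-point operation per byte of weights fetched at 16-bit precision, so throughput is limited by memory bandwidth rather than raw compute, which is exactly the limit that computing in or near memory attacks.

```python
# Back-of-the-envelope arithmetic intensity for a matrix-vector multiply,
# the dominant operation in batch-size-1 LLM inference.
# All numbers below are illustrative assumptions, not d-Matrix figures.
rows, cols = 4096, 4096                 # one weight matrix of a large layer
bytes_per_weight = 2                    # fp16/bf16 weights

flops = 2 * rows * cols                 # one multiply + one add per weight
bytes_moved = rows * cols * bytes_per_weight

intensity = flops / bytes_moved         # FLOPs per byte fetched from memory
print(f"Arithmetic intensity: {intensity:.1f} FLOPs/byte")  # ~1.0

# With so few operations per byte, a processor with abundant FLOPs still
# sits idle waiting on memory traffic, which is why architectures that
# compute where the weights live can cut latency so sharply.
```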

Seth Earley:
And what does that translate to in terms of real-world performance?

Sid Sheth:
We're seeing up to 10x reduction in latency compared to traditional GPU-based solutions, along with significant improvements in energy efficiency and cost per inference. For enterprise applications where you're running millions of inferences per day, these improvements add up to substantial operational savings and better user experiences.

Seth Earley:
Let's talk about the market. Where is AI compute headed over the next few years?

Sid Sheth:
Right now, the market is very focused on training—building bigger and bigger models. And that's important. But I think over the next five to ten years, we're going to see a major shift toward inference. As more models get deployed into production, as more applications go live, the demand for inference compute is going to dwarf the demand for training compute. And that's where we're going to see the real return on investment from AI. Organizations aren't going to get value from training models—they're going to get value from deploying those models at scale and using them to serve customers, optimize operations, and drive business outcomes.

Seth Earley:
That makes sense. So what should enterprise leaders be thinking about when it comes to their AI infrastructure strategy?

Sid Sheth:
First, don't be conservative. The AI landscape is evolving rapidly, and the infrastructure choices you make today need to be flexible enough to adapt to what's coming tomorrow. Second, embrace heterogeneity. There isn't going to be one size fits all. Different workloads will require different types of compute. Some will need high-throughput batch processing, others will need ultra-low latency for real-time applications. Your infrastructure needs to support this diversity. Third, focus on total cost of ownership, not just upfront costs. Energy consumption, operational complexity, and scalability all matter. And fourth, work with partners who understand the specific demands of AI workloads and can help you optimize your infrastructure.
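
As a toy illustration of what "embrace heterogeneity" might look like in code (the backend names and threshold here are hypothetical, not a real product API): requests with tight latency budgets are routed to a low-latency inference tier, while bulk jobs go to a cheaper batch tier.

```python
from dataclasses import dataclass

@dataclass
class Request:
    payload: str
    latency_budget_ms: float  # how fast the caller needs an answer

# Hypothetical backends in a mixed fleet; the names are illustrative only.
LOW_LATENCY_TIER = "inference-accelerators"
BATCH_TIER = "gpu-batch-cluster"

def route(req: Request) -> str:
    """Send interactive traffic to the low-latency tier, bulk work to batch."""
    return LOW_LATENCY_TIER if req.latency_budget_ms < 100 else BATCH_TIER

print(route(Request("chat turn", latency_budget_ms=50)))               # low-latency tier
print(route(Request("nightly re-scoring", latency_budget_ms=60_000)))  # batch tier
```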

Seth Earley:
You mentioned low-latency applications. Can you give some examples of where this really matters?

Sid Sheth:
Sure. Think about conversational AI agents that need to respond in real-time to customer queries. Think about autonomous vehicles that need to make split-second decisions. Think about financial trading systems that need to process market data and execute trades in microseconds. Think about healthcare applications that need to analyze medical images or patient data in real-time to support clinical decisions. All of these require not just accurate AI, but fast AI. And that's where inference accelerators become critical.

Seth Earley:
Let's talk about agentic AI, which is getting a lot of attention. How does that fit into this picture?

Sid Sheth:
Agentic AI—where AI systems can take actions autonomously within defined parameters—is incredibly demanding from an infrastructure perspective. These systems need to reason, make decisions, and take actions, often in real-time. That requires very low latency inference, and it also requires the ability to chain together multiple models and reasoning steps. Traditional infrastructure struggles with this. Purpose-built inference accelerators are much better suited to these kinds of workloads.
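
A small sketch of why chained, agentic workloads are so sensitive to per-call latency (the step counts and latencies below are hypothetical): because one user action can trigger many model calls in sequence, per-call latency multiplies, and a roughly 10x reduction per call is the difference between an interactive and a sluggish experience.

```python
# Hypothetical numbers showing how per-call latency compounds in an agent
# that chains several reasoning / tool-selection steps per user action.
steps_per_action = 8                  # plan, retrieve, reason, act, verify...

for per_call_ms in (500, 50):         # e.g. a baseline vs. ~10x lower latency
    total_ms = steps_per_action * per_call_ms
    print(f"{per_call_ms} ms/call -> {total_ms} ms per user action")
# 500 ms/call -> 4000 ms per user action  (sluggish)
# 50 ms/call  -> 400 ms per user action   (feels interactive)
```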

Seth Earley:
And what about the economic side? How should organizations think about the ROI of investing in specialized AI infrastructure?

Sid Sheth:
It comes down to cost per inference. If you're running millions or billions of inferences per day, even small improvements in cost per inference add up to huge savings. Plus, there are indirect benefits—faster response times lead to better user experiences, which can translate to higher engagement, higher conversion rates, and ultimately higher revenue. And energy efficiency isn't just about cost—it's also about sustainability, which is increasingly important to organizations and their stakeholders.
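
The cost-per-inference arithmetic can be sketched in a few lines; every input below is a placeholder assumption rather than a quoted price, but the structure of the calculation is the point: at millions of requests per day, small per-request differences compound quickly.

```python
# Illustrative cost-per-inference model. All inputs are made-up placeholders;
# plug in your own hardware, power, and traffic figures.
def cost_per_inference(hourly_server_cost, watts, kwh_price, inferences_per_hour):
    energy_cost = (watts / 1000) * kwh_price          # energy dollars per hour
    return (hourly_server_cost + energy_cost) / inferences_per_hour

baseline = cost_per_inference(hourly_server_cost=4.00, watts=700,
                              kwh_price=0.12, inferences_per_hour=100_000)
efficient = cost_per_inference(hourly_server_cost=4.00, watts=700,
                               kwh_price=0.12, inferences_per_hour=300_000)

daily_requests = 50_000_000   # "millions of inferences per day"
savings = (baseline - efficient) * daily_requests
print(f"${baseline:.5f} vs ${efficient:.5f} per inference "
      f"-> ~${savings:,.0f} saved per day")
```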

Seth Earley:
What challenges do you see organizations facing as they try to scale their AI deployments?

Sid Sheth:
The biggest challenges are around infrastructure complexity and cost. Many organizations have built their AI infrastructure using general-purpose GPUs, which work fine for experimentation and small-scale deployments. But as they try to scale—serving more users, running more models, supporting more applications—they hit walls around latency, throughput, and cost. They're spending a fortune on compute, and they're still not getting the performance they need. That's where specialized inference accelerators can make a huge difference.

Seth Earley:
Any final advice for our listeners who are thinking about their AI infrastructure strategy?

Sid Sheth:
Yes. Don't wait for the perfect solution. The AI landscape is evolving too quickly for that. But do be thoughtful about building flexibility into your infrastructure. Make sure you're not locked into a single vendor or a single architecture. Experiment with different approaches. Measure what matters—latency, throughput, cost per inference, energy consumption. And most importantly, focus on the business outcomes you're trying to achieve. The infrastructure is just a means to an end. The end is delivering value to your customers and your business.
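
In the spirit of "measure what matters," here is a minimal micro-benchmark sketch that records per-request latency percentiles and throughput for whatever inference callable you plug in; the fake_model function is just a stand-in for a real endpoint.

```python
import time
import statistics

def fake_model(prompt: str) -> str:
    """Stand-in for a real inference call; replace with your own endpoint."""
    time.sleep(0.002)                      # pretend the model takes ~2 ms
    return prompt.upper()

latencies_ms = []
start = time.perf_counter()
for i in range(500):
    t0 = time.perf_counter()
    fake_model(f"request {i}")
    latencies_ms.append((time.perf_counter() - t0) * 1000)
elapsed = time.perf_counter() - start

latencies_ms.sort()
p50 = statistics.median(latencies_ms)
p99 = latencies_ms[int(0.99 * len(latencies_ms))]
print(f"p50 {p50:.1f} ms | p99 {p99:.1f} ms | {500 / elapsed:.0f} req/s")
```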

Seth Earley:
Excellent advice. Well, Sid, thank you so much for joining us today and sharing your insights.

Sid Sheth:
Thank you, Seth. It's been a pleasure.

Seth Earley:
And thank you to our listeners. You can find Sid on LinkedIn and learn more about d-Matrix at d-matrix.ai. Thanks for tuning in to the Earley AI Podcast, and we'll see you next time!

Meet the Author
Earley Information Science Team

We're passionate about managing data, content, and organizational knowledge. For 25 years, we've supported business outcomes by making information findable, usable, and valuable.