Guest: Steven Woo, Fellow and Distinguished Inventor at Rambus
Host: Seth Earley, CEO at Earley Information Science
Published on: May 5, 2026
In this episode, Seth Earley speaks with Steven Woo, Fellow and Distinguished Inventor at Rambus, where he has spent over 30 years at the frontier of memory technology. They explore why memory - not compute - is the binding constraint on AI performance today, how moving data between chips consumes more than half of all power in a high-end AI processor, and what the rise of agentic AI means for infrastructure planning.
Steven shares a rare long-view perspective on the innovation curve for memory technology, the supply-demand dynamics driving prices higher, and the questions enterprise leaders should be asking before signing their next infrastructure contract.
Key Takeaways:
- Memory, not compute, has become the binding constraint on AI performance, and the gap between processor and memory improvement continues to widen.
- More than half of the power in a high-end AI processor goes to circuits that move data on and off the chip, making data movement a first-order factor in power planning.
- Agentic AI multiplies the contexts a single user can consume, making infrastructure capacity planning far harder.
- Supply-demand imbalances are driving memory prices up and pushing organizations toward long-term procurement commitments.
- Reliability and a vendor's track record matter as much as speed; memory-related failures are a major source of downtime at scale.
Insightful Quotes:
"Memory has become a big bottleneck. In many cases, in AI, your speed at which you can actually process information and create new large language models is really gated by the speed and availability of memory." - Steven Woo
"More than 50 percent of the power is spent in circuits just trying to move data on and off the processor. It's pretty astounding to think that as companies plan how much power they need, a lot of it is really related to simply moving data back and forth." - Steven Woo
"People think of compute in terms of gigawatts. But it turns out it's really the movement of that data - and nobody talks about that. It's the silhouette behind the curtain that's actually constraining everything else." - Seth Earley
Tune in to discover why the future of AI depends as much on memory engineering as it does on model development - and what enterprise leaders need to understand about the infrastructure constraints shaping every AI investment they make.
Links
LinkedIn: https://www.linkedin.com/in/stevencwoo/
Website: https://www.rambus.com
Ways to Tune In:
Earley AI Podcast: https://www.earley.com/earley-ai-podcast-home
Apple Podcast: https://podcasts.apple.com/podcast/id1586654770
Spotify: https://open.spotify.com/show/5nkcZvVYjHHj6wtBABqLbE
iHeart Radio: https://www.iheart.com/podcast/269-earley-ai-podcast-87108370/
Stitcher: https://www.stitcher.com/show/earley-ai-podcast
Amazon Music: https://music.amazon.com/podcasts/18524b67-09cf-433f-82db-07b6213ad3ba/earley-ai-podcast
Buzzsprout: https://earleyai.buzzsprout.com/
Podcast Transcript: Memory, Power, and the Hidden Constraints of AI Infrastructure
Transcript introduction
This transcript captures a conversation between Seth Earley and Steven Woo about the memory bottleneck that sits at the heart of every AI system - largely invisible to business leaders but increasingly consequential for anyone making infrastructure investments. They discuss the physics of moving data, the difference between training and inference requirements, how agentic AI is about to multiply memory demand in ways the industry is still working to understand, and why reliability - not just speed - is what separates commodity memory from mission-critical infrastructure.
Transcript
Seth Earley: Well, welcome to today's Earley AI Podcast. I'm your host, Seth Earley, and in each episode we explore how artificial intelligence and data are shaping business strategy and operations. Today, we're going to look under the hood of AI to examine one of the most overlooked constraints on AI advancement - and that is memory.
Joining me today is Steven Woo, Fellow and Distinguished Inventor at Rambus. Steven has been with Rambus for over 30 years, and began his career studying neural networks. He brings a rare long-view perspective on how memory technology shapes the limits of what AI can do. Steven, welcome to the show.
Steven Woo: Thank you very much, Seth. I'm really happy to be here.
Seth Earley: Let's start with common misconceptions. What are the biggest misconceptions that business and technology leaders have about AI infrastructure, and the role that memory plays in it?
Steven Woo: For decades, we've watched the industry make incredible advances in how fast processors can run and how fast they can do their computations. But the thing going on behind the scenes is that memory has to keep up. These vast processing engines need data, and they need it at higher and higher speeds in order to process more quickly.
Probably the most interesting thing we're seeing right now - and AI is exacerbating this - is that memory has become a big bottleneck. In many cases, in AI, your speed at which you can actually process information and create new large language models is really gated by the speed and availability of memory. Over the last few decades, the speed at which processors have been improving has been faster than the speed at which memory has been improving. That gap has been gradually producing this bottleneck, and we see it more than ever today. It is getting worse going into the future.
Seth Earley: So when people think about AI capacity, they think about compute power - but you're saying compute power is constrained by memory. How should that change how organizations think about their AI investments?
Steven Woo: What is really interesting now is that more companies are understanding they are limited by the amount of memory they can get a hold of. Large infrastructure players are signing long-term deals to lock up supply to make sure it will be there when they need it. The same is true of power, and of semiconductor manufacturing capacity - long-term deals are being signed there as well. People are looking much further out than I am used to seeing in the industry.
Seth Earley: You mentioned that moving data between chips consumes more than half the power in an AI system - not the actual compute. Can you unpack that?
Steven Woo: Fundamentally, what you need to do is move data to your processor and move the results back from it. It turns out that the amount of energy spent is related to the distance the data has to travel. It may not seem like much - we are talking on the order of maybe 10 to 20 millimeters. But if you look at all the power spent on high-end processors, more than 50 percent is spent in circuits just trying to move data on and off the processor. When you start going off the chip - 10, 20 millimeters, or in some cases meters between chassis - that is where you spend all your energy. It is pretty astounding to think that as these companies are planning how much power they need, a lot of it is really just related to moving data back and forth.
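As a rough illustration of the point above, the sketch below works through a back-of-the-envelope comparison of compute energy versus data-movement energy. The energy-per-bit and energy-per-operation figures, and the workload size, are illustrative assumptions for a hypothetical accelerator, not measurements from Rambus or any specific processor.

```python
# Back-of-the-envelope comparison of compute energy vs. data-movement energy.
# All figures below are illustrative assumptions for a hypothetical accelerator,
# not measured values for any real chip.

PJ = 1e-12  # one picojoule, in joules

# Assumed energy costs (order-of-magnitude placeholders):
energy_per_flop_pj = 0.5          # energy for one floating-point operation
energy_per_bit_off_chip_pj = 5.0  # moving one bit ~10-20 mm to external memory

# Hypothetical workload: 1e12 FLOPs that read/write 100 GB from external memory.
flops = 1e12
bytes_moved_off_chip = 100e9
bits_moved_off_chip = bytes_moved_off_chip * 8

compute_energy_j = flops * energy_per_flop_pj * PJ
movement_energy_j = bits_moved_off_chip * energy_per_bit_off_chip_pj * PJ
total_energy_j = compute_energy_j + movement_energy_j

print(f"Compute energy:       {compute_energy_j:.2f} J")
print(f"Data-movement energy: {movement_energy_j:.2f} J")
print(f"Movement share:       {movement_energy_j / total_energy_j:.0%}")
```

Even with generous assumptions about compute efficiency, moving every byte off-chip quickly dominates the energy budget, which is the dynamic Steven describes.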
Seth Earley: And is that ultimately a physics problem? What is the industry doing to address it?
Steven Woo: It is exactly a physics problem - and if it is a physics problem, the answer is distance. People are putting components closer together. What people are talking about now is stacking components on top of each other, which cuts down the distances by one to two orders of magnitude.
But you hit the nail on the head when you ask about trade-offs. If you stack components, you are now dissipating all your power in a smaller volume, which makes it much harder to cool. And you also make it harder to supply power to everything that is stacked. So there are two significant engineering challenges there, and there is a lot of active work in the industry on exactly those problems.
For longer distances - going between chassis in a rack, or between racks in a data center - optics has long been viewed as the right technology. It is incredible in terms of data rate and signal quality over distance. But integrating photonics directly into silicon packages at the shorter distances now being demanded is a real enabling challenge. A lot of work is going on right now to move those interconnects to all-optical.
Seth Earley: You described the concept of memory tiering - different data living at different levels of proximity based on how frequently it is needed. What are the trade-offs technology leaders need to think about when provisioning AI infrastructure?
Steven Woo: A good analogy is storage in your house. You have closets and drawers that are readily accessible. But that is not enough for most people, so you put some things in the garage - and you try to go there as little as possible. And if you have a lot of stuff, you might rent self-storage a couple of miles away, which you really do not want to visit very often.
The trade-off in memory tiering is really about how recently and frequently something has been used. You want that data close by. Things you use less frequently can be a little further away.
If you think about a long conversation with an LLM, you are building up a context that cannot all fit directly in the memory attached to the processor. Some of it has to be kept close, some of it a little further away. Expanding context windows is something everyone wants - but it is one of the things driving the industry's emphasis on improving capacity, bandwidth, and energy efficiency, because the larger the context, the harder the storage and retrieval problem becomes.
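The household-storage analogy maps fairly directly onto a cache-style policy: keep what was used most recently in the fast tier and demote the rest. The sketch below is a minimal illustration of that idea; the tier names, capacities, and the LRU-style policy are illustrative choices, not how any particular inference stack actually manages context.

```python
from collections import OrderedDict

class TieredStore:
    """Minimal two-tier store: a small 'fast' tier (like memory near the processor)
    and a larger 'slow' tier (like memory or storage further away).
    Illustrative only; real systems use far more sophisticated policies."""

    def __init__(self, fast_capacity: int):
        self.fast_capacity = fast_capacity
        self.fast = OrderedDict()   # most recently used items, in LRU order
        self.slow = {}              # items that no longer fit in the fast tier

    def put(self, key, value):
        self.fast[key] = value
        self.fast.move_to_end(key)
        self._evict_if_needed()

    def get(self, key):
        if key in self.fast:
            self.fast.move_to_end(key)   # refresh recency
            return self.fast[key]
        if key in self.slow:
            value = self.slow.pop(key)   # promote on access (a "trip to the garage")
            self.put(key, value)
            return value
        raise KeyError(key)

    def _evict_if_needed(self):
        while len(self.fast) > self.fast_capacity:
            old_key, old_value = self.fast.popitem(last=False)  # least recently used
            self.slow[old_key] = old_value

# Example: context chunks from a long LLM conversation spilling to the slow tier.
store = TieredStore(fast_capacity=3)
for turn in range(6):
    store.put(f"context_chunk_{turn}", f"tokens for turn {turn}")
print(sorted(store.fast))  # the three most recent chunks stay close
print(sorted(store.slow))  # older chunks are demoted to the slow tier
```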
Seth Earley: Who owns this problem? Is it the data engineers, the hardware engineers, the people procuring compute capacity?
Steven Woo: It is really a problem for everybody. If you are a hyperscaler, you are both procuring the hardware and writing the algorithms that move the data. You think about it as a whole system problem - what is my hardware capable of doing, and how can I change my algorithms to get better locality and less movement of data? We work with many different companies that consume these technologies to try to understand what they need going forward. There are a lot of brilliant software people working very hard to figure out how to get the most out of what they have been given today.
Seth Earley: What is the difference in memory requirements between training a model and running inference?
Steven Woo: Training has become its own fascinating area. People exceeded the capacity of a single GPU to train large models years ago, so now it is all about ganging together tens of thousands of GPUs with enormous aggregate memory and high-speed interconnects. For the largest LLMs, cost is essentially no object - they will spend pretty much anything to get the compute performance.
Inference is very different. It cuts across many markets - the data center, home computers, mobile phones. For each environment, you look at what the hardware is capable of and you pare down the model to perform well on the device you actually have. Sometimes that means the device has to reach out to a data center for resources it does not have locally. The memory types are also completely different: phones use low-power DDR optimized for battery, home computers use standard DDR, and at the high end of the data center you have high bandwidth memory - HBM - which offers the highest capacity and highest bandwidth of anything available, but at a significantly higher price and greater design complexity.
Seth Earley: How is agentic AI changing the infrastructure picture?
Steven Woo: Agentic AI is making capacity planning really hard. There is no doubt it improves productivity - but the vision is to have a human interact with one or more agents, and those agents interact with more agents. As the number of agents and the number of contexts grows, what a single user can be consuming is much, much higher than before. Questions about how to make those contexts long-lived, how agents should communicate back and forth, and what exactly they are going to require are all active areas right now. We are still in the early stages of understanding how agentic AI will impact infrastructure at scale.
Seth Earley: Memory prices have risen sharply. How should organizations be thinking about procurement and planning?
Steven Woo: Up until about seven to ten years ago, the memory industry gave pretty reliable improvements in capacity and performance at a roughly constant price. For DDR memory, about every five years you could count on doubling the bandwidth and two to four times the capacity, at the same cost. That was a great deal.
What has changed is the pace. With AI driving high-performance memories like HBM, we are now doubling bandwidth and capacity roughly every two years - much faster than before. That accelerated cadence means manufacturing capability has to be developed much faster, and the costs are harder to recover over that shorter timeframe. We have seen memory prices rise pretty dramatically. Memory I bought for my home PC over the summer is now three times the price at online retailers.
The supply-demand imbalance is causing organizations to sign long-term commitments to lock in prices so they can at least plan. It is a dynamic that has not really existed in this industry before, and it is forcing a different kind of strategic thinking around procurement.
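To make the change in cadence concrete, the short calculation below compares bandwidth growth under the older roughly five-year doubling Steven describes with the roughly two-year doubling now seen for HBM. The starting bandwidth and ten-year horizon are arbitrary illustrative values, not figures for any real product line.

```python
# Compare bandwidth growth under two doubling cadences over the same horizon.
# Starting bandwidth and horizon are arbitrary illustrative numbers.

start_bandwidth_gb_s = 100.0   # hypothetical starting bandwidth, GB/s
horizon_years = 10

def bandwidth_after(years: float, doubling_period_years: float) -> float:
    """Bandwidth after `years`, doubling once every `doubling_period_years`."""
    return start_bandwidth_gb_s * 2 ** (years / doubling_period_years)

old_cadence = bandwidth_after(horizon_years, doubling_period_years=5)  # ~4x growth
new_cadence = bandwidth_after(horizon_years, doubling_period_years=2)  # ~32x growth

print(f"After {horizon_years} years at a 5-year doubling: {old_cadence:,.0f} GB/s")
print(f"After {horizon_years} years at a 2-year doubling: {new_cadence:,.0f} GB/s")
```

The same ten years produce roughly a 4x improvement at the old cadence and roughly 32x at the new one, which is why manufacturing investment has to be recovered so much faster.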
Seth Earley: What questions should enterprise leaders be asking when they are evaluating AI infrastructure?
Steven Woo: The most important question is how much experience and demonstrated reliability the companies you are working with actually have in producing these technologies. When Meta was training Llama 3, they wrote a paper about some of the challenges - and when you work through the numbers, their infrastructure was going down roughly every three hours. That caused them to take extraordinary measures to recover from errors.
The companies you want to work with are the ones that have spent decades understanding reliability in these environments. A large fraction of those kinds of outages are related to memory. Downtime is a killer - it will ruin both your productivity and your reputation. Asking the right questions about what is inside the box, and what track record a company has, can reduce a lot of headaches before they happen.
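Steven's "roughly every three hours" figure is the kind of number you can sanity-check with simple mean-time-between-interruptions arithmetic. The run length and interruption count below are hypothetical placeholders chosen only to illustrate the calculation, not the actual figures from Meta's Llama 3 paper.

```python
# Mean time between interruptions for a hypothetical multi-week training run.
# The numbers below are placeholders, not figures from any published paper.

run_length_days = 54   # hypothetical length of a large training run
interruptions = 420    # hypothetical number of unexpected interruptions

run_length_hours = run_length_days * 24
mean_hours_between_interruptions = run_length_hours / interruptions
print(f"Roughly one interruption every {mean_hours_between_interruptions:.1f} hours")

# At that rate, every interruption forces a checkpoint restore, so checkpoint
# frequency and restart time become first-order design considerations.
```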
Seth Earley: Steven, thank you so much for joining us. This has been a genuinely eye-opening conversation about infrastructure that most people do not think nearly enough about.
Steven Woo: I enjoyed our conversation today, Seth. Thank you very much.
Seth Earley: And thank you to our audience. We appreciate you tuning in, and we will see you next time on the Earley AI Podcast.