
Earley AI Podcast - Episode 80: Redefining AI Energy Efficiency with Brandon Lucia

Written by Earley Information Science Team | Jan 7, 2026 9:18:26 PM

Redefining AI Energy Efficiency: Brandon Lucia on Sustainable Computing Architectures for the Edge and Beyond

 

Guest: Brandon Lucia, CEO of Efficient Computer

Host: Seth Earley, CEO at Earley Information Science

Published on: January 5, 2026

 

 

In this episode, host Seth Earley welcomes Brandon Lucia, CEO of Efficient Computer, for a deep dive into how AI advancements are reshaping the future of computing—particularly with a focus on energy efficiency, sustainable infrastructure, and real-world applications.

Brandon Lucia brings almost 20 years of experience in computer architecture, having served as an academic at Carnegie Mellon University and led significant research at the boundary of hardware and software innovation. He and his team have pioneered a new kind of hardware architecture designed to drastically reduce power consumption for AI workloads without sacrificing performance or versatility. Their work has far-reaching implications for data centers, edge AI, robotics, automotive, and large-scale infrastructure monitoring.

Key Takeaways from this Episode:

  • AI’s energy demands are accelerating rapidly and require rethinking not just bigger models, but architectural efficiency at every level.

  • Effective AI infrastructure goes beyond mathematical optimization (like linear algebra); it includes real-world complexity and physical deployment.

  • Specialized hardware architectures (CPU, GPU) are evolving, but general-purpose solutions with built-in efficiency—like those from Efficient Computer—can unlock new application domains.

  • Edge computing and “physical AI” (as distinguished from legacy IoT) require extremely efficient processing to enable long device lifetimes and advanced capabilities.

  • Efficient Computer’s chips offer dramatic gains in energy efficiency compared to market-leading CPUs and embedded GPUs—sometimes up to hundreds of times better.

  • Enterprises should focus on hardware-software co-design and apply principles like Amdahl’s Law: you are limited by what you can’t optimize, so balancing all types of computation is critical.

  • Fine-grained personalization and retraining of AI at the edge will be increasingly important for future applications.

  • Organizations that deal in manufacturing, logistics, automotive, infrastructure, or robotics stand to benefit greatly from advances in efficient hardware and architecture.

Insightful Quote from the Show:

"We're not going to meet these energy requirements with the existing hardware and software—we have to change." - Seth Earley

"We are vastly ahead of our competition when it comes to energy consumption. Batteries last longer. You can do more under a power cap. You're not limited by thermal constraints. Those convert directly into capabilities into lifetime. So you can do more than you could do today." - Brandon Lucia

Tune in for a conversation that not only explores the technical side of AI hardware, but also the practical, business, and societal impacts of powering tomorrow’s intelligent systems with greater efficiency.

Links

LinkedIn: https://www.linkedin.com/in/brandon-lucia-0767792/

Website: https://www.efficient.computer/

 

Ways to Tune In:
Earley AI Podcast: https://www.earley.com/earley-ai-podcast-home
Apple Podcast: https://podcasts.apple.com/podcast/id1586654770
Spotify: https://open.spotify.com/show/5nkcZvVYjHHj6wtBABqLbE?si=73cd5d5fc89f4781
iHeart Radio: https://www.iheart.com/podcast/269-earley-ai-podcast-87108370/
Stitcher: https://www.stitcher.com/show/earley-ai-podcast
Amazon Music: https://music.amazon.com/podcasts/18524b67-09cf-433f-82db-07b6213ad3ba/earley-ai-podcast
Buzzsprout: https://earleyai.buzzsprout.com/ 

Podcast Transcript: AI Energy Efficiency, Hardware-Software Co-Design, and the Future of Physical AI

Transcript introduction

This transcript captures a conversation between Seth Earley and Brandon Lucia exploring AI's accelerating energy demands, the fundamental principles of efficient computing architecture, and how hardware-software co-design enables breakthrough performance for edge computing and physical AI applications across infrastructure monitoring, robotics, and autonomous systems.

Transcript

Seth Earley: Welcome to the Earley AI Podcast. I'm your host, Seth Earley, and each episode explores how artificial intelligence is reshaping organizations, how we manage information, how we create value, how we improve performance across technology, operations, and decision making. And today, we're going to be focusing on a problem that is becoming impossible to ignore. And that is, as AI accelerates, we are dealing with the problems of energy and energy efficiency. And so, as AI models get more complex, and they grow larger, and compute demands increase, the cost and the power consumption and environmental impact of these systems are rising very, very quickly. So joining me today is Brandon Lucia, CEO of Efficient Computer. Brandon works at the intersection of AI, computing architecture, and energy-efficient systems. His team is focused on delivering high-performance computing while dramatically reducing power consumption, with implications for data centers, edge AI, and sustainable infrastructure. Brandon, welcome to the show!

Brandon Lucia: Hi, Seth, yeah, it's great to be here, thanks for having me today.

Seth Earley: Great. So, before, you know, as we get started, the thing I'd like to do is ask our guests about common misperceptions. So, when you think about, what are the biggest misperceptions you see today about AI when it comes to performance and efficiency, and energy use? Many organizations you know, assume that it's going to mean bigger models, more compute, more energy. How should leaders be rethinking some of those assumptions?

Brandon Lucia: Yeah, it's a great question, and you know, we're in a moment of great transition right now. AI is, you know, coming into its own. Over the last 5 years, things have changed very rapidly. You know that, that's why we're here, I guess, right? So, you know, we see lots of things from our perspective at Efficient. We're focused on energy efficiency, and some of the big things that we see are that AI is bigger than a small few computations. AI is not just, let's optimize linear algebra, and that's the whole story. What we see a lot of times with our customers, when we talk to them, is they have a problem that is partly embedded in the physical world, partly in the hands of a user, and partly on a system running in an edge appliance or a back-end data center. And the huge amount of complexity tied into these real-world AI systems, and the physical presence of these devices, it changes some of the underlying assumptions when you start to think about AI applications that way, rather than just thinking of them as you know, let's squeeze and quantize and do more linear algebra faster. That, of course, is important, but I guess to frame this as a misconception, although I think people are beginning to catch on, this is something that I think, you know, doesn't get as much attention as it should. These applications are huge, far-reaching, physically embedded, and efficiency is really the primary limit on the capability of these kinds of AI… uses of AI.

Seth Earley: Hmm. So, you see, organizations, the hyperscalers building, you know, some of their energy infrastructure. There's big demands on all of that, and the core assumption is, yeah, we're just gonna have increasing levels of energy infrastructure requirements that are going to be very difficult to keep up with, right? I mean, I know that there's a… projected to be a huge shortfall. And one of the things I notice is when I'm working with large language models is that they will… they will sometimes overcomplicate my questions and my tasks, and I think, wow, what is all this compute power going to? It's not really being used efficiently from my perspective, just from the perspective of the user, so… you know, what are the different levers that you can pull on this? And when you say optimizing linear algebra, not everybody listening to this will understand exactly what that means relative to large language models. Yes, it's mathematics at the core. It's still hard to comprehend that sometimes when you're interacting with these, the way… the level of nuance and complexity that they get into. But talk a little bit more about that, and then maybe talk a little bit about the levers that can be… dials that can be turned to… to improve efficiency.

Brandon Lucia: Yeah, absolutely. So, I think what the hyperscalers are doing today is… honestly unreal. It's like living sci-fi today, to be able to interact with these models and to see what they can do, the capabilities of these things, even if they are overcomplicating things. You know, think about it this way, that's a sophisticated robot having a conversation with you, and the biggest problem is that it's overcomplicating things. It blows me away.

Seth Earley: I know, every day it astounds me. Every day it astounds me.

Brandon Lucia: It's utterly amazing. So, you know, then we look at what's the infrastructure? The infrastructure is a mix of things. You have CPUs that rely on an architecture from a very long time ago. It was developed in the, you know, 50s, 60s, classic, what's called the von Neumann CPU architecture. That's jargon for the way CPUs have been for a long time. You have the iteration of GPUs. Beyond that, you go from the CPU to the GPU, and you get some additional efficiencies, and now we're beginning to see, especially with this NVIDIA Grok announcement that just happened, you have this move to specialization, and specialization means we are taking specific mathematical operations and subcomputations that make AI go, and building them into hardware. And there's a reason that at the data center level, you have this mix of different hardware components, and the reason is that AI is really… It's a multifaceted thing. It's not just… the math that you can specialize for. It's not just stuff that runs well on a CPU, it's not just stuff that goes on a GPU efficiently and sits somewhere in the middle. It's really all of it. Especially, this is true when you start to push out away from the data center into the physical world, you know? Cars are sophisticated robots. You have, you know, you have bipedal robots that in the next 10 years are going to be helping people in their homes, and they need to be thinking on their feet. Maybe something that goes unappreciated is, it's not just that you have a main brain sitting on top of one of those robots. In every finger joint, you have a little micro-brain that's going to be running some AI to run a control loop and to do sophisticated tactile manipulation, things like that. All of these are facets of the shift to AI as the primary mode of compute, and when… especially when you think about the energy consumption of data centers, this is going to be, you know, a dominant cost of energy globally, over the next 50 years. This is going to be a huge shift. This is a shift for all of humanity, is one way of thinking about it. Another thing to think about is when you push into the physical world, how do we deliver power to those kinds of devices? Often it's batteries. And when it's batteries, that means that every little bit of energy, every joule of energy that you have, really matters. It fundamentally determines the capability of your, you know, mobile system, robotic system, even things in cars often will be driven from battery power, because they're distributed across this thing, and it's difficult to run wiring harnesses, and so forth. So, you have the complexity of AI, you have these different modes of computation. There's an opportunity right now, though. Usually, when things become in this kind of state of complexity, there's an opportunity, and this is a great moment to start to redefine how we take on some of these problems. And that's where, you know, what we're doing at Efficient, we have a new architecture, and our architecture is really fundamentally different. It's not like CPU, it's not like GPU, it's not a specialized system. In general, it can do the things that CPUs and GPUs can do, but it also has some of the aspects of a specialized architecture where we can tap into some of the regularity and the structure in those underlying mathematical subcomputations, and that's a sweet spot for us. 
There's a place for all of these types of hardware out in the world, and we're trying to capture the best of all of them in one architecture, and we've been really successful at doing that thus far.

Seth Earley: I didn't mention your background. Talk a little bit about your background as an academic, and the research that you've been doing. So, you know, this isn't something you just jumped into.

Brandon Lucia: Yeah, yeah, to put it mildly, I guess. Well, almost 20 years ago, I started my academic career. I've been focused on computer architecture, and that's… nothing to do with buildings. That's something that, computer architecture is the boundary line between hardware and software. And that means that we think deeply one day about how you program a machine, and so programming language and formalisms and sometimes even theory. And then on other days, we think of how you take these bits and map them into hardware structures that are implementable by you know, the most sophisticated semiconductor fabs that exist anywhere on Earth. So it's a sort of really looking both ways from the middle way of thinking about the world. That's something that we live and breathe every day at Efficient. We're very computer architecture-centric, in the sense that we look both ways from the middle, and we have a strong software stack and a strong hardware expertise. And that's something that I've made a key point in my entire career. So, you know, some however many years ago now, 11 years ago, I guess, I moved to Pittsburgh, started as a professor at Carnegie Mellon University in the Electrical and Computer Engineering Department, where, with my co-founders, Nathan Beckmann and Graham Gobieski, we developed the kind of fundamental underlying principles behind our architecture at Efficient. So this is stuff that we've been working on for you know, about a decade at this point, looking at what are the right abstractions? What are the right places where we add specialization? What are the right places where we need to preserve generality and programmability? It's been a long road. We found something, now, you know, we've architected, designed, and implemented a software stack, our architecture, and hardware in our chips at Efficient. And like I said, you know, we've seen great success in developing this into an efficient, new way to support this broader array of AI computations, and then the sort of supporting cast of ancillary computations as well.

Seth Earley: So, tell me more about the software, and how, how portable your architecture is, or how portable software is for your architecture. You know, I'm not a, I'm not a deep, you know, a geek in terms of computer engineering and software engineering and hardware. So you're going to have to educate me a little bit here, but talk a little bit about how things need to evolve or change in order to use your architecture.

Brandon Lucia: Yeah, yeah, absolutely. So, we've put a lot of attention, like I said, a sort of decade of thinking through abstractions and designing and testing and iterating to get this right. At Efficient, we're focused on providing the efficiency and performance that you would get out of specialized hardware, and doing that with a general purpose architecture. General purpose means you can run anything. I mentioned in the first two minutes of this, I mentioned linear algebra. Of course, you know, I think about that pretty often. That's a specialized computation that's very important, but like I said before, that's not the whole world. And our focus has been on, we want to support general purpose computation. That means we take on the whole world. And so, we have a compiler, we have a software stack, and that software stack, it thinks about software the same way that developers have thought about software for a long time. And also, it has the versatility, and it has the adaptability to think about AI software and AI for the future, meaning the frameworks that are very popular right now for implementing AI models and AI applications, but also the mountains of millions of lines of, you know, C source code out there that drive… it's… it's hard to overstate the importance of supporting all those applications. They drive everything in the world. And so having the ability to support the general purpose world of software, that allows for the maximum of innovation. We support the future, AI and what comes after the current moment in AI and the frameworks and software that support it, but also the ability to iterate rapidly and to build applications that rely on other pieces of software that aren't squarely in the box of one of the AI frameworks or application suites.

Seth Earley: So tell me, who are your target customers, and how do they… how do they engage? How do they leverage what you're doing? Is it a matter of… you're saying they don't have to throw away what they have, right? No one's gonna do that. But is it a matter of you're selling to a, you know, cloud provider, and customers will use it that way, or are you selling to end users? Tell me a little bit more about where you fit in in the ecosystem, and how an organization can leverage what you're doing, and then what's the business case for them doing that?

Brandon Lucia: Yeah, yeah. Well, we started at the edge. That's where we found there was huge opportunity to have impact, taking AI applications and the sort of signal processing and data massaging and formatting and sampling and regularization and tokenization, all those workloads running at the edge consumed a huge amount of energy. The way that people have solved that in the past is very unsatisfying. It's… take your bits, throw them over a radio, burn up your battery in a week or a month or whatever, and okay, too bad for you, now your application can't live out in the wild for a long time. We add the efficiency so you can do all those computations locally, on device. So, you asked, how do you get started? The way to get started is take your code, and today you might use a compiler, GCC is a popular compiler, Clang is a popular compiler. You just use our compiler, it's called FCC, F for efficiency. And so with FCC, you run that on your code, and it works the same as those other ones that I just mentioned. You don't have to change anything. Our compiler figures out how to map your code into our hardware, and then it runs. And you get a huge boost in efficiency because we have better abstractions than these legacy ways of doing computation on CPUs and GPUs. So, we look at applications, like I said, at the edge, and so you have infrastructure observability, that's a huge area that's growing right now.

Seth Earley: Talk more about that, so talk more about that. There's distributed sensors, IoT, really being able to process at the source of where this data is coming from, but talk more about that.

Brandon Lucia: Yeah, absolutely. So I like to think that IoT came and went with smart light switches, and what we're looking at now is a sort of rebirth of something that's fundamentally new, it's fundamentally different, and the value proposition is much, much higher. So, there are millions of kilometers of critical infrastructure. This is things like roads, water pipelines, power lines, the stuff that you see around you that makes society work. And you need to have eyes on that all the time. Today, you know, in the past, the solution is to hire, like, some guy in a Cessna to fly along the ridgeline and see where the power lines are located. That's not a joke, that's actually how it's done.

Seth Earley: Hmm.

Brandon Lucia: The future will be, and it's beginning to happen today, with some of our customers, we're actually fielding devices that are built to support this future. The future will be millions of devices that are conducting a search continuously up and down the infrastructure for anything out of the ordinary. They're gonna be running signal processing, data collection, sensor fusion, AI at the edge, understanding what's happening with my infrastructure. Is this pole gonna fall down? Is there an anomaly in the gas flow in this line? And we need to report that immediately, or that looked like an anomaly, doesn't matter, that's okay. You need to do that at the edge. There's so much data, you can't channel it all back over the radio.
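
To make the on-device screening idea concrete, here is a minimal Python sketch of the pattern Lucia describes: score each reading against recent history and transmit only the anomalies. The rolling z-score test, window size, threshold, and simulated sensor feed are illustrative assumptions, not Efficient Computer's actual pipeline.

```python
# Minimal sketch of edge-side anomaly screening: flag only out-of-the-ordinary
# readings and keep everything else on-device. All parameters are illustrative.
import random
from collections import deque
from statistics import mean, stdev

WINDOW = 256        # recent samples kept on-device
WARMUP = 32         # need some history before scoring
THRESHOLD = 4.0     # flag readings more than 4 sigma from recent history

history = deque(maxlen=WINDOW)

def report(value: float, score: float) -> None:
    """Stand-in for the radio uplink; called only for anomalies."""
    print(f"anomaly: value={value:.2f} z={score:.1f}")

def process_sample(value: float) -> None:
    if len(history) >= WARMUP:
        mu, sigma = mean(history), stdev(history)
        if sigma > 0 and abs(value - mu) / sigma > THRESHOLD:
            report(value, abs(value - mu) / sigma)   # transmit only outliers
    history.append(value)                            # the rest never leaves the device

# Simulated sensor feed: mostly normal readings, one injected fault.
for i in range(1000):
    process_sample(100.0 + random.gauss(0, 1) + (50.0 if i == 700 else 0.0))
```

The design choice this illustrates is the one Lucia makes explicit: the raw stream stays local, and only events worth acting on cross the radio.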

Seth Earley: Right, right. So, you mentioned IoT came and went with smart light switches. Tell me more about that. Is that… is it a term that's no longer used, or is it… tell me more about what you're saying. How is this different? How is it not IoT, and is that just a deprecated term?

Brandon Lucia: I'm flagrantly opinionated about the subject.

Seth Earley: Okay, go ahead, give us your opinion.

Brandon Lucia: Yeah, yeah, when it comes to computation, the first wave of IoT applications didn't really have much in the way of smarts or AI or intelligence of any form. A lot of them were just a sort of conduit to the cloud.

Seth Earley: Right.

Brandon Lucia: So you put something on your light switch, or you put something on your door, and it goes beep-beep, and that's because it sent a message to the cloud, and that's not very interesting. It's more about connectivity and having, you know, kind of… it's the era of the dumb terminal compared to the era of the data center. What we're doing now is moving to devices that have very sophisticated intelligence directly on the device, and that's… that's an area of applications that Efficient is built for. We are outrageously better than what's on the market today at these kinds of applications, where you bring together signal processing, you bring together these sensor fusion algorithms, you can do the intelligence right on the device. That's really fundamentally different from, you know, your light switch goes beep-beep when you hit it, something like that.

Seth Earley: Is the term… is there a new term to describe that, as opposed to IoT? Again, I'm just trying to get the terminology clear… Yeah, yeah.

Brandon Lucia: It's not my term, but it's one that I've been hearing quite a bit, and I'm beginning to subscribe to it a bit. It's the wave of physical AI applications, and it covers a broad stripe of things. You have, on one end of the spectrum, you have… tiny infrastructure sensors that are intended to disappear into the environment. And on the other end of the spectrum, you have large, sophisticated robots that are doing complicated jobs that put human lives at risk today, and maybe tomorrow we don't have to do that. We don't have to put those lives at risk to those kinds of jobs. And so, and everything in between, you know, monitoring undersea cables is a very exciting one of these. There's AI across the board, and… when I say AI here, I mean, again, like I said, that broader definition, where we include not just a subset of the computations that support, like, a few kernels that drive the whole world of AI, but also the supporting cast. You have the signal processing, sensor fusion, and general purpose analytics that vary from one use case to the next. That's really important, and that's what we're good at at Efficient, is having a software stack and architecture to support the full complement of computation that you need to drive especially these sophisticated physical AI applications for today and for the future.

Seth Earley: Gotcha, so physical AI. Now, you said you're outrageously better. Tell me, give me some benchmarks, tell me what that means in practical terms, and tell me what that could mean for, you know, an enterprise, right? So I can imagine that there are, you know, that there are developers out there and product companies that are, you know, at the leading edge, doing innovation, but it really… any organization that deals with the physical world, right? Manufacturing plants, distribution, logistics, really, you know, supply chain, there's a lot of places where you can have distributed computing and physical AI. But talk a little bit about, again, what those benchmarks are, what that improvement is, and then how it applies to the various tiers in this ecosystem.

Brandon Lucia: Yeah, absolutely. So, when I say that we're, you know, outrageously better, sorry for the hyperbole here, but…

Seth Earley: That's all right, you can be hyperbolic, as long as you can bring the receipts, as they say.

Brandon Lucia: Yeah, let me show you the receipts. So, we benchmark against a lot of in-market parts today, and I'm not going to mention specific comparisons. Some of those are available on the internet. We've made some data public, and we can share that with you. We look at the most energy-efficient CPU and embedded vector machine, and a vector machine is basically… it's like a GPU that's been stripped down to work at the far reaches of the edge. We benchmark against those kinds of systems, and on a bad day, we see five times improvement in energy consumption, and in the limit, in the peak, we see hundreds of times improvements in energy efficiency. And that's not notional, not modeled, when you take away the power consumed by the board. I mean, I can show you this, I don't know if this will come through very clearly on my camera, but this, this right here is our board, that's our chip in the center. We run all our experiments on this chip… on that chip right there, and we compare to parts that are in the market today, and we… we do head-to-head on AI, on signal processing, on encryption, on compression, you know, all the greatest hits, computation through the ages, and things that are on the bleeding, absolute bleeding edge of tomorrow, including AI frameworks, code written in C, C++, we have, you know, PyTorch, TensorFlow, ONNX, Rust, all of those things. And the energy is not even close. We are vastly ahead of our competition when it comes to energy consumption, and that's the primary determinant of the value of these applications. Batteries last longer. You can do more under a power cap. You're not limited by thermal constraints. Those convert directly into capabilities, into lifetime. So you can do more than you could do today.
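
As a rough illustration of how "batteries last longer" follows from energy per task, here is a back-of-the-envelope calculation. Every number in it (battery size, energy per task, duty cycle, and the 100x factor) is an assumption chosen for illustration, not a published benchmark result, and it ignores radio and sleep power.

```python
# How energy per task translates into battery life, under illustrative assumptions:
# a small 3 Ah / 3.7 V cell, ~1 J per task on a baseline part, one task per minute.
battery_joules = 3.0 * 3.7 * 3600       # Ah * V gives Wh; 1 Wh = 3600 J (~39,960 J)

def lifetime_days(joules_per_task: float, tasks_per_day: int = 24 * 60) -> float:
    return battery_joules / (joules_per_task * tasks_per_day)

baseline = 1.0                           # assumed joules per task on a baseline MCU
print(f"baseline part: {lifetime_days(baseline):6.0f} days")        # ~28 days
print(f"100x part:     {lifetime_days(baseline / 100):6.0f} days")  # ~2,775 days, roughly 7.6 years
```

The point is the one Lucia makes: at a fixed battery size, energy per task is what sets the ceiling on lifetime, or on how much capability fits under a power cap.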

Seth Earley: Incredible. And so there's the implication. So, who are your biggest customers today? So, are you selling into the data centers and the hyperscalers? Are you selling into, you know, robotics organizations? What's the market landscape look like for you?

Brandon Lucia: Yeah, yeah, so I'm not going to talk about specific customer relationships, but I'll give you the broad strokes, and I think you'll get the idea here. So we're looking…

Seth Earley: Classes of customer, as opposed to, yeah.

Brandon Lucia: Yeah, absolutely. So we're looking deeply, like I said, into infrastructure observability. We have several relationships in that area, partnerships that are… they're going to bring significant efficiency to end operators of these types of infrastructure. Hundreds of millions, billions saved in outages and loss because of missed events and lack of observability. Huge, huge improvement, and it owes to the fact that we can now deploy with our channel partners, we can now deploy these applications into smart devices. They're running AI, they're running DSP, and rather than doing that for a few weeks or a month or something and then you gotta go replace the battery, we deploy devices with our channel partners for 5 years. Now the device can be running these sophisticated applications for that period of time. We also see a lot of activity in… the area of automotive and robotics. There's a lot of interest we've been seeing in automotive use cases, where you have thermal limits, and where it's difficult, especially in some of the, you know, far reaches and inner workings of a vehicle, to run power. It's difficult to, you know, every time you need to put a new wire somewhere in a car, things get a little complicated, right? So we have… we have some interest in that area and in the area of robotics. And those are… those are some of the key areas that we're focused on right now. We have a lot of traction in those areas as well.

Seth Earley: Hmm, interesting. How large an organization are you these days? Whatever you can share in terms of either run rate, or, employees, or…

Brandon Lucia: Yeah, yeah, so we're, we're 50 people, and headquartered in Pittsburgh, and we're roughly 50-50 between Pittsburgh and San Jose, and we're growing now. And it's a really, you know, wonderful moment of growth, where we have, chips. We're going to be at CES this week, actually, and we're going to be showing off our E1 evaluation kit. This is the public release of that. We're very excited to be bringing this to the world. We're going to be opening up a cloud evaluation platform that allows… potential future channel partners to get involved, to join our group of early access partners, and to get their hands on our hardware, and to be able to run some software, and really see the efficiency at work, really see what they can do with this incredible efficiency that our architecture provides. So we're at a really amazing moment of growth right now. It's a huge step forward, and looking further into the future, you know, second half of next year, we're going to be bringing this product into the market for general availability, and we're really excited to see that hit the market.

Seth Earley: That's awesome, and tell me about the evaluation kit. What does that consist of? That's a way for people to try out your hardware and your software in their application scenario, so tell me more about how that works.

Brandon Lucia: Yeah, so like I said, when it comes to our software stack, we like to meet the programmer where they are. We want you to have your software as you've been using it today, and we don't want you to have to change very much or anything. So our evaluation kit is designed around that principle. Plug in a USB-C or log in, we have a cloud environment, whichever way you end up accessing it, plug into the USB-C, and run our compiler. If you have a collection of, you know, say, C and C++ source files, you just build those. We have a simple command you use, you can push the code onto the chip, it'll run. Our evaluation kit is… it's fairly full-featured, so we have a full complement of I/O, you have general purpose I/O, and some special purpose I/O interfaces that are very common at the edge. And then in addition to that, we've also built in some energy profiling infrastructure, so if you want to go and run your code, that's great, you can see that it works. Of course, it had better work, right? You can also see, this is how much less energy it takes than when you have that other board over there. We want to make it easy for people to see the value that they get from using our architecture.
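
A simple sketch of the kind of comparison an energy-profiling setup enables: integrate a sampled power trace into joules for two boards running the same workload. The trace values and sample rate below are hypothetical stand-ins, not output from the E1 evaluation kit.

```python
# Turn a sampled power trace into an energy number and compare two boards.
def energy_joules(power_watts: list[float], sample_period_s: float) -> float:
    """Trapezoidal integration of instantaneous power over time."""
    return sum(0.5 * (p0 + p1) * sample_period_s
               for p0, p1 in zip(power_watts, power_watts[1:]))

board_a = [0.120] * 5000   # ~120 mW for 5 s, sampled at 1 kHz (made-up trace)
board_b = [0.004] * 5000   # ~4 mW for the same 5 s window (made-up trace)
print(f"board A: {energy_joules(board_a, 0.001):.3f} J")
print(f"board B: {energy_joules(board_b, 0.001):.3f} J")
```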

Seth Earley: And then extrapolate and project what that cost savings would be for full deployment, and imagine, as you say, there's some very, very powerful use cases and ROI value propositions on this.

Brandon Lucia: Especially when you get to really large fleet sizes and really large deployments, the key word is scale here. So, if you are building out an observability application, like the one I was talking about before, if your battery lasts a month and you want to instrument millions of kilometers, think about the fleet of vehicles you have to have driving around to go and switch batteries. Increase the lifetime to 5 years. Now, the rate of battery replacement is much lower, and so you increase the capability across the entire fleet, you enable scale in these applications, and like I said before, you don't have to spend all of the dividends of this energy advantage on just increasing the lifetime to 5 years. Say you need 2 years, but you need more sophisticated capability for your device, boom. Spend some of your energy dividends on that, and now you can level up the AI that you're doing on your device. You can level up all the signal processing and analytics. And you get something really special that you just… you can't do that with what's out there today. We are unique in our ability to support all those computations and to do it extremely efficiently.
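
The same arithmetic at fleet scale shows why lifetime dominates operating cost; the fleet size below is an assumption for illustration only.

```python
# Battery-replacement site visits per year for a large deployment (illustrative).
fleet_size = 1_000_000   # assumed number of deployed devices

def replacements_per_year(battery_life_years: float) -> float:
    return fleet_size / battery_life_years

print(f"1-month batteries: {replacements_per_year(1 / 12):>12,.0f} visits/year")
print(f"5-year batteries:  {replacements_per_year(5):>12,.0f} visits/year")
```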

Seth Earley: How much funding have you had over these years to get to this point? Is that something you can share?

Brandon Lucia: So… I can't talk in specifics about funding, that's not a topic I can cover today, but I can share that we're at a really exciting moment of growth right now, and we're at a place where we're going to be scaling through next year and bringing a product to market for general availability at volume, and we're really excited to be able to do that.

Seth Earley: Yeah, I was just curious about what it's taken to get there, because designing hardware and chips and microprocessors and all of that is not a trivial exercise.

Brandon Lucia: I can tell you it's not cheap.

Seth Earley: Yeah, I hope so.

Brandon Lucia: You know, the payoff is huge. The investment here, the capital investment, we're extremely capital efficient in the way that we run things, and the payoff of our, you know, focus and capital efficiency and putting the proceeds of our previous raises to work, that is being able to bring a product to market at volume general availability in, you know, just a few short years since we incorporated, since we began to exist, and we're really proud to have been able to accomplish that in such an efficient way. It's really part of the spirit of my team here. We love the concept of efficiency. The org, it's a cultural touchstone for all of us, and so we're all very excited to see the fruit of our hard work next year.

Seth Earley: Tremendously exciting, and what's the future hold for you? Are you looking at a timeline to go public, to exit, to… what do you think is going to happen over the next 3… or 5 years. I know it's very difficult to make predictions, especially about the future, as Yogi Berra used to say, but… and 3 years in AI terms is an eternity, but what do you see as the path?

Brandon Lucia: I like to think about the future, and one of the reasons for that is, you know, you said it well. In AI terms, 3 years is 20 years. Somehow that's what happened, right? So, a couple of things. One is, we are… in terms of technology, we are very well positioned for a future-proof run into the next 1, 3, 5 years, because we're not tied, we're not specialized to a particular type of architecture… type of computation. Our architecture is general. This is very important for staying current, being able to adapt as the world of software adapts, and also to enable innovation. We're not tying developers of AI applications to a particular way of doing things. So, when I think about the future, I think of, wow, how our architecture can go from where it is today to scale. We think about scaling up. We're looking at what are the ways that we can take our architecture and bring to bear even more computationally powerful, computationally sophisticated applications? And to do that with the same or better energy efficiency than we're already delivering today, you know, compared to other parts of the market. And that's something where over the next 3 to 5 years, we're going to be looking very carefully at that, and how we can take the architecture that we've built and our software stack, which allows us to remain general and allows us to support innovation, and we can turn that into something that scales up and scales out. And, you know, we have eyes across the entire stack. We're looking all the way up and down. Today, we're at the edge, that's the most important area that we're focused on right now, is going to be delivering energy efficiency better than anything in the market for a broad array of applications at the edge. The future is very exciting to think about, technology and the new markets that scaling up and scaling out unlocks.

Seth Earley: Yeah, that's great. And so, you know, a general-purpose tool will always have some trade-offs, right? Because, again, specialization will give you certain things, and… and so, are you saying that your architecture and your chip is… will be as valid and useful for training models as it would be for inference? Because those, again, are different computationally, and different in terms of energy consumption, and so on. So, what are your, what are your thoughts there?

Brandon Lucia: Yeah, so we haven't looked at the problem of large-scale training in the data center. That's something where the architecture that seems to be the winner there for now, for the moment, is the GPU, and there's technical reasons for that, the way that you can parallelize and batch computation, that's the reason that that's been the winner. Now, that doesn't mean that our architecture isn't applicable to that, and as we look to the future, we may find that that sort of computation is our sweet spot. What I can say, though, is if you look at devices at the edge, today, you have… very limited ability or inability to do… fine-grained personalization, adaptive retraining on device, and that's something that we can really unlock. So if you have a specialized piece of hardware that's focused on just inference, you can't really do any of that retraining work, or at least you can't do it efficiently. The architecture we have, we open up the ability to do that kind of additional fine-grained tuning for a use case and environment. And so that's… that's something that we think about now, today, when we look at the edge, when we look at edge applications, there's a huge opportunity here. It's not tapped into by, say, something that's ultra-specialized for inference. If you look at inference, that's a very specific subset of computations, and it's just not the same set of things that you would need to do there for, you know, on-device personalization. And not to mention all of the other types of computation that you need, of course, to make… well, you know, you need to ingest data, and you need to make sense of the data before you can put it into an AI model to begin with, really. So, you know, when I think of, when I think of the difference between, you know, inference and training, I think that is even confined to a subset of the use cases and computations in the world, and the AI that I think of is even much larger than that, especially when you bring it into the physical world, physical AI.
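
One way to picture the lightweight on-device adaptation Lucia is pointing at: keep a pretrained backbone frozen and fine-tune only a small head on locally collected samples. This PyTorch sketch is purely illustrative; the shapes, optimizer settings, and data are assumptions, and it is not Efficient Computer's software stack.

```python
# Minimal sketch of on-device personalization: freeze a pretrained backbone
# and fine-tune only a small head on locally collected samples (illustrative).
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 64))  # stand-in for a pretrained model
head = nn.Linear(64, 4)                      # small task-specific classifier

for p in backbone.parameters():              # the inference-only part stays fixed
    p.requires_grad_(False)

optimizer = torch.optim.SGD(head.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

# A handful of samples gathered on the device (hypothetical sensor features).
x_local = torch.randn(16, 32)
y_local = torch.randint(0, 4, (16,))

for _ in range(20):                          # a few cheap adaptation steps
    optimizer.zero_grad()
    logits = head(backbone(x_local))
    loss = loss_fn(logits, y_local)
    loss.backward()                          # gradients flow only into the head
    optimizer.step()
```

The design point is that the training-style work is confined to a tiny fraction of the parameters, which is what makes this plausible under an edge energy budget while an inference-only accelerator cannot do it at all.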

Seth Earley: It's interesting when you talk about the idea of being able to retrain or fine-tune, you know, in the same infrastructure in which you're doing inference, right? Because inference is going to give you data, and it's going to give you those edge conditions, and it's going to, you know, evolve how the interaction with the environment is being shaped, and then, yes, that information should be able to be ingested back into training of the model, or retraining of the model, or fine-tuning of the model. So I can see how that would be very interesting in terms of the ability to do both of those very different functions at the edge, so I can see that's really a great opportunity. Where would you say, you know, most organizations are kind of overlooking opportunities? You know, in your typical, say, enterprise, are there things that they should be doing? Again, if they… maybe they are or aren't doing things at the edge, but what about the typical enterprise? Where are they missing opportunities for efficiency and improved investment return?

Brandon Lucia: Yeah, so one of the areas where, and this is… this is something that we keyed in on early on, this is very important to us as an organization, and sort of culturally, this is something we live every day. We are focused on hardware-software co-design. This is… the key to our technology is understanding both sides of the interface. You know, I said at the top of the hour here, this… this job of being a computer architect. It's… you're in the middle, and you're looking up and down. You're looking at the software side, and you're looking down to the abstractions and hardware structures and, you know, even all the way down to silicon. This is absolutely… critical for the success of bringing a new hardware architecture into the world, and making it usable, and making it useful, and actually providing an advantage. So the specific ways in which you get an advantage, one is you define the abstractions in software so that, on one side, they support the set of use cases and the set of operations that you need. That could be, in some world, just a subset of AI applications, although that's a miss, because now you're leaving all this other computation out in the cold. What if things change?

Seth Earley: Hmm.

Brandon Lucia: And so, you can design, and we've done this, you can design a general-purpose set of abstractions that capture the entire world of software, including those AI applications, those AI kernels. And then… with hardware-software co-design, we can plumb those abstractions through the architecture into the hardware. That's really the key to our advantage, is we've done hardware-software co-design, and it allows us to take advantage of the structure that is intrinsic to all computation. There is structure in computation. We tap into that. We identify the structure using our software, using our compiler, using our SDK, and then we can take the structure and we can map it into extremely energy-efficient hardware structures that mirror those common structures in software, but we do it without over-specializing. We don't give up functionality, and that's really the key to our magic. That's really the key to what we've done.

Seth Earley: Hmm. So what should leaders of the enterprise look to, and what should they prioritize? What should they be thinking about for the next few years when it comes to this? I mean, you know, a lot of people are just kind of taking for granted, well, the cost is the cost, but it sounds like there are clear steps that they could be taking, so what recommendations, what advice do you have for leaders today in terms of what they should focus on, what they should prioritize on, and how they should think about this area?

Brandon Lucia: It's a great question. I think that the best thing to do is start with the basics, and in computer architecture, there's a rule, and it's one of the most fundamental principles in computer architecture. It's one of the basics. It's called Amdahl's Law. It's named for Gene Amdahl. You can look him up on Wikipedia. Old guard computer architect, super famous guy, invented everything, etc. Amdahl's Law says that you are bottlenecked by the part that you cannot optimize.

Seth Earley: Hmm.

Brandon Lucia: And so if you have… if you have an application, right, and say it's, you know, some… I'll oversimplify just to make the point here. Say it's an application that does a whole bunch of sophisticated PyTorch AI stuff. And it does a bunch of other, you know, signal processing, compression, etc. All the other parts of the software that you now have to run, and they're sort of sitting there next to that. Well, if you look at that, and say it's 50-50, it's easy to think about if it's 50-50, and let's say that we made… you and I sit here, and we designed the most amazing special purpose accelerator that has ever existed, and it makes that AI part, the PyTorch part, go to zero seconds of time and zero joules of energy, and it's the most amazing hot rod machine that can't really exist, right? What we're left with is the rest of it.

Seth Earley: Hmm.

Brandon Lucia: And the rest of it becomes the sandbag you have to carry around if you don't pay attention to Amdahl's Law. Amdahl's Law is one of the most important guiding principles for this moment in AI. We need to look, where is our energy and time going? In the data center, in the high-performance edge, like, think, you know, high-end robots and cars, and all the way down to the tiniest devices that we're fielding into things like infrastructure observability. Is it the radio? Is it tokenization? Is it doing statistical sampling because we have to do some complicated procedure, and that's going to eat up way more energy than it should? Is it that we've really nailed it, and 99.99999% of energy goes into the AI kernel computation? I can tell you that's not the case in today's data centers, because in data centers today, you have x86 sitting right next to all your AI accelerators 24-7, churning away. 64-core x86, 64-core ARM processors, those eat up a lot of energy. There's real Amdahl's opportunity there. And so I think that the advice that I would give, and, you know, I almost want to apologize for being a bit too academic about this, but focus on the fundamentals and look at Amdahl's Law. It's the most important principle to look at now. If I were to go to NVIDIA and say this, they would say, duh, of course, that's what we've been thinking about. I know people there that are doing that, so, you know, maybe it's preaching to the choir to say that Amdahl's Law is important, but really, I think that that's… it's a real opportunity to tap into, you know, a huge win that could be easily overlooked by someone not thinking about the Amdahl's opportunity.
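
For readers who want the formula behind the 50-50 example: Amdahl's Law says that if a fraction p of the work is sped up by a factor s, the overall speedup is 1 / ((1 - p) + p / s). The short sketch below just evaluates that formula to make the ceiling explicit; the speedup values are arbitrary illustrations.

```python
# Amdahl's Law: overall speedup when a fraction p of the work is accelerated
# by a factor s. With the 50-50 split from the example, even an effectively
# infinite accelerator caps the whole application at 2x.
def amdahl_speedup(p: float, s: float) -> float:
    return 1.0 / ((1.0 - p) + p / s)

p = 0.5                                    # half the time is the AI kernel
for s in (2, 10, 100, 1e9):                # accelerator speedups; 1e9 ~ "free"
    print(f"accelerate AI part {s:>10.0f}x -> whole app {amdahl_speedup(p, s):.2f}x")
```

This is exactly the "sandbag" Lucia describes: whatever you cannot accelerate sets the limit, which is why the supporting computation around the AI kernels matters so much.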

Seth Earley: That's great. Well, listen, this has been tremendous. Thank you for the great conversation. Thanks for sharing your insights on AI and computing architecture, on energy efficiency, and all of these things that are so important and so critical, and I can imagine that, you know, this is a forcing function, right? We're not going to be able to meet these energy requirements with the existing hardware and software and infrastructure that we have. So, these things have to change, and so it's great to… to see that this kind of progress is being made, so that's wonderful, and I really appreciate your time today, so thank you. And thank you to our listeners for tuning in. Join us next time as we continue to explore how AI is shaping the future of technology, of business, and intelligent systems. So we will see you next time, and again, thank you, Brandon, for your time today.

Brandon Lucia: Yeah, thank you very much for having me, I appreciate it.

Seth Earley: Okay, we'll let you go now, and we'll see you all next time.