
LIVE: Google’s Jeff Dean on the Coming Era of Virtual Engineers

At AI Ascent 2025, Jeff Dean makes a bold prediction: we will have AI systems operating at the level of junior engineers within a year. Discover how the pioneer behind Google’s TPUs and foundational AI research sees the technology evolving, from specialized hardware to more organic, brain-inspired systems.

Summary

Jeff Dean, who joined Google in 1999 and now serves as Chief Scientist at Alphabet, was instrumental in developing many of the foundational technologies that power today’s AI revolution, including TPUs and key machine learning techniques. Jeff emphasizes the critical interplay between algorithmic innovation and infrastructure scaling while highlighting emerging opportunities in multimodality, agents and specialized computing.

Large models and data remain foundational but require strategic deployment: While the largest, most capable models will be limited to a handful of well-resourced players, techniques like model distillation enable startups to create lightweight, specialized models that maintain strong performance. The key is identifying specific use cases where smaller, focused models can deliver outsized value.

Infrastructure and hardware optimization are becoming critical differentiators: The shift toward AI-centric computing demands rethinking traditional approaches—from hardware architecture to algorithmic design. Success requires deeply understanding compute efficiency, memory bandwidth and data movement costs. Founders should consider how to optimize their specific AI workloads across training and inference.

Multimodality and agents represent major growth vectors: The ability to seamlessly work across text, code, audio, video and images is increasingly valuable. While current agent capabilities are limited, there’s a clear path to rapid improvement through reinforcement learning and use of simulated environments. Early movers in targeted agent applications can capture significant value.

Education and productivity tools offer immediate opportunities: AI’s ability to create interactive educational experiences and augment human capabilities in work environments is already showing promise. Founders should look for specific workflows where AI can dramatically improve efficiency or enable entirely new approaches to learning and task completion.

The development environment is evolving toward more organic, flexible systems: Future AI systems will likely feature varying levels of compute intensity, specialized components and the ability to continuously learn and adapt. Successful founders will need to balance leveraging current effective approaches while preparing for this more dynamic future.

Transcript

Contents

Pat Grady: We have Jeff Dean. And if you read Jeff’s bio, he’s run everything at some point in Google, including overseeing the genesis of this industry and the BERT paper that kind of sparked things so many years ago. And we’re very fortunate at Sequoia to have our partner, Bill Coughran, who spent about a decade before Sequoia running most of engineering at Google with Jeff. And so please welcome Jeff and Bill.

[applause]

Bill Coughran: Thank you. And Jeff, it’s great to see you. We got to work together for a few years, and Jeff still is occasionally willing to talk to me, which I’m very proud of.

Jeff Dean: We have an occasional dinner, which is great fun.

Bill Coughran: Yeah. No, he’s now the chief scientist, I think, at Alphabet. So I thought we’d start—obviously, a lot of the people in the room are excited about AI and what’s happened. Google clearly introduced a lot of the tech that the industry is based on—transformers and other things. Where do you see things going these days as you look out, both within Google, but also in the industry as a whole?

Where is AI going?

Jeff Dean: Yeah. I mean, I think this period has been a fairly long time in developing, even though it’s come into popular visibility only in the last three or four years. But really, starting maybe in 2012 and ’13, people were starting to be able to use what at the time seemed like really large neural networks to solve interesting problems. And the same sort of algorithmic approach would work for vision and for speech and for language. And that was pretty remarkable, and it kind of brought attention to machine learning as a way to solve those problems, rather than more traditional handcrafted approaches.

And one of the things we were interested in in 2012 even was how can you scale and train very, very large neural networks? So we trained a neural network that at the time was 60x larger than anything else, and we used 16,000 CPU cores because that’s what we had in our data centers. And got really good results. And that really cemented in our mind that scaling these approaches would really work well. And there’s been a whole bunch of evidence of that and hardware improvements to help increase our ability to scale to larger and larger models, larger data sets. You know, we had an expression, “bigger model, more data, better results,” which has been sort of relatively true for the last 12 or 15 years.

And where things are going, I think now the models that we have are capable of doing really interesting things. They can’t solve every problem; they can solve a growing set of problems year over year because the models get better, we have better algorithmic improvements that show us how to train larger models with the same compute cost, more capable models. And then we have scaling of hardware, we have increasing compute per unit of hardware. And also, we have reinforcement learning and post-training kinds of approaches that are making the models better and sort of guiding them into the ways that we want them to behave. And that’s really exciting, I think. Multimodality is another big thing, like, having the ability to put in audio or video or images or text or code, and have it sort of output all those kinds of things as well is pretty useful.

Are agents vaporware?

Bill Coughran: The industry is, I think, mesmerized by agents right now. How real do you think agents are? I know Google introduced an agent framework. Some of this stuff, not Google’s necessarily, but some of the agent stuff seems to be a little bit vaporware to me.

[laughter]

Bill Coughran: Sorry, folks. I’m a little direct, as some folks will tell you.

Jeff Dean: I think there’s a lot of promise there, because I do see a path for agents with the right training process to eventually be able to do many, many things in the virtual computer environment that humans can do today. You know, right now they can sort of do some things, but not most things. But the path for increasing the capability there is reasonably clear. You get more reinforcement learning going, you have more agent experience that it can learn from. You have early nascent products that can do some things, but not most things, but are still incredibly useful for people.

And I think similar things will happen in physical robotic agents as well. Right now we’re probably close to making that transition from robots in messy environments like this room kind of don’t quite work today, but you can see a path where in the next few—year or two they’ll start to be able to do 20 useful things in this room. And that will introduce, you know, pretty extensive robotic products that can do those 20 things. And then learning from experience, they will then get cost engineered to now have something that’s 10 times cheaper and can do a thousand things. And that’s going to engender even more cost engineering and more improvement in capability. So it’s exciting.

Bill Coughran: It is. And it does seem like it’s coming, even though it’s vaporware to me. I guess one of the other things that comes up I think with a lot of young companies is what’s happening with the large models. I mean, clearly Google has Gemini 2.5 Pro and Deep Research and so forth. And then there’s OpenAI and a number of other players. I think there’s an open debate about how many large language models, open source, closed source, where are things going? How do you think about that? Obviously, Google has a strong position and wants to, I’m sure, dominate in that area, but how do you see the landscape?

Jeff Dean: Yeah. I mean, I think clearly it takes quite a lot of investment to build the absolute cutting-edge models. And I think there won’t be 50 of those. There may be like a handful. And there are an awful lot—you know, once you have those capable models, it’s possible to make much lighter-weight models that can be used for many more things, because you can use techniques like distillation, which I was a co-author on and which got rejected from NeurIPS 2014 as unlikely to have impact.

[laughter]

Bill Coughran: I’ve heard that technique may have helped DeepSeek.

Jeff Dean: So that’s a really nice technique if you have a better model, and then you can put it into a smaller-scale thing that actually is pretty lightweight and fast and all the kinds of properties you might want. So I mean, I think there will be quite a number of different players in this space, because different shape models or models that focus on different kinds of things, but I also think a handful of really capable general purpose ones will do pretty well.
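The distillation technique Dean refers to (Hinton, Vinyals and Dean, 2015) trains a small “student” model to match a large “teacher” model’s softened output distribution rather than just its top answer. A minimal pure-Python sketch of the core loss; the function names and toy logits here are ours for illustration, not from any real implementation:

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax with a temperature knob; T > 1 softens the distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions.

    Minimizing this pushes the student to mimic the teacher's *relative*
    confidences across all classes, not just its single hard label.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# A student that already matches the teacher incurs (near) zero loss...
assert distillation_loss([5.0, 2.0, 0.1], [5.0, 2.0, 0.1]) < 1e-9
# ...while a disagreeing student incurs a much larger one.
assert distillation_loss([5.0, 2.0, 0.1], [0.1, 2.0, 5.0]) > 0.1
```

In practice this loss is mixed with the ordinary hard-label loss, and the soft targets let the lightweight student inherit much of the teacher’s behavior at a fraction of the inference cost.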

How important is specialized hardware?

Bill Coughran: Fair enough. I guess hardware is the other thing that’s interesting. It looks to me like every large player is building their own hardware. Obviously, Google has been very public about the TPU program, but Amazon has their own, rumors are Meta has one, rumors are OpenAI is building one. You know, there’s lots of hardware, and yet the industry seems to only hear about Nvidia. I’m sure that’s not true in your office, but how do you think about that? How important is specialized hardware for this stuff?

Jeff Dean: Yeah. Well, I mean it’s very clear that having hardware that is focused on sort of machine learning-style computations—and, you know, I like to say accelerators for reduced precision linear algebra are what you want, and you want them to be better and better generation over generation, and you want them to be connected together at large scale with super high speed networking so that you can spread your model computation out over as many, you know, compute devices as possible. I think it’s super important. You know, I helped bootstrap the TPU program in 2013 because it seemed obvious we would want a lot of compute for inference at that time. That was the first generation. And then the next generation of TPUs, TPU v2, was focused on both inference and training because we saw a big need there. And I think we’re on now—we stopped numbering them for some annoying reason, so now it’s—we’re on Ironwood, which is coming out any day now, and Trillium before that.

Bill Coughran: Be careful. That sounds like an Intel chip-naming strategy which hasn’t worked that well.

Jeff Dean: Small [inaudible].

Bill Coughran: Yeah. No, I guess going a little bit off topic, and then maybe we’ll open to questions from folks in the room. I have a lot of friends who are physicists. They were a little surprised when Geoff Hinton and his colleagues won the Nobel in physics. I guess, how do you see AI—you know, some of the physicists I know are sort of offended that a non-physicist is starting to win Nobel prizes. How far do you think AI is going to go in various fields at this point?

Jeff Dean: Pretty far, I think. I mean also this year, my colleague Demis and John Jumper won it for …

Bill Coughran: I almost forgot that.

Jeff Dean: Yes, yes. So double Nobel Prize celebration Monday and Tuesday or whatever it was.

[laughter]

Jeff Dean: So I mean I think that’s a sign that really AI is influencing lots of different kinds of science, because at its core, you know, can you learn from interesting data? And a lot of parts of science are about making connections between things and understanding them. And if you can have AI-assisted help in doing that—you know, one of the things I’ve seen in many different fields of science is many disciplines often have incredibly expensive computational simulators of some process, like weather forecasting is a good example or, you know, fluid dynamics or quantum chemistry simulations. And often what you can do is use those simulators as training data for a neural net, and then build something that approximates the simulator but now is 300,000 times faster. And that just changes how you do science because all of a sudden, well, I’m going to go to lunch and screen 10 million molecules. That’s now possible, instead of I would have to run that for a year on compute I don’t have. And I think that just kind of fundamentally changes your process of how you do things, and will make faster discoveries.
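The simulator-surrogate pattern Dean describes amortizes an expensive computation: pay for the simulator once to generate training data, then answer every later query with a cheap learned approximation. As a toy sketch of the same idea, here is a stand-in “simulator” and a simple interpolation-table surrogate in place of a neural network; all the names and numbers below are hypothetical:

```python
import bisect
import math

def expensive_simulator(x):
    """Stand-in for a costly physics simulation (in reality: minutes per call)."""
    return math.sin(3 * x) * math.exp(-0.5 * x)

def build_surrogate(sim, lo, hi, n_samples=200):
    """Run the expensive simulator once over a grid, then return a cheap
    interpolating surrogate. The training cost is paid once up front;
    every later query is a table lookup plus linear interpolation."""
    xs = [lo + (hi - lo) * i / (n_samples - 1) for i in range(n_samples)]
    ys = [sim(x) for x in xs]  # the one-time expensive phase
    def surrogate(x):
        i = min(max(bisect.bisect_left(xs, x), 1), n_samples - 1)
        x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
        t = (x - x0) / (x1 - x0)
        return y0 + t * (y1 - y0)
    return surrogate

fast_sim = build_surrogate(expensive_simulator, 0.0, 2.0)
# Screening many candidates is now cheap, and stays close to the real model:
assert abs(fast_sim(1.234) - expensive_simulator(1.234)) < 1e-3
```

The “go to lunch and screen 10 million molecules” workflow is exactly this: the surrogate’s per-query cost is tiny, so throughput is limited only by how many queries you care to issue.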

Bill Coughran: I think it’s probably the most interesting if there are questions from the audience at this point. I have other questions for Jeff, but …

Audience Member: Well, actually just to quickly follow up on that, Geoff Hinton famously left Google after, like, studying, I guess, the effects of—or the differences between digital and analog computing as a future platform for inference and learning. And I’m wondering, is the future of inference hardware analog?

Jeff Dean: It’s definitely a possibility. I mean, I think, like, analog has some nice properties in terms of it being very, very power efficient. I think there’s a lot of room for digital things to be much more specialized for inference as well, and it’s a little bit easier to work with typically. But, you know, I think there is a general direction of how can we make inference hardware that is 10, 20, 50, 1,000 times more efficient than what we have today. And that seems eminently possible if we put our minds to it. It’s actually something I’m spending a bit of time on.

Pathways for cloud customers

Audience Member: Hi, I was just going to ask about developer experience versus hardware. I think the TPU hardware is extremely impressive, but there’s a lot of—you know, in the zeitgeist about how CUDA or different, like, you know, technologies are easier to use than the TPU layer. And so I’d be curious for your perspective on that, and is that something you’ve been thinking about or getting a lot of angry emails about?

Jeff Dean: Yeah. I mean, I don’t connect with cloud TPU customers all that much, but definitely the experience can be improved. One of the things we started working on in 2018 is a system called Pathways, which is really designed to enable us to take lots of different computing devices, and then give a really nice abstraction over them where you have a virtual-to-physical device mapping that is managed by the underlying runtime system. And we have support for that for both PyTorch and JAX. We primarily use JAX in house, but what we have is a single JAX Python process; it just looks like it has 10,000 devices on it, and you just write your code as you would as an ML researcher and off you go. And, you know, you can prototype it with four or eight or sixteen or sixty-four devices, and then you change a constant and you run against a different Pathways backend with 1,000 or 10,000 chips and off you go. Like, our largest Gemini models are trained with a single Python process driving the entire thing with tens of thousands of chips, and it works quite well. So pretty good developer experience, I think.

One thing I would say is, to date, we had not offered that to cloud customers, but we just announced at Cloud Next that we’re now going to have Pathways available for cloud customers, so then everyone else can have the delightful experience of a single Python process with thousands of devices attached. And I agree that’s a much better experience than managing, like, 64 processes for your 256 chips. Like, why would you want to do that?
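Pathways itself is proprietary, but the abstraction Dean describes, a flat space of virtual devices whose physical placement is the runtime’s problem, can be caricatured in a few lines of Python. This toy runtime is entirely ours for illustration and bears no relation to the real Pathways API:

```python
class PathwaysLikeRuntime:
    """Toy illustration of a virtual-to-physical device mapping: user code
    addresses a flat space of virtual devices, a single constant controls
    the scale, and placement onto physical chips is the runtime's job."""

    def __init__(self, n_virtual, physical_chips):
        self.n_virtual = n_virtual
        # Round-robin placement; a real runtime would account for network
        # topology, failures, contention, and so on.
        self.mapping = {v: physical_chips[v % len(physical_chips)]
                        for v in range(n_virtual)}

    def run(self, shard_fn, data):
        # Split the work across virtual devices; each shard "executes"
        # on whatever physical chip the mapping assigns it.
        shards = [data[i::self.n_virtual] for i in range(self.n_virtual)]
        return [shard_fn(self.mapping[v], shard)
                for v, shard in enumerate(shards)]

# Prototype at small scale, then "change a constant" to scale out:
chips = [f"tpu/{rack}/{i}" for rack in range(2) for i in range(4)]
small = PathwaysLikeRuntime(4, chips)
big = PathwaysLikeRuntime(8, chips)
sums = big.run(lambda chip, shard: sum(shard), list(range(64)))
assert sum(sums) == sum(range(64))  # sharded result matches the serial one
```

The point of the abstraction is the last few lines: the user-visible program is identical at 4 devices and at 10,000, which is the single-Python-process experience Dean describes.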

Audience Member: I love using the Gemini API. It would be even easier if it just took one API key rather than, like, the Google Cloud credential setup. Do you guys have a plan to unify the Google Cloud Gemini stack with the Gemini project setup right now that’s more for testing stuff?

Jeff Dean: Yeah, I think there’s a bunch of streamlining that is being looked at. It’s a known problem, not something I spend a lot of time on personally, but I know, like, Logan and others on the developer side are aware of this friction. We like to make it frictionless to use our models.

The future of computing infrastructure

Audience Member: Is that working? Okay. So it’s an interesting time in computing. You’ve got the confluence of Moore’s Law and Dennard scaling being completely dead with AI just scaling like crazy. You have a pretty unique position in the world of driving these supercomputers and infrastructure that is being built, and you know how to map the workloads onto these things, which is a unique sort of skill. What do you think the future of computing is going to look like? What is the computing infrastructure heading towards, like, from an asymptotic thought experiment level?

Jeff Dean: Yeah. I mean, it’s really clear that we have dramatically changed the kinds of computations we want to run on computers in the last, say, five years, ten years. And that was, like, initially a small ripple, but it’s pretty clear now that you want to run incredibly large neural networks at incredibly high performance and incredibly low power. And you also want to train them. Training and inference are pretty different kinds of workloads, so I think it’s useful to think of those two as you probably want different solutions for the two, or somewhat specialized solutions.

And I think you’re going to see all kinds of adaptations of compute platforms for this new reality that you really just want to run incredibly capable models, and some of that will be in low power environments like your phone. Like, you’d like your phone to run incredibly good models with lots of parameters super fast, so that when you talk to your phone it just talks back to you and it can help you do all kinds of things. You’re going to want to run these on robots and autonomous vehicles. You know, we already do somewhat, but even better hardware for that will make it much easier to build much more capable physical agents in the world.

And then you want to run them at incredibly large scale in data centers. And you also then want to use lots of inference time compute for some kinds of problems but not others. So you have problem—you know, it’s pretty clear you want to use 10,000 times as much compute for some problems as for others. And that’s a nice new scaling knob we have that can make your model much more capable, or give you much better answers or make the model capable of doing things with that much compute that it can’t do with 1x as much compute. But you shouldn’t spend 10,000 times as much compute on everything. So how do you make your systems work well for that? And I think that’s a combination of hardware, system software, model and algorithmic tricks, distillation, all these things can help you make amazing models come to life in small compute footprints.
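One concrete version of the inference-time compute knob Dean describes is self-consistency: draw many samples for a hard problem and majority-vote the answer. Here is a toy sketch with a simulated model; the `sample_answer` stand-in and its probabilities are invented purely for illustration:

```python
import random

def sample_answer(difficulty, rng):
    """Toy stand-in for one model sample: harder problems are answered
    correctly less often. (Not a real model call.)"""
    return 1 if rng.random() > difficulty else 0  # 1 = correct answer

def solve(difficulty, budget, rng):
    """Inference-time scaling via self-consistency: draw `budget` samples
    and majority-vote. More compute per problem, better answers."""
    votes = [sample_answer(difficulty, rng) for _ in range(budget)]
    return max(set(votes), key=votes.count)

rng = random.Random(0)
hard = 0.4  # each single sample is right only 60% of the time
cheap = sum(solve(hard, 1, rng) for _ in range(500)) / 500     # 1x compute
pricey = sum(solve(hard, 101, rng) for _ in range(500)) / 500  # ~100x compute
assert pricey > cheap  # accuracy rises with the inference budget
```

The catch is exactly the one Dean raises: the big budget pays off on hard problems but is wasted on easy ones, so the system has to decide per-problem how far to turn the knob.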

Bill Coughran: One thing I’ve noticed is that in computer science, at least traditionally, you know, when people were studying algorithms and computational complexity, it was all op-count based. And I think as people are rediscovering hardware, and the details of hardware and system design, one of the things that’s come back into focus is that you need to think about network bandwidth and memory bandwidth and so forth. And so I think a lot of the traditional algorithmic analysis needs to be completely rethought just because of the realities of what real computation looks like.

Jeff Dean: Yeah. One of my office mates in grad school did his thesis on, like, cache-aware algorithms because the order of magnitude [inaudible] kind of notation didn’t account for the fact that some operations are 100x worse than others.

Bill Coughran: Yeah. No, that’s right.

Jeff Dean: And I think in modern ML computing, you care about data movement at the incredibly small level. Like, moving things from SRAM into accumulators costs you some tiny number of picojoules, but it’s way more than the actual operation costs you. So it’s important to have picojoules at the tip of your tongue these days.
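To put rough numbers on that, here is a back-of-envelope in the spirit of Horowitz’s ISSCC 2014 energy survey of a 45 nm process. The exact picojoule figures vary widely by process node and design, so treat them as order-of-magnitude placeholders, not authoritative data:

```python
# Illustrative per-operation energy costs in picojoules (order-of-magnitude
# placeholders in the spirit of Horowitz's ISSCC 2014 survey, 45 nm).
PJ = {
    "fp32_add": 0.9,
    "fp32_mult": 3.7,
    "sram_32kB_read": 5.0,  # small on-chip array
    "dram_read": 640.0,     # off-chip memory access
}

# One multiply-accumulate whose two operands must be fetched from SRAM:
compute = PJ["fp32_mult"] + PJ["fp32_add"]
movement = 2 * PJ["sram_32kB_read"]
assert movement > compute  # fetching the data costs more than the math

# And a single off-chip DRAM read buys you well over a hundred multiplies:
assert PJ["dram_read"] / PJ["fp32_mult"] > 100
```

This is why ML accelerators obsess over keeping operands in registers and on-chip memory: the arithmetic is nearly free compared to moving the bytes.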

Do you vibe code?

Bill Coughran: One other quick question: do you vibe code?

[laughter]

Jeff Dean: I’ve been trying it a little bit. It actually works surprisingly well. Yeah, I mean, we’ve had some nice—we have a little demo chat room—actually, we have a lot of chat rooms. We sort of run Gemini via chat room. So I’m in, like, 200 chat rooms, and when I wake up and brush my teeth I get, like, nine notifications because my London colleagues are busily doing things. Like, we have one where people can send out cool demos of things they’ve seen, and one that was particularly cool was you feed in a YouTube education-oriented video, and the prompt is just something like, “Please make me an educational game that uses graphics and interactivity to help illustrate the concepts of this video.” And it doesn’t work every time, but 30 percent of the time you get something that’s actually kind of cool and related to differential equations or traveling to Mars or doing some kind of cell aspect thing. And that’s just an incredible sign for education. The tools we now have and will have in the next few years really have this amazing opportunity to change the world in so many positive ways, so I think we should all remember that as kind of what we should be striving for.

Bill Coughran: Would you mind passing there and then maybe there?

Audience Member: Yeah, I would love to hear your thoughts about the future of search, especially given Chrome has such big distribution, right? And Chrome already knows credentials like payments and web sign-in credentials. Have you thought about getting Gemini directly into Chrome, you know, making the Chrome app the Gemini app instead of having a separate app? You know, I say this because I’m a long-term Googler, so just think about …

Jeff Dean: Yeah. I mean, I think there are definitely lots of interesting downstream uses one could make of the core Gemini models or other models. One is, can it help you do stuff in your browser or on your full computer desktop by observing what you’re doing and, you know, doing OCR on tabs, or maybe it has access to the raw tab contents? That seems like it will be incredibly helpful, and I think we have some early work in this area that we’ve published public demos of in video form that seem pretty useful, things like Mariner. So TBD.

What’s the end game?

Audience Member: Question for you. So thank you for your comments, very insightful. Earlier you mentioned, like, the number of foundational model players will likely only be a handful. And this is largely because of the infrastructure costs and the scale of investment to sort of remain at that cutting edge. And so as this battle for the frontier unfolds, like, where do you see this end game going? Like, where does this lead us? Like, is it just whoever writes the biggest check to build the biggest cluster wins, or is it, you know, better—you know, you just talked about, like, better utilization of unified memory optimization, and sort of different efficient uses of what you already have. Or is it the consumer experience, or where does this arms race lead us?

Bill Coughran: Isn’t it just whoever gets to Skynet first, the game’s over?

[laughter]

Jeff Dean: Yeah. I mean, I think it’s going to require really good, insightful algorithmic work, as well as really good systems hardware and infrastructure work. I don’t think either one of those is more important than the other, because what we’ve seen in say our Gemini progression from generation to generation is the algorithmic improvements are as important or maybe even more so than the hardware improvements or the larger amount of hardware we’re putting to the problem, but both are incredibly important.

And then I think from a product standpoint, you know, there’s sort of early stage products in this space, but I don’t think we’ve collectively hit on what is the thing that—or it’s probably going to be many things that become the daily used products for billions of people, right? I think there’s probably some in the educational space or in general information retrieval that is search-like but sort of taking advantage of the strengths of large multimodal models. I think probably helping people get stuff done in whatever work environment they find themselves in is going to be an incredibly useful thing. And how will that get manifested in product settings? How do I manage my team of 50 virtual agents that are off doing things, and they’ll probably be mostly doing the right thing but occasionally they’ll need to consult with me about some choice they need to make? I need to give them a bit of steering. How do I manage 50 virtual interns? It’s going to be complicated.

How far are we from an AI junior engineer?

Audience Member: Hi Jeff. Thanks for being here. Right here. I literally cannot think of anyone better in the world to ask this question: How far do you believe we are from having an AI operating 24/7 at the level of a junior engineer?

Jeff Dean: Not that far. Yeah.

[laughter]

Bill Coughran: Is that six weeks, or six years or …

Jeff Dean: Every year in AI seems like a dog year, seven or something. I will claim that’s probably possible in the next year-ish. Yeah.

Audience Member: Hi Jeff. You talked about scaling pre-training and now scaling RL. How do you think about the future trajectory of these models? Will it be one large model with all the compute, or a constellation of smaller models that have been distilled from these larger models both working in parallel? How do you see the future landscape?

Jeff Dean: Yeah. I mean, I’ve always been a big fan of models that are kind of sparse and have different kinds of expertise in different parts of the model, because, from our weak biological analogies, that’s partly how our real brains get so power efficient: we run on 20 watts or whatever, and we can do a lot of things, but our Shakespeare-poetry part is not active when we’re worried about the garbage truck backing up at us in the car. And I feel like we do some of that with mixture-of-experts-style models. We did some of the early work in that space, where we had 2,048 experts and showed that it gave you dramatic improvements in efficiency, like 10 to 100x better model quality per training flop.

And that’s super important, but it feels like we’re not really fully exploring the space yet, because right now the kinds of sparsity people tend to do is incredibly regular. It feels like you want paths through your model that are, like, 100 or 1000 times more expensive than other paths. And you want experts or pieces of your model that are tiny amounts of compute, and some that are very large amounts of compute. Maybe they should have different structures. And I think you want to be able to extend your model with new parameters or new bits of space, and maybe you want to be able to compact parts of your model, running a distillation process on this piece of it to make it one quarter the size. And then you have some background garbage collection-y thing that is now like, “Oh, great. I have more memory to use. So I’m going to put those parameters or put those bytes of memory somewhere else and make more effective use of them somewhere else.” So that, to me, seems like a much more organic continuous learning system than what we have today. So the only problem with this is what we’re doing today is incredibly effective, so it becomes a bit hard to completely change what you’re doing to be more like that. But I really do think there are huge benefits to doing things in that style rather than the sort of more rigidly defined model that we have today.
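The mixture-of-experts idea Dean references can be sketched in a few lines: a gate scores every expert, but only the top-k actually run, so per-token compute stays small even as total parameter count grows. A toy pure-Python version, where the scalar “experts” and gate scores are invented for illustration:

```python
import math

def softmax(xs):
    """Standard numerically-stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def moe_forward(token, experts, gate_scores, top_k=2):
    """Sparse mixture-of-experts step: the gate scores every expert, but
    only the top-k are actually evaluated, so compute per token is a small
    fraction of what running all experts would cost."""
    ranked = sorted(range(len(experts)),
                    key=lambda i: gate_scores[i], reverse=True)
    chosen = ranked[:top_k]
    # Renormalize the gate weights over just the chosen experts.
    weights = softmax([gate_scores[i] for i in chosen])
    return sum(w * experts[i](token) for w, i in zip(weights, chosen))

# Eight toy "experts" (here just scalar functions); only two fire per token.
experts = [lambda x, k=k: (k + 1) * x for k in range(8)]
scores = [0.1, 2.0, 0.3, 1.5, 0.0, 0.2, 0.1, 0.4]  # gate output for this token
out = moe_forward(10.0, experts, scores, top_k=2)
# Experts 1 and 3 win; the output blends their responses (2*x and 4*x).
assert 20.0 < out < 40.0
```

Dean’s point is that real deployments make this sparsity very regular (same-sized experts, fixed k), whereas a more organic system might mix experts whose costs differ by factors of 100 or 1,000 and grow or distill pieces of the model over time.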

Bill Coughran: I think one more question and then we’ll probably wrap up.

Audience Member: Hey, I wanted to return to the junior engineer inside a year. I’m curious, what advancements do you think we need to get there? Like, obviously, just maybe code generation gets better, but outside of code generation, what do you think gets us there? Tool use? Agentic planning?

Jeff Dean: Yeah. I mean, I think this hypothetical virtual engineer probably needs a better sense of many more things than just writing code in an IDE. It needs to know how to run tests and debug performance issues and all those kinds of things. And we know how human engineers do those things: they learn how to use the various tools that we have and can make use of them to accomplish that. And they get that wisdom from more experienced engineers, typically, or from reading lots of documentation. And I feel like a junior virtual engineer is going to be pretty good at reading documentation and sort of trying things out in virtual environments. And so that seems like a way to get better and better at some of these things. And I don’t know how far it will take us, but it seems like it’ll take us pretty far.

Bill Coughran: Jeff, thank you for coming and sharing your wisdom.

Jeff Dean: Thank you. Great to see you.

[applause]