DeepMind’s Pushmeet Kohli on AI’s Scientific Revolution

Pushmeet Kohli leads AI for Science at DeepMind, where his team has created AlphaEvolve, an AI system that discovers entirely new algorithms and proves mathematical results that have eluded researchers for decades. From improving 50-year-old matrix multiplication algorithms to generating interpretable code for complex problems like data center scheduling, AlphaEvolve represents a new paradigm where LLMs coupled with evolutionary search can outperform human experts. Pushmeet explains the technical architecture behind these breakthroughs and shares insights from collaborations with mathematicians like Terence Tao, while discussing how AI is accelerating scientific discovery across domains from chip design to materials science.

Summary

Pushmeet Kohli, VP of Research at Google DeepMind, has been at the forefront of using AI to make groundbreaking scientific and mathematical discoveries, notably with AlphaFold, FunSearch and now AlphaEvolve. The episode emphasizes how combining large language models with structured search and evaluation protocols is already transforming both the pace and nature of scientific progress across domains from mathematics to materials science.

The “harness” architecture is everything: True AI breakthroughs come from pairing powerful generative models with robust, domain-specific evaluators—what Pushmeet calls “the harness.” This architecture distinguishes between hallucinations and brilliant insights. For AI founders, investing in strong, trustworthy evaluators is as important as advancing core models.

Remove constraints to unlock true discovery: AlphaEvolve’s breakthrough came from removing human-provided templates and searching entire algorithms rather than code snippets. This broader, less restrictive search space leads to more significant breakthroughs. Consider loosening system constraints without sacrificing safety or evaluation rigor.

Multi-agent systems extract hidden value: Different models critiquing, refining and ranking each other’s outputs yield results beyond single models. DeepMind’s AI co-scientist system improved dramatically over days of computation. Models are often better at evaluating ideas than generating them, so separate generator, critic and editor roles.

Interpretability remains essential for adoption: Engineers prefer interpretable code over black-box neural networks, even with slightly lower performance, because they need to debug systems. AlphaFold succeeded through accuracy plus calibrated uncertainty—knowing when it makes mistakes. Include transparency about limitations and confidence levels.

Focus on domains with objective evaluation: AI has the greatest impact where evaluation is objective, rapid, and scalable—mathematics, chip design, computational biology. The critical question is, “Can you find a trustworthy function evaluator?” Prioritize domains with clear, measurable success metrics over subjective or lengthy validation cycles.

Transcript

Contents

Pushmeet Kohli: So I went to a biology conference, and after I gave my talk, a biologist approached me and he said, “Pushmeet, I have been working on this protein for the last 10 years, and I had collected so much lab data to characterize this protein, to figure out its structure, but somehow it evaded all kinds of investigation, and we still didn’t know the structure. But we had all this data, so if we knew the structure, we could validate it very quickly. I ran AlphaFold 2. It gave me the structure; it perfectly fit the answer. I’ve been working on this for 10 years.”

Pat Grady: Wow.

Pushmeet Kohli: “What do I do next?”

Sonya Huang: What happens when AI stops just answering questions and starts asking them? In this episode Pushmeet Kohli discusses DeepMind’s AlphaEvolve, a breakthrough evolutionary AI system that discovers entirely new algorithms. Pushmeet reveals how coupling language models with evaluators creates something unprecedented—AI that can tackle decades-old math problems and generate human-interpretable code that outperforms expert-designed solutions.

Pushmeet shares stunning examples of AI uncovering hidden mathematical truths and explains why we’re witnessing the emergence of a new scientific method, one where AI doesn’t just accelerate discovery but transforms which problems we can even attempt to solve. Enjoy the show.

Can AI make novel scientific discoveries?

Sonya Huang: Pushmeet, thank you so much for joining us today. We’ve all been eagerly waiting for the moment that AI is capable of making novel scientific discoveries. Do you think that AlphaEvolve is that watershed moment?

Pushmeet Kohli: Yeah, so it’s certainly a key milestone. What we have shown is that an AI model, a large language model, when coupled with a harness, is able to discover new algorithms. And not only that, it’s able to prove new mathematical results about problems that have been studied for many, many years.

Pat Grady: You use the words “when coupled with a harness.” Can you tell us more about that harness?

Pushmeet Kohli: Yeah, so if you go back to AI models: the history of AI for science is very long. We have a number of different models that try to do scientific discovery. One of the key models in this category is AlphaFold, right? Which is the prototypical example of what can be achieved by AI in science. We released AlphaFold 2 in 2021, and it won the Nobel Prize last year. So the impact of AI in science is very well understood.

Now the question is: how can LLMs and foundation models impact science? Around two years back, we had an agent called FunSearch, in which we took an LLM and coupled it with an evaluator. The evaluator allowed the LLM to figure out, when it was making new conjectures or coming up with new ideas to solve problems, whether they were hallucinations or brilliant insights. And in this particular case, hallucinations were great, because some of those hallucinations were, in fact, brilliant new insights that nobody had thought about. So this is where the harness comes in: you have this evaluation function, and a search protocol associated with the LLM, that together are able to come up with completely new discoveries that are really impactful.
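To make the harness concrete, here is a minimal sketch of the generate-and-evaluate loop described above. It is illustrative only, not DeepMind’s implementation; `llm` and `evaluate` are hypothetical stand-ins for the language model call and the domain-specific evaluator.

```python
def harness(llm, evaluate, initial_program, budget=1000):
    """Generate-and-evaluate loop. `evaluate` is the trusted,
    domain-specific scorer that separates hallucinations from
    genuinely better programs."""
    best, best_score = initial_program, evaluate(initial_program)
    for _ in range(budget):
        candidate = llm(f"Improve this program:\n{best}")  # propose
        score = evaluate(candidate)                        # verify
        if score > best_score:                             # keep only verified gains
            best, best_score = candidate, score
    return best, best_score
```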

Sonya Huang: You mentioned FunSearch. Could you say a word on the difference between the results you all accomplished with FunSearch versus AlphaEvolve?

Pushmeet Kohli: Yeah, so FunSearch was our first instantiation of taking a large language model and trying to see if it can discover new algorithms. The models at that time were weaker, right? And we had not pushed the type of search we were doing very far. What we asked the LLM to do was essentially complete a small function and see if it could do that much better. And surprisingly, it was able to discover completely new algorithms for problems that mathematicians had been studying for a long time. But the limitation was that the mathematician or the researcher had to give a template in which the algorithm should be found. With AlphaEvolve, we have removed that restriction. AlphaEvolve is not just searching for a few lines; it’s looking at whole algorithms, very large pieces of code, and optimizing them over a long period of time. And secondly, FunSearch, our original model, used a lot of function evaluations to make these new discoveries. AlphaEvolve can work with many fewer function calls: by looking at fewer proposals, it can discover new algorithms much more quickly.
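The template distinction can be sketched as follows. This is an illustration under stated assumptions, not DeepMind’s code; `priority` and `solve` are hypothetical names.

```python
# FunSearch-style: the researcher fixes the skeleton, and the search
# only evolves the body of one small function.
def priority(element) -> float:
    """Evolved by the LLM: how promising is this element?"""
    return 0.0  # placeholder body that the search rewrites

def solve(candidates):
    """Fixed template supplied by the researcher."""
    return sorted(candidates, key=priority, reverse=True)

# AlphaEvolve-style: no fixed skeleton. `solve`, `priority` and any
# helpers all sit inside the search space and can be rewritten wholesale.
```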

Sonya Huang: Can you tell us about the role that the evolving Gemini models play in the capabilities of AlphaEvolve? And I think I saw in your blog posts, you have both Gemini Flash and Pro involved in the harness. What is each responsible for?

Pushmeet Kohli: Yeah. As Gemini improves with each generation, it is becoming much, much better at understanding code. Now if you have a proposal generator which understands code more effectively, then it generates proposals which are not only syntactically correct, they are also semantically trying to solve the task. And then you are sampling the different ways in which the task can be solved. So as the baseline model Gemini’s coding abilities improve, our sample efficiency in searching for the right solution on these very hard math and computational problems becomes much better. If you want to search in a large space, there are two elements: the speed at which you can generate these proposals, and the speed at which you can evaluate them. First, how quickly can you give me a new candidate algorithm? And secondly, how quickly can you evaluate whether the algorithm is any good or not? Both things are really important, and the fact that you have these variants of Gemini, like Gemini Flash, which can do that very efficiently and very quickly, is really important.
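One plausible reading of the Flash/Pro split is breadth versus depth: the fast model drafts many candidates cheaply, and the stronger model refines only the most promising ones. A sketch, with hypothetical model interfaces:

```python
def propose(flash, pro, prompt, n_drafts=20, n_refine=3):
    """Breadth with the fast model, depth with the strong one.
    `flash` and `pro` stand in for calls to Gemini Flash and Pro."""
    drafts = [flash(prompt) for _ in range(n_drafts)]   # wide, cheap sampling
    drafts.sort(key=quick_score, reverse=True)          # rough triage
    return [pro(f"Refine this candidate:\n{d}") for d in drafts[:n_refine]]

def quick_score(program: str) -> float:
    """Stand-in for a fast static check (does it parse? pass smoke tests?)."""
    return float(len(program))
```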

Pat Grady: I know AlphaEvolve is more of a broad domain model than some of its predecessors. How broad is it? What’s in scope? What’s out of scope?

Pushmeet Kohli: Yeah. So AlphaEvolve is general not only in the size of what you can search over (you can now discover whole new algorithms) but also in its ability to think about algorithms in different languages. Not only can it search in C++, it can also do it in Python. It can also do it in Verilog, which is the language for describing chips in chip design. So the generality of AlphaEvolve is in its ability to search over these large algorithmic spaces, but also across different syntactic and semantic representations. It is not restricted to a particular language like Python; it can do that search across many different types of languages and many different types of tasks. The only expectation it has is that you have a function evaluator: that you can quickly evaluate whatever proposal there is and say how good it is.

Generators and verifiers

Sonya Huang: It seems like the rough cognitive architecture, so to speak, of generating a bunch of algorithm candidates, evaluating them, and then evolutionarily deciding which ones to keep and go forward from, roughly mirrors the scientific method. Is that intentional?

Pushmeet Kohli: Yeah, so there is another agent that we released earlier this year, called co-scientist. In co-scientist, essentially what you had was Gemini playing the role of the whole scientific academic process: Gemini playing the role of a hypothesis generator, Gemini playing the role of the critic, Gemini reviewing ideas, ranking those ideas and then editing those ideas. So it was Gemini playing all these roles in a multi-agent setup, all Gemini models prompted differently to play different roles. And very interestingly, this combined multi-agent system came up with behavior that went much beyond a single Gemini model’s answer. It was able to give much, much better proposals and new ideas compared to a single model.
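A minimal sketch of that role-playing pattern, with hypothetical prompts (the actual co-scientist prompts and orchestration are more involved):

```python
ROLES = {
    "critic": "List the flaws and missing evidence in this idea:\n{idea}",
    "editor": "Rewrite the idea to address these critiques:\n{idea}\n{critique}",
    "ranker": "Score this idea from 0 to 10 for plausibility. Reply with a number only:\n{idea}",
}

def refine_round(gemini, ideas):
    """One critique-edit-rank round. `gemini` is a stand-in for a model
    call; parsing the ranker's reply as a float assumes the prompt is obeyed."""
    improved = []
    for idea in ideas:
        critique = gemini(ROLES["critic"].format(idea=idea))
        improved.append(gemini(ROLES["editor"].format(idea=idea, critique=critique)))
    return sorted(improved,
                  key=lambda i: float(gemini(ROLES["ranker"].format(idea=i))),
                  reverse=True)
```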

Pat Grady: What’s the intuition behind why that works?

Pushmeet Kohli: Yeah, so I think that is something that is still being studied. But it is a fascinating thing. One thing I noticed, especially with regards to co-scientist, is that you would run co-scientist on a particular problem, and the very first answer that you get might not be very different from the baseline Gemini model. But as you increase the amount of computation, and we’re not talking about just a few minutes or a few hours but even days, as the whole multi-agent system looks at the solutions, refines them and ranks them, it just becomes much, much better. So why might that be happening? It might be that there are deep insights or intuitions buried in the tail of the distribution. And somehow Gemini’s ability to evaluate which proposal, which idea, is better is much stronger than its capability to come up with a new idea. It’s the same thing in computer science, right? Sometimes we know whether a particular solution is correct or not, but it’s very difficult to come up with that solution. So the same pattern appears again in this multi-agent setup: the agents working together are able to extract many more impactful results.

Sonya Huang: It seems like the architecture of generators and verifiers is being echoed across the broad AI space, whether it’s very general models or very specific AI systems for very specific applications. Is it fair to say that’s the consensus architecture right now? And do you think that’ll be the thing people continue to push and scale?

Pushmeet Kohli: Yeah, so I think there is going to be more work on agents, right? What we are seeing is the very start of research on agents. In AlphaEvolve, you had a generator coupled with an evaluator. The generator was a neural network, a foundation model, an LLM, and the evaluator was hand-coded, right? But together with an evolutionary search scheme, you were able to get these much more effective results. In co-scientist, you didn’t have just one agent; you had multiple agents working with a shared memory. Now, what is the optimal agent configuration? This is still an open research problem.
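The evolutionary scheme he mentions can be pictured as a small population loop in which the LLM acts as the mutation operator. Again a sketch of the general idea, not the published AlphaEvolve pipeline:

```python
import random

def evolve(llm, evaluate, seed, population_size=16, generations=200):
    """Minimal evolutionary search over programs: keep a pool of scored
    candidates, mutate parents with the LLM, cull to the fittest."""
    population = [(evaluate(seed), seed)]
    for _ in range(generations):
        parent = random.choice(population)[1]              # pick a parent program
        child = llm(f"Propose an improved variant:\n{parent}")
        population.append((evaluate(child), child))        # score the offspring
        population.sort(key=lambda p: p[0], reverse=True)  # fittest first
        del population[population_size:]                   # survival of the fittest
    return population[0][1]
```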

Sonya Huang: Super interesting. Are the results that you’re getting different from the ways that humans would derive them? I’m thinking of the AlphaGo move 37 stuff. Are the methods different? How do the results compare to the ways humans would think about them?

Pushmeet Kohli: So let’s go back to the original motivation for why we started working on the first iteration of using LLMs for algorithmic discovery, which was FunSearch. A few years back, as you know, DeepMind had done a lot of work in using AI systems for searching over large spaces. We had done a lot of work building agents trained with reinforcement learning that can deal with many complex challenges, from the game of Go to playing StarCraft. We set ourselves a challenge: can we take the same kinds of models, the AlphaZero family of models that extended what we had done with Go and the development of AlphaGo, and use them for discovering new algorithms?

And we came up with a new agent called AlphaTensor, which was focused on finding solutions for the matrix multiplication problem. And we found that this agent was able to improve on the best known results, which had stood for 50 years. But the key questions remained: can you do something better? And secondly, can you come up with a solution that is more interpretable? At the same time, we were looking at practical problems at Google, like how you schedule jobs in a data center. A lot of work has gone into such heuristics, designed by some of the best researchers and engineers at Google, because they have a huge impact on compute utilization. If you use a typical reinforcement learning agent on this kind of problem, you might get better results, but it might come at the cost of interpretability, because now you have a neural network deciding which workloads go to which machines. And if something breaks, you don’t know how to debug it.
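For a sense of what “interpretable code instead of a neural network” means here, consider a hypothetical scheduling heuristic of the kind an evolved-code approach can return. This is not Google’s scheduler; every name below is invented for illustration.

```python
def placement_score(job, machine):
    """Score placing `job` on `machine`. Every term is a readable
    quantity, so an engineer can inspect and debug the policy."""
    cpu_left = machine.free_cpu - job.cpu
    mem_left = machine.free_mem - job.mem
    if cpu_left < 0 or mem_left < 0:
        return float("-inf")  # job does not fit on this machine
    # Prefer placements that keep CPU and memory usage in balance,
    # so neither resource is left stranded.
    return -abs(cpu_left / machine.total_cpu - mem_left / machine.total_mem)
```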

So what engineers would really prefer is that instead of giving them a neural network, you gave them a piece of code that they can interpret and run. And this was essentially the motivation. Instead of searching in the space of specific algorithms as we had done with AlphaTensor for matrix multiplication, or coming up with a neural network policy to directly solve the problem, can we come up with an agent which can search in the space of programs, and come up with a program that solves this hard problem? The benefit, of course, would be interpretability: you can see the code, you can see what its properties are, and so on. And that’s what happened. We found programs that were not only effective, but when the experts actually saw those programs, they could recover insights. For instance, one of the math problems that we had looked at for FunSearch was the cap set problem. This is a problem that Terence Tao, one of the most famous mathematicians, is very interested in. And we collaborated with the mathematician Jordan Ellenberg at the University of Wisconsin–Madison, and when he looked at the program that FunSearch had produced, he found that there were certain symmetries in the problem that had not been recognized before. And somehow FunSearch, the agent, had discovered those and was utilizing them to get a better solution.

Is math the gold standard?

Sonya Huang: Can you say a word about—you mentioned working with Terence Tao and other famous mathematicians. Is math considered the gold standard for testing and benchmarking whether these models are generating novel scientific results?

Pushmeet Kohli: Yeah, so math certainly has some properties which are very interesting. It’s very precise: you know whether you have found the property you are looking for or not. For matrix multiplication, you know how many multiplications you require. For a 4×4 matrix, what was known was that you can do it with 49 multiplications; that’s Strassen’s algorithm, applied recursively. And we showed that you can do it with 48. So that’s a very precise result, right? There is no arguing about that. It gives you a very crisp way of evaluating how well you have done. And there is no RLHF needed, no human feedback about whether this was a nice result or a nice output. You don’t need to rely on an LLM score. You just know that you are better.
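The arithmetic behind those numbers: Strassen multiplies two 2×2 matrices with 7 multiplications instead of 8, and a 4×4 matrix can be treated as a 2×2 matrix of 2×2 blocks, so two recursive levels give

```latex
\underbrace{7}_{\text{block products}} \times \underbrace{7}_{\text{scalar products each}} = 49
\qquad\text{versus}\qquad
48 \text{ (AlphaEvolve, for complex-valued } 4 \times 4 \text{ matrices)}
```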

Sonya Huang: Yeah. Okay, so then when you go from the beautiful, pristine environment that is math to the real world, it seems like you all have found a lot of real-world applications in data centers, in the Verilog world. Could you say a little bit about which applications you expect AlphaEvolve to be most impactful for?

Pushmeet Kohli: Yeah, so wherever you can find a good function evaluator. Wherever you can find an evaluator where you can say, “I really trust this evaluation scheme. If you give me a program, I can tell you very concretely how good it is.” If your problem satisfies that setup, then you can use AlphaEvolve. Because unlike a human programmer who can try 10 things or 100 things or 1,000 things, AlphaEvolve does not tire. It can go on and on and on, right? It can come up with very counterintuitive strategies to solve that problem, things that you might not have ever imagined.

Sonya Huang: Can you have humans be the function evaluators, or does that not work?

Pushmeet Kohli: Humans can be the function evaluators. It’s a question of scale, right? How many proposals can you evaluate, and can you evaluate the properties of the program effectively, at scale and with the right level of accuracy?

Pat Grady: How do you do that? Do you build that into the application itself so that there’s a human in the loop evaluating as it goes? Do you do that offline separately before the application is produced? I guess, how do you do that or how do you imagine people doing that?

Pushmeet Kohli: Yeah, so we haven’t used a human in the loop for AlphaEvolve. Most of our evaluators were programmatic evaluators, right? But imagine a hypothetical scenario where AlphaEvolve was told to solve some math problem and come up with a new algorithm for it. And suppose it came up with many different kinds of solutions, all equivalent in performance. Then which one is the best? The best is the one which is not only very effective on the problem, but also the most elegant according to a mathematician, or the simplest to understand. And that’s a very subjective human thing. Simplicity or interpretability, we don’t have a crisp definition of it. It is grounded in the human observer.

Crossing the digital divide

Sonya Huang: At what point do you need to pair what’s happening in the digital world with the physical world? I think in your blog post you mentioned that you could see AlphaEvolve being useful, for example, for materials science. Do you need to be able to connect to a real-world laboratory to get any of that feedback? Or do you think all of this can happen in the algorithmic domain?

Pushmeet Kohli: Yeah, that’s a very good question. And I think this goes back to how much do you trust the evaluator? If your evaluation was based on a computational method, and the computational method was perfect and you completely trusted it, then you don’t have to. Then you think, “Well, I believe the computational model. The computational model says that the solution that AlphaEvolve came up with satisfies these properties. Job is done.” Right? But if you don’t believe that the computational model is the perfect characterization of reality, then you want to make sure that you sort of validate that result in the real world, right? And you see whether that assessment of the evaluator was indeed correct.

Sonya Huang: As AlphaEvolve becomes more and more successful, as Gemini becomes more and more powerful, what do you think happens to these domains, and how will the human scientists and engineers working in them adapt? So for example, if you take chip design as an example, you mentioned these models are getting very good at, you know, generating Verilog, creating new chip designs. Does that mean the role of a chip designer goes away? Changes? Like, how do you think that this changes the world?

Pushmeet Kohli: Yeah, so I think that’s again a very interesting question. I’ll give you the example of what happened with AlphaFold. So we started working on this problem of protein structure prediction. For those of you who don’t know, proteins are the building blocks of life. They are the Lego blocks of life. And for many, many decades, scientists have been trying to figure out the shapes of proteins. Because if we understand the shape of proteins, we understand how they function, and we can use that to develop new drugs to treat the most challenging diseases on the planet. We can develop better enzymes and so on.

Now in 2021, as I mentioned, we released AlphaFold 2. Before that, it used to take sometimes one to five years to find the structure of a single protein, and it might cost a million dollars. And there were some proteins so notoriously hard that people had been trying to study them for almost one or two decades and had not found the solution. Which is why the structure was known for only roughly 37 percent of human proteins.

So, after we released AlphaFold 2, I went to a biology conference. With AlphaFold 2 we could find the structure of all proteins, not just human proteins but all proteins on the planet, and we made the structures available to everyone on the planet. After I gave my talk, a biologist approached me and he said, “Pushmeet, I have been working on this protein for the last 10 years, and I had collected so much lab data to characterize this protein, to figure out its structure, but somehow it evaded all kinds of investigation, and we still didn’t know the structure. But we had all this data, so if we knew the structure, we could validate it very quickly. I ran AlphaFold 2; it gave me the structure, and it perfectly fit the answer. I’ve been working on this for 10 years. What do I do next?”

Pat Grady: Wow.

Pushmeet Kohli: So what has happened after AlphaFold 2? Suddenly it did three things. First, it advanced structural biology. What was not possible earlier, what would take a synchrotron and six months and a million dollars, is now done in a second. So it really advanced what was possible.

Secondly, it accelerated it. And thirdly, it democratized it. That particular scientist working in Latin America or South Asia or Africa on some neglected tropical disease had no chance of figuring out the structure of their protein. They did not have the funds or access to the instruments that could find them the structure. Now they have access to those things for any parasite they’re working on. So what do they do? They are now working in this new mode where structures of proteins are not hard to get; they are everywhere. And so they are working on the next set of things, like how do you use that knowledge to treat diseases and design better drugs?

And I think the same thing will happen with AlphaEvolve. Once you have these agents which can go beyond human abilities in solving these problems, then the question becomes: which problems do we solve? What are the important characteristics of a chip that we need to improve on? We want to make it much more efficient, so that it requires less cooling, less expensive construction, is more fault tolerant, and many other things. You can make the problem more and more sophisticated, because now you have more sophisticated systems to optimize it with.

Sonya Huang: While I have you. Something I’ve always wondered: the AlphaFold results are phenomenal. And the story you shared with us is really impactful. Do you think that it’s caused an inflection point in the kind of, you know, availability of new drugs, or are there other bottlenecks now that are just—you know, we’re faster at one part, but unfortunately everything else is just hard, so we’re still slow overall?

Pushmeet Kohli: No, it has sped things up, but one has to understand that drug discovery is a long process. What are the roadblocks for drug discovery? First, you have to understand the target. You have to understand, here’s the protein in the body that I need to bind, because this protein is somehow involved in the disease. So if I can bind something to this protein and change its function, it will have an effect that can treat the disease.

First you have to come up with that conjecture. Then you have to say, “Okay, now I have a target protein. How do I develop a drug? How do I develop a small molecule or another protein that binds to it?” For that you needed to understand the structure of the protein: which other proteins it interacted with, how it interacted with this molecule. This would take a significant amount of time—sometimes two years.

Now that process is dramatically accelerated. What sometimes took multiple years you can now do in a few weeks or a few months. But that’s not the end of the story. After that, you need to clinically validate it. You have to go through phase one trials, phase two trials, phase three trials. You have to think about toxicity, all these other things. So what AlphaFold did was take one blocker away and make the overall timeline faster, but there are other blockers which our new generation of AI-for-biology models are hoping to accelerate. So we have taken a big step, but we need to take a few more big steps.

Accelerating everything

Sonya Huang: What domains do you think will be most lucrative for this family of models?

Pushmeet Kohli: I think the answer to your question is: which domains do you think are important for society? Because AI is going to accelerate everything. It’s going to accelerate healthcare; it’s going to accelerate our ability to develop smarter systems, from healthcare to materials science.

Like, if you think about the history of our civilization, we would describe it as: first we were cave dwellers, then we went into the Stone Age, then the Bronze Age, and then the Iron Age. And now, depending on who you talk to, you are either in the Silicon Age or in the Plastic Age, depending on whether you are feeling optimistic or a bit sad.

But if you take a step back and think about what humanity has achieved compared to any other species, it is the ability to transform energy, to leverage energy, right? We have been able to leverage energy and do big things with that power. Now if you can come up with, say, a new room-temperature superconductor, that completely transforms your ability to handle energy.

Pat Grady: Hmm.

Pushmeet Kohli: Right? What changes will it bring about in society? They are hard to predict if you can deal with energy in that way. If we can unlock fusion, energy becomes incredibly cheap. If you think about geopolitics, if you think about the economy, a lot of it is about energy, right? And suddenly if the cost of energy goes down to zero, what will be the impact on the economics of the whole thing? Similarly with coding: if you have these agents which can code, what does that mean? If everyone can code, intelligence becomes completely ubiquitous; everyone has access to all these different things. So there will be dramatic changes, and everything will be impacted, from materials to energy to coding to healthcare.

Sonya Huang: Really cool. Do you think we’re gonna have a fast takeoff moment for scientific discoveries? Do you think we’re at the ramp of one? You think we’re already there?

Pushmeet Kohli: I think we are living through the middle of it. When you’re in the middle, you don’t really see it, but I think we are already in that era of AI-accelerated scientific discovery.

Sonya Huang: What would you say are the biggest bottlenecks going forward?

Pushmeet Kohli: I think two elements. One is validation: bridging the gap between the digital and the real world, right? How do you validate some of these results, and really capture what is important for the problem?

And the second bottleneck is: how do you make this technology accessible? You can build the most sophisticated technology, but if people don’t know how to use it, you will not have the impact that you want. AlphaFold 2 was not impactful and transformative just because it had very high accuracy. Even though it was quite accurate, it was not perfect. Suppose it was accurate on 99 percent of its predictions (it’s definitely not at 99 percent, probably at the 90 or 95 percent mark). Even then, the one person who got unlucky with their prediction, and then spent the next one or two years chasing a wrong prediction, would say, “I should not use it. I should not use the predictions.”

So why is everyone using AlphaFold? They’re using AlphaFold because not only is AlphaFold good at making these predictions—which are accurate—but it’s also very good at understanding the limits of its predictions. When it makes a mistake, it holds up its hand and says, “I’ve made a mistake.” So if it makes a prediction and says, “I’m very confident,” most of the time it’s correct. And that’s great. This is something that the LLMs of today don’t have. They don’t have calibrated uncertainty.
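AlphaFold’s calibration is exposed as a per-residue confidence score, pLDDT, on a 0 to 100 scale. Here is a minimal sketch of how practitioners act on it; the thresholds are the community’s usual rules of thumb, not official cutoffs:

```python
def split_by_confidence(plddt, threshold=70.0):
    """Partition residue indices by AlphaFold's per-residue confidence
    (pLDDT, 0-100; roughly >90 very high, 70-90 confident, <50 likely
    disordered)."""
    trusted = [i for i, s in enumerate(plddt) if s >= threshold]
    suspect = [i for i, s in enumerate(plddt) if s < threshold]
    return trusted, suspect

# Low-confidence regions get flagged for experimental follow-up rather
# than becoming the foundation of two years of work.
trusted, suspect = split_by_confidence([96.1, 88.4, 47.9, 63.2])
```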

Lightning round

Sonya Huang: Fantastic. Should we close out with some rapid fire questions?

Pushmeet Kohli: Yeah, sure.

Sonya Huang: Must read paper of the year.

Pushmeet Kohli: Must read paper of the year. Oh. I would say AlphaEvolve or co-scientist. [laughs] I like co-scientist. Yeah.

Pat Grady: Favorite algorithm nobody talks about?

Pushmeet Kohli: Oh, the wake-sleep algorithm. Very few people know about it. There’s a paper from MIT, from Kevin Ellis and Josh Tenenbaum (the DreamCoder line of work), which describes a way of doing training where you do some exploration and then distill the gist of it. Think about library construction as an analogy: you don’t just want to write programs, you also want to create the libraries that have common modules that will make all your future programs much easier to write.

Pat Grady: Hmm. Very cool.

Sonya Huang: Agree or disagree: inference-time compute will be the next major leg of compute scaling.

Pushmeet Kohli: Somewhat agree.

Sonya Huang: Okay, say more. [laughs]

Pushmeet Kohli: So I think inference-time compute will be very, very important. But I think training-time compute will be equally important. Look at distillation, at how powerful distillation has been. If these models have the ability to understand and conceptualize what they are able to do, and come up with better internal representations, then they just become much more effective at making predictions. Maybe their uncertainty calibration improves, and so on. They become more efficient, even.

Pat Grady: Robotics, bullish or bearish?

Pushmeet Kohli: I’m bullish about everything, so I have to say bullish. I think everything will have an impact. But the question is near term or longer term, right? In the near term, getting robotics to work is challenging, but in the medium to long term, I’m bullish.

Sonya Huang: Humanoid robots, bullish or bearish?

Pushmeet Kohli: We have constructed our world for humans, right? We like the human form. A lot of the non-natural world around us has been made for humans, designed for humans from an architecture perspective. Humanoids have the same form as humans, so they will fit into all these different environments that we have built. Now, whether they are the most optimal form is not clear, but they certainly have an advantage: we designed everything for the human form, and humanoids share it.

Pat Grady: Future Nobel Prizes in the sciences. Will all of them be won by teams working with AI?

Pushmeet Kohli: No. I think we are getting there, but humans are still winning Nobel Prizes in the sciences. I think there will come a point where AI is indispensable, where humans and AI teams work together to achieve these amazing breakthroughs.

Sonya Huang: Pushmeet, thank you so much for joining us today. These are really fundamental, really general results that you’re pushing forward at DeepMind, and we appreciate you joining us to share more about how you managed to do all this so far and what’s ahead. Thank you.

Pushmeet Kohli: Thank you.
