
Google I/O Afterparty: The Future of Human-AI Collaboration, From Veo to Mariner

Fresh off impressive releases at Google’s I/O event, three Google Labs leaders explain how they’re reimagining creative tools and productivity workflows. Thomas Iljic details how video generation is merging filmmaking with gaming through generative AI cameras and world-building interfaces in Whisk and Veo. Jaclyn Konzelmann demonstrates how Project Mariner evolved from a disruptive browser takeover to an intelligent background assistant that remembers context across multiple tasks. Simon Tokumine reveals NotebookLM’s expansion beyond viral audio overviews into a comprehensive platform for transforming information into personalized formats. The conversation explores the shift from prompting to showing and telling, the economics of AI-powered e-commerce, and why being “too early” has become Google Labs’ biggest challenge and advantage.

Summary

Google Labs leaders Thomas Iljic, Jaclyn Konzelmann and Simon Tokumine represent the cutting edge of consumer AI product development, overseeing creative generation tools, computer use agents and knowledge transformation platforms, respectively. Their collective insights emphasize the critical importance of user control, iterative design and building for longer-term adoption curves rather than immediate viral moments (though they help!).

Show and tell trumps prompt engineering for creative workflows: Rather than forcing users to write lengthy text descriptions, the most effective AI creative tools allow users to demonstrate what they want through images, references and examples. This approach mirrors how humans naturally communicate creative intent—by showing inspiration and building iteratively rather than describing everything upfront. For AI founders, this means investing in multimodal interfaces that accept visual inputs alongside text, enabling users to guide AI systems through demonstration rather than description.

Build for the abstraction layer, not just model capabilities: While foundational model improvements continue rapidly, the real product differentiation lies in the interface layer that sits between raw AI capabilities and user needs. The challenge isn’t whether specific controls or features are technically possible—most can be built with current technology. Instead, the key is designing the right abstractions for how users input context, define parameters and iterate on outputs. Focus product development on the connective tissue that makes powerful capabilities accessible and intuitive.

Design for projects, not individual tasks: The most valuable AI applications support longer-running workflows rather than one-off requests. Users gravitate toward AI tools that help them accumulate knowledge, maintain context across sessions and make progress on extended goals—whether that’s a filmmaking project, research initiative or ongoing learning objective. Build systems that understand and preserve project context over time, allowing users to return and build upon previous work rather than starting fresh each session.

Timing beats perfection in AI product launches: The Google Labs team consistently acknowledged being “too early” with various AI product concepts, launching research prototypes before the underlying technology could fully deliver on the user experience. However, this timing mismatch often provided valuable learning and positioning advantages when capabilities eventually caught up. For AI founders, this suggests launching experimental versions of products even when the technology isn’t perfect, using real user feedback to guide development as models improve.

Context bridging creates superhuman user experiences: The most compelling AI applications don’t just automate individual tasks—they maintain and synthesize context across multiple information sources and actions simultaneously. Whether it’s an agent that remembers URLs while researching, or a system that connects content from multiple browser tabs, the ability to hold and cross-reference more context than humans typically can represents a fundamental advantage. Build AI systems that excel at context management and synthesis, not just task execution.

Transcript


Sonya Huang: You know, I was talking to a founder. He gave me the analogy of, you know, you want the user to almost be like the way that a director would direct the cast and crew of, you know, “Change the lighting here. Like, can you say this with a little bit more of an accent there?” And, like, almost like natural language, the way that a director would direct a cast and crew. What do you think is the right way to mold the clay?

Thomas Iljic: I still think it’s show and tell everywhere, so I don’t think you do everything through text. I think it’s kind of actually counterintuitive to have to transcribe everything. So I think there’s a lot of, like, showing and acting and mimicking or giving a reference just as inspiration in addition to the text.

Sonya Huang: Yeah.

Thomas Iljic: But the one thing that’s starting to become more clear, at least for me, is kind of video generation, simulation, games. They’re kind of like the same thing in this new world. And what that means is basically you’re kind of world building. You’re saying, “This is the stage, these are the assets, these are how things are supposed to look.” And then you shoot in it, and it can reshoot and refine and pause and correct something and go back in time and regenerate. Like, I think that’s where this is heading. And the UI’s going to be fairly novel.

The end of chapter one and the start of chapter two

Sonya Huang: It’s been exciting to see Google’s just cooking in AI. And I/O last week was very exciting. And it seems like the court of public opinion has just turned on its head so quickly, and right now everyone’s like, “Google’s out in front in AI.” Why do you think that is? Why did the public opinion change so quickly?

Thomas Iljic: I mean, the models, you know, to start with. I think they have a big thing to play with.

Sonya Huang: Good. Good answer.

Jaclyn Konzelmann: Definitely the models. And I think just the number of products that we have, and seeing all of this breakthrough in technology and AI come out into all of those products, but also all the net new products that we’re launching and the net new experiences. It just—it was a lot last week, and even not just at I/O but, like, the week leading up to it, I think. You had a big moment the day before.

Simon Tokumine: Yes. Yeah, I did. I did. Yeah, it’s definitely validating to see the, you know, public opinion on the models and Google’s position in AI changing maybe recently.

Sonya Huang: Yeah.

Simon Tokumine: It does feel to us on the inside at least, that it’s kind of the result of a lot of work, though. So it feels like we’ve been improving, to me at least, for at least the last three years in this area of gen AI. And maybe what we’re seeing externally is people seeing what we’ve been up to. It helps that we’re number one on many of the leaderboards, and it helps that some of the stuff that the models can do is state of the art, and I think is only possible with some of the Google models. But I think internally, it just feels like the end of chapter one and the start of chapter two. Yeah.

Sonya Huang: Wonderful. So here’s what I’d love to do today. We have three of the leaders from Google Labs in the room with us, for those in the audience. What I’d love to do is spend a little bit of time on each of the topics that you are responsible for, and then we can round up with some overall thoughts on AI. Does it sound good?

Jaclyn Konzelmann: Sounds great.

Sonya Huang: The three topics we’ll cover: We’ll go into Whisk, Flow and Veo with Thomas, which is Google’s video models and kind of creative image generation playground, for lack of a better word. We’ll go into Mariner, Google’s computer use agent, with Jaclyn, and then we’ll close on Notebook with Simon. And everyone knows Notebook, so that needs no introduction.

Simon Tokumine: Awesome. I love to hear that.

Key milestones in video generation

Sonya Huang: Okay. Thomas, let’s start with you. Tell us about the history of, you know, how you all have been cooking and building and experimenting in the creative image video generation space, and how long have you been experimenting with these products, and what have been the key milestones so far?

Thomas Iljic: Sure. Yeah, it’s been a really exciting space. I think that’s a very long question. So, you know, I’ll probably rant a little bit. I think the—I mean, we’ve had for a long time, like, good, you know, imagery models. There was, like, Imagen, there’s been DALL-E externally, et cetera. But something like two, three years ago is when—at least for us in Labs, when we were thinking about products, we had the ControlNet paper for people who remember. So it’s kind of like how do you take the model and start channeling it where you want so it’s not just like a push-button thing? You can start saying, “I want the pose to be like this or the scene to be like this.”

Sonya Huang: Yeah.

Thomas Iljic: That was one. And then the second thing was LoRAs, where you can kind of show the model a range of things, and then suddenly you’re able to kind of pull from the image and be like, “What’s the range of possibilities for that particular piece?” And so that iteration, the sense that you can start controlling the outputs, that felt like the right moment for us to start exploring the creative process.

Sonya Huang: When was that?

Thomas Iljic: Probably two and a half to three years ago. And so then a lot of, you know, stumbling and trying things and failing. I think we had things where we trained a bunch of our people internally to see what they could do with Flow UI-type workflows. We even had a little animation thing going on where we created half an episode with artists. And we published—I think it’s “The Not-So-Supervillain,” if you want to check it out on YouTube.

And then more recently we ended up with a bunch of convictions out of that exercise. So we had things like creation has to be iterative, so we need to build kind of these controls next to the model. Media comes with the blueprint, which is this idea that if I generate something, you’re able to kind of pick up where I left off. And then the third one was like, it should be show and tell, so basically the driving force was instead of just telling the model with very long prompts, I can actually show you images, say “It should do kind of like this,” and we can build off of that. So this is where we started with Whisk on the consumer side for imagery and Flow for everything that’s a high-end filmmaking exercise.

Sonya Huang: Really cool. And do you imagine Whisk and Flow will be kind of end consumer products in the kind of, you know, Google portfolio of, you know, billion-user scale consumer products eventually? Or how do you—or are they your playground for kind of testing model UX and, you know, how best to bring this magic to users?

Thomas Iljic: Yeah, I think we see it as a spectrum. So I think Whisk is kind of our play in the, you know, really consumer space, and thinking about, like, everybody now has this visual language at their fingertips. They might not have necessarily, like, the most advanced ideas in terms of, like, storytelling, but they can quickly remix each other’s things. And so we’re trying to see what those dynamics look like. So I think that’s kind of our exploration space with Whisk.

Sonya Huang: Yeah.

Thomas Iljic: We’ll see how it picks up. I think a lot of the lessons will probably also graduate in just how we deal with user inputs and treat those across multiple surfaces. And then Flow is the other side of, like, you have a vision, you know what you want. And it’s kind of like how do we give you all the tools to create the best version of this in video?

Who are the ideal users for Whisk and Flow?

Sonya Huang: Yeah. Okay, super cool. Who’s the ideal user, do you think, for Flow and Whisk?

Thomas Iljic: For Flow, I think it’s pretty clear for us. We’re starting with AI filmmakers. And the reason is we want to build this kind of—we call it the generative AI camera. Like, you know, you’re doing world building, then you’re shooting inside this world. How do we actually develop the DSLR camera of generative AI video? And then we’ll distill kind of the Android version, the Pixel camera version, out of it. Whisk is much more consumer. There’s a wide range of audiences. You know, is it you creating something funny with your friends in a chat? Is it kind of more, you know, inside the company, you’re trying to create some visuals for slides? It’s kind of like this whole range that we’re exploring. We’ll see where it lands.

Sonya Huang: Yeah. So cool. Okay, you said AI filmmakers. Is that a thing? Are people calling themselves AI filmmakers now? And does it tend to be existing filmmakers that are, you know, looking to be more AI savvy? Are you seeing, you know, net new creators come in and try to create feature films?

Thomas Iljic: I think it’s certainly an ill-defined term, but the reason why I like to say AI filmmakers versus filmmakers is I think if you take the extreme end of the spectrum, these are people who need very bespoke tools. They have, like, entire workflows and processes, and you need to develop very specific ideas.

There’s one tier under that, which maybe I’d classify as AI filmmakers potentially, which is, you know, pre-visualization, where you’re trying to quickly get, like, a version out and maybe then you do the full process. Or people who just don’t have the budget, so they’re like, “I don’t have $100,000 to, you know, put my idea out there, but now I can at least take a shot at it.”

Sonya Huang: Yeah.

Thomas Iljic: And so those people are interesting to us because, like, you can really start from the ground up thinking of, like, if you had this generative AI camera, what would the user flow look like? Like, how would you fit those pieces?

Are we at video AGI yet?

Sonya Huang: Yeah. Your answer to my initial question of, you know, the models are the reason that the court of public opinion has flipped so quickly. It’s been amazing to see Veo’s progress, and Veo 3. And, you know, for me, I don’t know what evals you all look at to look at performance, but for me it’s the Will Smith spaghetti eating test, and we seem to have passed that. So are we at video AGI, or how do you think about the quality and the performance and what’s ahead?

Thomas Iljic: There’s still some room, but it’s pretty cool. I mean, the GDM team has done really great with Veo 3. I think the joke last week was that it beat Veo 2 in the ranking. So it’s kind of Veo beating Veo. So people were very happy about this.

Sonya Huang: Good.

Thomas Iljic: I think its adherence is going up. Yes, we don’t have the six-finger problem. Physics are getting pretty good. There’s still things where, like, you know, if you want to have, for example, multiple characters and kind of choreograph those characters, have, like, full consistency across multiple scenes, like, that’s where there’s still, like, a lot to come. How do you refine your output? Can you propagate changes across clips? There’s going to be still a lot of, like, improvements, but in general, yeah, a huge step up.

And the biggest reveal this time was audio. So to be able to co-generate audio with the video, that brings kind of like, you know, a video is more than an image, and a video with sound is way more than a regular video.

Sonya Huang: Yeah.

Thomas Iljic: That certainly has opened up a lot of virality.

Innovating on product and UI

Sonya Huang: Of the R&D left to do to make the ideal tool for the craft, how much do you think is in the product and in the UI, and how much is going to need to happen in the model research layer and things like steerability?

Thomas Iljic: I think it’s both, but at least—I’m sure people will have a wide range of opinions, but it’s almost like we’re at a state where everything we imagine in terms of controls, I think we have visibility in how they can be built. You want to have consistency of characters, of scenes, of locations. There’s, like, different ideas around this. You want to reshoot. So that part—I think the part that’s hard is still the abstraction of all of it. So how do you put this in the—what are the inputs that you want from users? In the context of audio, for example, where do I define the voice? How do I attach the voice to the character? How do I define the mannerism? How does that propagate? So I think there’s going to be a lot of work in that abstraction layer on top of the models and on top of the controls.

Sonya Huang: Oh, so interesting. So you think most of the model kind of R&D is almost a—solved problem is maybe too strong of a word, but …

Thomas Iljic: Not solved, but I think we …

Sonya Huang: We know how to do it.

Thomas Iljic: It will happen. I think it’s pretty clear that it’s moving very fast. And then, you know, we see a lot of things just like week after week coming up. But how we do the connective tissue on top, I think, is still pretty much open. And audio is, you know, one of those new frontiers, for example, of, like, should I be talking and driving the audio, then changing my voice? Should I be typing the text? How do I do diarization? There’s a lot of like, what are the inputs? How do you give—how do you let people mold clay with all these models?

Sonya Huang: What’s your guess for that future, for how people will mold the clay? I was talking to a founder. He gave me the analogy of, you know, you want the user to almost be like the way that a director would direct the cast and crew: “Change the lighting here. Like, can you say this with a little bit more of an accent there?” And, like, almost like natural language, the way that a director would direct a cast and crew. What do you think is the right way to mold the clay?

Thomas Iljic: I still think it’s show and tell everywhere, so I don’t think you do everything through text. I think it’s kind of actually counterintuitive to have to transcribe everything. So I think there’s a lot of, like, showing and acting and mimicking or giving a reference just as inspiration in addition to the text.

Sonya Huang: Yeah.

Thomas Iljic: But the one thing that’s starting to become more clear, at least for me, is kind of video generation, simulation, games. They’re kind of like the same thing in this new world. And what that means is basically you’re kind of world building. You’re saying, “This is the stage, these are the assets, these are how things are supposed to look.” And then you shoot in it, and it can reshoot and refine and pause and correct something and go back in time and regenerate. Like, I think that’s where this is heading. And the UI’s going to be fairly novel.

Sonya Huang: Yeah. You mentioned games. I wanted to ask about this. It feels to me like the existing way that we consume games versus movies is—you know, is because there’s such a tremendous fixed upfront cost of producing a movie. If you imagine that, you know, in a world where every movie frame is generated, not pre-rendered, and that, you know, entire story arcs can unfold, it does feel like the movie and the game worlds start to merge. How do you think that plays out?

Thomas Iljic: I think what—I mean so, for example, we have the Genie model. That’s been really interesting. So you give an image, and you can kind of move your character and the world builds in front of you. But what’s going to be really interesting is how do you ground it? Like, games are fun because there’s very set constraints. Movies are good because there’s very small details that matter, you know, the expression and the moment and the timing. And so I think it’s all about—it’s almost about the constraining of the capabilities towards what we need. So I don’t know, I think—and the other thing that strikes me and I think a couple of people on the team is like, it’s not clear that—we think in terms of these static formats that we have today, like an image, a video and a game. Is there something in between, almost? And what does that mean? And kind of where is that going to be distributed and interacted with. Like, I can share an image with you, but you can instantly turn it into a scene that you’re walking into.

Sonya Huang: Yeah.

Thomas Iljic: So am I sharing an image or am I sharing an experience? Lots of questions, I guess.

Sonya Huang: It does feel like, you know, the story is almost the common thing that makes a game and a movie good. And that’s different from an image, which is just a visual, right?

Thomas Iljic: Yeah, exactly. The setting, the constraints. You define the rules of the game, basically, and then you let other people enjoy themselves in it.

Sonya Huang: Really cool. My understanding is that video is still expensive and somewhat slow to generate. Is your sense that that’s getting solved quickly and, like, will we have—everybody’s going to be able to generate two-hour films in their pocket in a couple years’ time, or is your sense that this is a, you know, longer—we’ve got a lot of efficiencies that we need to build in order to make this kind of cost practical?

Thomas Iljic: I think—I mean, we’ve seen in imagery and we’ve seen in video kind of like the same speed of cost reductions that we’ve seen in other places. Both, you know, the hardware is getting better, I think the efficiency, to your point, we have, like, the regular models, and then we learn how to distill them so that they just take less processing to get to whatever you asked for. So I’m actually pretty optimistic that the costs are just going to keep coming down and the speed is going to increase, kind of aligned with what we’re seeing with other models.

Sonya Huang: Yeah, got it. Fantastic. What do you think is ahead for AI in the creative space, at least from a Google Labs perspective?

Thomas Iljic: [laughs] Well, we just launched Flow, so we have a lot of things to do to just like, you know, deliver on that promise of keeping you iterating. I think that’s the first thing.

Sonya Huang: Yeah.

Thomas Iljic: Refinement of, like, outputs and keeping there, going there and, like, insertion, editing, reshooting, I think is really interesting to us. But I think the holy grail will be some of these new formats and experiences. Like, what does it mean as a creator to share something with you that you can interact with? That’s something that we want to explore.

Sonya Huang: Really cool. I want to be able to talk to Will Smith as he’s eating the spaghetti. [laughs]

Thomas Iljic: Maybe he will.

Sonya Huang: Really cool. Thank you so much for sharing what you all are doing over in the creative sphere.

Thomas Iljic: Of course.

Sonya Huang: Okay. Jaclyn.

Jaclyn Konzelmann: Yes.

Computer use with Project Mariner

Sonya Huang: I would love to talk about computer use, and Mariner. Maybe first off, why is it called Mariner?

Jaclyn Konzelmann: Great question. So we wanted to give the project a name that really embodied what we were trying to do with this space, which was enable users to just go out and explore—enable agents to go out and explore. And Mariner is sort of this whimsical, open-ended name that just sort of embodies the spirit that we have on the team right now.

Sonya Huang: I love that. Actually, you guys have really good product names across Google Labs. These are all really whimsical.

Simon Tokumine: I’m still trying to get rid of the LM bit.

Sonya Huang: [laughs]

Simon Tokumine: Apart from that.

Thomas Iljic: I’m pretty happy with Whisk and Flow. I think we did decently there.

Simon Tokumine: We’re evolving our approach to naming.

Thomas Iljic: That’s what we evolved at I/O: naming. There we go. It’s a statement improvement.

Sonya Huang: Yeah, that’s funny. Can you say a little bit about how Mariner works? Like, is it a computer vision model behind the scenes? Like, it just feels like pure magic in a box. But give us a peek under the hood.

Jaclyn Konzelmann: I will take pure magic in a box any day. So the way it works is really leveraging the power of Gemini. That’s kind of—you know, it’s an action-tuned model on a recent version of Gemini. But what that means is that we have all of the multimodal capabilities that Gemini gives us. So it’s able to plan and reason when a user enters in a task. We’re able to understand that; we’re able to come up with a plan on how we should actually fulfill that task. And then the way it actually works is taking that and understanding the screenshots. So this is where the multimodality of the Gemini model really comes in handy. We’re able to continue to take screenshots, continue down the trajectory of what it is that we’re trying to achieve from the user’s tasks that they gave us and bring it all together that way.
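A minimal sketch of the kind of perceive-plan-act loop described here, assuming a generic multimodal model rather than Mariner’s actual internals; every name below is a hypothetical placeholder, not a real Gemini or Project Mariner API.

```python
# Hypothetical sketch only: illustrates a screenshot-driven agent loop
# (capture screen -> ask a multimodal model for the next action -> execute),
# not Project Mariner's real implementation or APIs.

from dataclasses import dataclass, field


@dataclass
class AgentState:
    task: str                                    # the user's natural-language task
    history: list = field(default_factory=list)  # prior actions kept as context


def capture_screenshot() -> bytes:
    """Placeholder: grab the current browser viewport as an image."""
    return b""


def propose_next_action(state: AgentState, screenshot: bytes) -> dict:
    """Placeholder: send task, history and screenshot to an action-tuned
    multimodal model and parse a structured action (click, type, scroll, done)."""
    return {"type": "done", "summary": "stub response"}


def execute(action: dict) -> None:
    """Placeholder: dispatch the chosen action to the browser."""


def run_task(task: str, max_steps: int = 25) -> AgentState:
    state = AgentState(task=task)
    for _ in range(max_steps):
        shot = capture_screenshot()                # perceive the current page
        action = propose_next_action(state, shot)  # plan/reason over the screenshot
        state.history.append(action)
        if action["type"] == "done":               # model judges the task complete
            break
        execute(action)                            # act, then loop with a fresh screenshot
    return state
```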

Sonya Huang: Yeah. Got it. Super interesting. What’s the history of the project, and when do you anticipate you’ll be rolling it out en masse?

Jaclyn Konzelmann: So the project initially started last year, shortly after this time, actually. If we go back to I/O last year, we kind of graduated Google AI Studio and the Gemini API out of the Labs team onto the developer team. And that freed us up to start exploring what we thought was coming next. And that happened to be agents that could actually take action on behalf of users, not just answer questions or generate content.

So the team started working on it at that point. We started grouping up with a bunch of different folks across Google to kind of bring together what we launched in December last year, which was Project Mariner as a Chrome extension that took action on your browser. And then we continued to iterate on it based off of a lot of the feedback that we got from the trusted testers of that initial launch. So we actually had a large group of trusted testers that we would be talking with regularly and understanding what was working well for them, what wasn’t. And we took that feedback and iterated on the most recent launch of Project Mariner, which we announced last week at Google I/O.

Sonya Huang: Really cool. What was some of the feedback? And, like, what do people—what are the magic sparks when people really are like, “This is a game-changing product for me?”

Jaclyn Konzelmann: Yeah, great question. So it’s funny, one of the initial kind of magic moments that everybody had was watching Project Mariner take control of the mouse on the browser and being able to click, scroll. Typing text into text boxes actually felt net different when you realized it was an agent doing it. But quickly as you were using the initial version, the feedback became, “This is super cool. Can I please use my browser again? Like, I’d also like to be able to do work.”

Sonya Huang: Yeah.

Jaclyn Konzelmann: Which makes a lot of sense. And so that was one of the big motivations behind moving towards this idea of users entering a task in the web app that could then run in the background on virtual machines.

Sonya Huang: Okay.

Jaclyn Konzelmann: Exactly. But one of the key things that we did also try to keep true to the initial vision was how can we start to think about bridging the context that a user had on what they were doing in their current environment to the tasks that they were sending to the VM and Mariner executing in the background? And the way we tried to do that was if you install the companion extension now, it’ll actually be able to see all the tabs you have open. So when you’re giving Project Mariner a task, let’s say you happen to be looking at a recipe on a recipe site and you’re like, “Oh, wouldn’t it be great if I could—” canonical use case “—add all these ingredients to my Instacart cart?”

Sonya Huang: Yeah.

Jaclyn Konzelmann: Now when you go to Project Mariner, you could say, “Hey, add all the ingredients from this chicken recipe to Instacart.” And you can select the tab that you have open with that chicken recipe, and Mariner will understand that context, will be able to revisit that site on the VM and complete the task with the context that you had in your local browser as well.
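A rough sketch of the context-bridging idea being described: the task is handed to a background worker together with the selected local tab, so the agent can reopen the same page on the VM. The types and the dispatch function are illustrative assumptions, not the real extension or Mariner API.

```python
# Hypothetical sketch only: bundling local tab context with a task that runs
# in the background on a VM. None of these names are real Mariner APIs.

from dataclasses import dataclass


@dataclass
class TabContext:
    title: str
    url: str


@dataclass
class BackgroundTask:
    instruction: str
    context_tabs: list   # tabs the user selected as context for the task


def dispatch_to_vm(task: BackgroundTask) -> str:
    """Placeholder: hand the task to a VM-hosted agent and return a task id."""
    print(f"Running in background: {task.instruction}")
    for tab in task.context_tabs:
        print(f"  using context from: {tab.title} ({tab.url})")
    return "task-001"


# The canonical recipe-to-cart example, carrying the open recipe tab along.
recipe_tab = TabContext(title="Chicken recipe", url="https://example.com/chicken")
dispatch_to_vm(BackgroundTask(
    instruction="Add all the ingredients from this chicken recipe to my Instacart cart",
    context_tabs=[recipe_tab],
))
```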

Sonya Huang: And it’s almost superhuman in a way because as a human I only—it’s, like, hard to context-switch between browser tabs.

Jaclyn Konzelmann: Yes. [laughs]

Sonya Huang: And you’re able to kind of see everything in the tabs all at once.

Doing 10 tasks at once

Jaclyn Konzelmann: Yeah. I think a big net win also was the ability for Project Mariner to do 10 tasks at once, not just one. And that was really a big net unlock. I was using it the other day and, you know, I’d just come back from running an errand, and there was a bunch of stuff on my mind that needed to get done. And the first thing I did was open up Project Mariner, enter in three different tasks for it, and then just sent them off to start making progress. And I was able to jump back into the document that I happened to be working on. And it was this, like, magic moment of just okay, not only is progress being made on these things, but I just got it off my mind. Like I didn’t have to keep thinking about it.

Sonya Huang: Do people want to see the computer mouse moving around first for a while before they’re like, “Okay, I trust that thing to go off and do things for me?”

Jaclyn Konzelmann: If they do, they have that opportunity in the current Project Mariner experience. You can go into full screen mode. You can see the agent moving around and clicking on things, entering text. You could also pause the task at any point and be able to take over it. So having or giving the user the ability to take over and/or provide oversight on these tasks is something that we think is still very important when we have an open-ended platform like this, or an open-ended experiment like this, where it really leaves it up to the user to try out different things.

Sonya Huang: And what’s the user behavior you’re seeing? Like, are they like, “Please just take the wheel. I don’t want to deal with it.” Or they actually want to, you know, backseat drive and watch the agent and make sure it’s doing what it’s supposed to be doing?

Jaclyn Konzelmann: That’s a great question. I think initially watching it is this fun element, but also it develops a comfort for knowing how the agent is thinking and what it’s doing. But one of the pieces of feedback we also got from the initial launch was at the end of a task being complete, we just save the entire conversation history. And it can get quite long. And what users ended up wanting was just a summary of, like, what did Project Mariner do to complete this task so I can make sure it did it correctly? And that really kind of points to the question you’re getting at, which is I want to just hand the task off to this agent, but then I want to be able to just verify what it did at the end of the task, not sit there the entire time and watch it.

Sonya Huang: Yeah. Yeah, so interesting. What do you think are the solved and the unsolved technical problems so far with computer use? Because computer use still feels like to me we’re maybe in the Will Smith, you know, the spaghetti is still sort of disappearing a little bit phase. And maybe that’s an unfair characterization, but I’m curious where you think we are on the evals and the performance so far for computer use, and what are the unsolved problems right now?

Jaclyn Konzelmann: I think that’s actually a totally valid comparison. There’s a reason we launched this as a research prototype with the experiment label on it right now. I think we’ve seen really big gains from December to what we launched last week. That said, there’s definitely still model quality improvements to go. I think there’s also just application-level improvements to go. There’s more seamlessly being able to have the user, you know, provide context up front, which will make the agent more capable of understanding what it is it should be doing. And then there’s just more planning and reasoning that we could do, like, at inference time or at the application layer that sort of, in addition to the model improvements, you know, improved system instructions, improved checks and calls to different models.

And then of course, right now, Project Mariner entirely completes a task by actuating or taking action on a browser. You want an agent that has more skills than that. You want an agent that knows when to call the right tools, that has memory, that’s able to, you know, take advantage of a lot of the other stuff that we already see out there. So I think it’s just integrating a lot of that in and starting to innovate and climb on that. And then, of course, right now Project Mariner, it’s in the browser. People use computers. So, you know, we call this “computer use.” So there’s that entire dimension as well that I think we’re gonna continue to see innovations in.

Sonya Huang: Really cool. Were there any contrarian opinions you all took in building Mariner? So for example, I think some people would have said screenshots, it’s gonna be too slow, it’s not gonna be fast enough. You should use the website DOM or whatever. Like, any contrarian bets you guys made?

Jaclyn Konzelmann: So the reason we went with the screenshot is we wanted to make sure that it was a skill that we could develop that could be applied across things that aren’t just websites. I think the other aspect of that is, like, DOM versus accessibility settings or accessibility trees is another lever. We’re kind of betting on this one right now, but I would say everything’s evolving, so we’re just willing to take pivots if and when it makes sense.

What can Mariner do today?

Sonya Huang: Yeah, makes sense. What is it capable of doing today? And what is the speed? Like, if I tell it to go—you know, the canonical “Go order me a pizza from Domino’s,” can it do that? And how long does it take?

Jaclyn Konzelmann: The speed is definitely an area that we want to keep hill climbing on is what I would say. But it’s interesting you say that because one of the things that—so I was recently using Mariner to help me complete the task which was come up with—let me take a step back. I have a three-year-old at home. She is going to be four soon. Part of that means organizing a birthday party for her, and being able to figure out loot bags for kids at a four-year-old’s birthday party. This task, as you can imagine, involves understanding what to put in the loot bag and then actually buying all of those things or finding links somewhere to go buy them.

Sonya Huang: Yeah.

Jaclyn Konzelmann: And I gave Project Mariner this task, and it was basically a personal research task that turned into an action-taking task, which is find me the links and save them. And the thing that really resonated the most with me on that one is as it was performing this task, first it did a search for good ideas to go in a loot bag, and then as it just remembered those five items—that’s something any of us could do, like, that itself wasn’t impressive—but the first one was, I think, temporary tattoos.

So then it started looking for temporary tattoos. It found a great link for it. Instead of having to copy that link and paste it in a doc somewhere else, it could just remember it. It could remember this, like, massive URL. And then it moved on to the next one. And then at the end of these five items, it just gave me all five URLs that it had been able to inherently store. So when we talk about speed and efficiency, I think there’s two dimensions. One is just the model calls and the, you know, taking action and, like, how do we improve it with different tool use? But then the other one is how can agents just do things in a different way that are inherently faster than the way we would do things? And I think we’re going to continue to see improvements on both dimensions.

Sonya Huang: Yeah. I wish I could remember five URLs.

Jaclyn Konzelmann: Oh, gosh!

Sonya Huang: [laughs] Okay, good point. Let’s see. What do you think is ahead for Mariner? Where do you see it evolving from here?

Beyond the browser

Jaclyn Konzelmann: I think there’s a couple things. Number one, we had a bunch of announcements last week around Project Mariner-like capabilities making their way into different Google products. And I think that this is a kind of core capability that you’ll start to see emerge everywhere from the Gemini app to AI mode in search. So I definitely see a lot more coming to Google products with the stuff that we’re doing right now in Project Mariner, and kind of paving that path forward.

And then I think for Project Mariner itself, I actually like to think of things in three categories. There’s the agent itself. I think that’s going to get smarter, that’s going to get better, that’s a better model, that’s tool use, that’s memory, that’s context. Then there’s the environment. We talked about how in December it operated on your local desktop in your Chrome browser. So that’s in the foreground.

Then we moved towards this idea of Project Mariner operating in virtual machines, which meant that it’s now operating on VMs. I think there’s this middle layer, which is an agent that can still operate on your device but in the background. And there’s a bunch of reasons and types of tasks where that becomes a really important kind of way for the agent to operate. And then, of course, there’s all the other devices. But really what you want is a capable agent that’s able to operate in a way that is omnipresent across all your devices, locally and on VMs.

And then the last one is the ecosystem part, which is where you start to get into the agent-to-agent interaction and, like, how does your agent interact with all of the things that exist outside of its own world, essentially?

Sonya Huang: Yeah. So cool. I think the canonical examples for computer use are, you know, “Book me a flight,” or “Order me a pizza.” Is that your sense of what computer use agents will actually be really good for or, like, what do you think? I’m sure you’ve spent a lot of time thinking about what applications will actually be the bullseye here. How do you think that shakes out?

Jaclyn Konzelmann: So I think we default to those because they’re just easy to understand. The travel planner, I mean literally it’s a travel agent. It couldn’t be more analogous when you think of agents right now. But no, the way I like to think about it is on a spectrum where you have tasks that are sort of in what I would consider “do it with me,” where you have your agent alongside, and you can easily offload certain tasks to it, but it’s really working in unison with you.

And then you have these, like, do-it-for-me tasks which is, “Hey, I just want to give my agent a bunch of stuff to go do,” and it will run it in the background. I think part of the reason we see these tasks being used is twofold. One, they’re just incredibly easy to understand, and everybody kind of gets what that use case is. And they’re usually starting from scratch. Like, there’s no context you need up front. You can just send an agent out to go do it, and the demo as a result is pretty easy to put together.

Sonya Huang: Yeah.

Jaclyn Konzelmann: And then the other one is just where the capabilities are at today. And so as agents get more capable and you start to have more of these realizations on what they are actually able to do, you’ll see much more advanced use cases or much more complex use cases. And that also requires the user having more trust that they can give to the agent. So I think that that will evolve over time, and we’ll see people come up with even more interesting use cases that they’re willing to give an agent to do on their behalf.

Sonya Huang: Yeah, totally. It’s also going to require—I guess it’s going to inspire, I think, a shift in business model, right? Because if you have a bunch of agents going off and browsing, you know, trip planning, for example, they’re not necessarily looking at the ads and, you know, the first things that show up. And so I think it’s going to create some business model evolution as well.

Jaclyn Konzelmann: I agree. I think there’s a lot of evolution that’s going to happen across business models, across how websites work, across how, you know, users will always want to use the internet going forward. Like, there’s a lot of joy I think we all get in it, from content creation to consumption, but there’s also a lot of other tasks that it’s just ripe for disruption in a lot of ways.

The future of shopping

Sonya Huang: Yeah. Yeah. Like, I’m thinking humans are suboptimal in some ways. We see the ads, we get excited, distracted and, you know, I go and buy the dress. And my agent, maybe I can instruct it to ignore the ads. Maybe it actually knows, like, it’s going to find the best content regardless of what’s showing up on the page. So it’s kind of interesting to think about how that future plays out, you know, as agents do more of our browsing.

Jaclyn Konzelmann: It’s super interesting. I will say that the dress that, you know, maybe you got distracted. I always get distracted by things too, and end up purchasing stuff that gets sent my way, but I’m always happy with it by the time I do end up purchasing it.

Sonya Huang: Totally.

Jaclyn Konzelmann: So I think that there’s, like, new opportunities to think about how do you actually involve agents in this new sort of business model ecosystem? And hence that third bucket of, like, there’s going to be a lot of evolution happening in that space. And I think that that’s where we need to evolve as an entire ecosystem. And it’s not just, like, one player that’s going to say, “This is how it’s done.” So it’s been interesting just talking to different companies and different people who are also thinking in that space right now.

Sonya Huang: Yeah, really cool.

Simon Tokumine: I mean, I do think also, just as a user as well, you know, I often don’t buy things on the internet because it’s such a pain.

Jaclyn Konzelmann: Oh, I’ve definitely dropped off. [laughs]

Simon Tokumine: I cannot. I can’t navigate this thing. Either I don’t understand it.

Sonya Huang: Yeah.

Simon Tokumine: That happens quite a lot. Or it’s just like I’ve just not got time. That happens as well. Or I’m just—I can’t be bothered, you know?

Sonya Huang: Yeah.

Simon Tokumine: Maybe it’s just me, but I’m not a fan of shopping, let’s put it that way—in the real world and online. But I’m a fan of what I get, you know? I’m a fan of the outcome. And so I don’t know, I kind of feel like I might do more—I would probably do more online shopping, I think, you know, if I didn’t have that barrier of actually having to do the shopping bit. I don’t know. That would be me, though.

Jaclyn Konzelmann: No, I agree. And what’s interesting is I don’t know about you, there’s certain stores that I’ll go on to and I’ll just, like, accumulate stuff in my cart, and I won’t want to, like, pull the trigger until a little bit later on when I’ve had a chance to think about it.

Simon Tokumine: Yeah. Yeah, that happens.

Jaclyn Konzelmann: But then I end up with a bunch of, like, half-built carts across a bunch of different websites. And part of me also wonders, like, is there a world where my agent is that universal cart essentially, where I’m, like, add all this stuff to it or, like, create this aggregate area of all the items that I might be interested in buying. And it can be across any site at this point because the agent represents me and it can remember which sites to go on. And then when I’m ready, it’s sort of like, okay, one click, like, make this entire purchase, basically. And it can go and check out on all of the different sites or all the different stores. So that’ll be an interesting area to think about.

Sonya Huang: Yeah. Okay, what I just heard from you guys is e-commerce conversion is about to skyrocket, then.

Jaclyn Konzelmann: [laughs]

Simon Tokumine: I mean, on my computer it will go up. That’s all I’m saying. I don’t know about anyone else. Also diversity as well. You know, like, I go to the same old sites, right? But I would love suggestions.

Sonya Huang: Yeah.

Simon Tokumine: You know?

Sonya Huang: Yeah, it’s like once you’ve kind of democratized computer use, then the laziness of humans to get through checkout is no longer the determining factor of which e-commerce companies will do well. It’s just like the best product wins. Yeah. So interesting. Okay, cool. Thank you for sharing.

Jaclyn Konzelmann: You’re welcome.

Sonya Huang: Okay, Simon? You’re last.

Simon Tokumine: Hi.

NotebookLM

Sonya Huang: Hi. Notebook. NotebookLM or Notebook?

Simon Tokumine: We’ll go with NotebookLM. We’re still Notebook. I think it’s been so long now that it’s definitely NotebookLM. There was a period where we were like, “Okay, is now the time?”

Sonya Huang: Yeah.

Simon Tokumine: You know? But I think we’ve gone through multiple hockey stick moments, which we can talk about. And yeah, it’s going to be hard to remove it. I like it, though. I mean, you know, maybe every product that kind of like has an acronym or some weird letters after it—and there are a couple of them in the AI space—regrets that, but at the same time they become part of the team and the identity.

Sonya Huang: Totally.

Simon Tokumine: Yeah, it’s nice. I like it.

Sonya Huang: I love that. Okay, so NotebookLM was, you know, one of Google’s biggest viral hits last year?

Simon Tokumine: Last year. Yeah, yeah. It went viral last year.

Sonya Huang: Yeah.

Simon Tokumine: Yeah. But, you know, the team had been building it for a while before it took off.

Sonya Huang: Totally.

Simon Tokumine: Yeah.

Sonya Huang: Tell me about how it’s evolved in the last year.

The viral moment

Simon Tokumine: Yeah, yeah. Well, so firstly the viral moment. You know, so my way into NotebookLM was through audio overviews. So me and the team had a kind of—we were also exploring the future of content, but from a different angle, I think. And Notebook was the perfect balance of kind of user control, but also kind of the power of the technology.

And our hypothesis was that there was an opportunity for personal content. So not content that is for everybody, actually. Content that’s for an audience of one, maybe two, maybe three. Small group maximum. And that was kind of, you know, how we shaped the product. We didn’t think it was going to—you know, looking at the Notebook user base back then, we thought that it was a great place to, you know, kind of like test PMF, just kind of iterate on the product. But we were totally unprepared for the massive success of audio overviews. And then through that, NotebookLM as well.

So it was honestly, the first couple of months was really just kind of hanging on for dear life. Firstly, it was making sure that the TPUs don’t fully melt. [someone] had a GIF out back then.

Sonya Huang: [laughs]

Simon Tokumine: But there was also just a lot of iterations and fixing things and improving things. And that was really the first couple of months. I think since the start of this year, maybe we’ve managed to take stock. So at the end of last year, we launched the Join mode, the ability to join in a podcast—in audio overview, I should say, and talk with the host and ask questions and all this kind of stuff. But at the start of the year, we kind of took stock, and we’ve really been thinking about what is a notebook for the Notebook users? How are Notebook users really leaning into notebooks once they’ve come in the front door through audio overviews?

And we’ve started to think about—and Jaclyn, you kind of touched on this, I think—the criticality of context in really enabling these AI systems to be genuinely useful for you. And we found that a lot of users, when they’re using Notebook, they use them for these kind of more longer running, almost like projects that they have. So either their hobbies or if they’re in the world of work, they can be ongoing projects or they can be projects with a goal, you know, like I’ve got to prepare for a presentation or something like that.

And so a lot of what we’ve been really doing is retooling how we look at Notebook, and also building a strategy as well that leans more into, I think, those more sort of longer-running opportunities that we see in the Notebook user data. Of course, we’ve done a whole bunch of kind of improvements too. So we’ve just launched the mobile applications finally. So they came out last Monday. And we also launched international audio overviews as well, which was kind of the end—it was the end of a long road, honestly, of upgrading the underlying AI infrastructure and models away from the very first almost like research-grade model that we used for the initial launch, to native Gemini Audio. So what you hear now in the international Audio Overviews at the very least is native Gemini Audio. And that was a big push for many teams across Labs and also GDM.

Sonya Huang: Yeah, super cool. It feels like Audio Overview was, like, almost the viral hook. And you guys have been building out a lot in almost like the RAG UI, and just imagining what that workspace looks like.

Simon Tokumine: Yeah.

New form factors for knowledge

Sonya Huang: What do you think the actual just Audio Overview podcast thing becomes? And actually I’m curious how you even ended up on the shape of two podcast hosts talking to each other. It’s just like—it’s such an engaging format. I’m curious how you even landed on that. And, you know, I feel like it’s only in its infancy still in terms of—I would love, you know, podcasts every morning to hype me up for my day and things like that. And so how much of your time is thinking about Notebook, the kind of RAG workspace environment, for lack of a better word, versus Notebook, the podcast killer, you know, Training Data is going to be built on Notebook in the future?

Simon Tokumine: Yeah. Yeah, yeah. [laughs] Well, I hope not, but maybe it can help. So the way that we’re starting to increasingly look at notebooks is they’re comprised of kind of three—they give you sort of like three superpowers. So one of them is they help you really accumulate information over time. And, you know, there’s a lot of amazing underlying database technologies that we apply that I think lean on first party Google technologies in a pretty unique way.

The second is they bundle in intelligence. And when we launched last year, we used the old Gemini 1.5 Pro model back at that time, but obviously now we’ve got thinking models and so on.

But the third thing is this ability for content and information to be adaptive to your situation. And so podcasts or audio overviews, a conversation, it’s one form that information might take, but you can imagine many other forms that that information or knowledge might take as well.

Sonya Huang: Hmm.

Simon Tokumine: So you might imagine it coming at you in the form of a comic book, or maybe a short movie or maybe a mind map, which we’ve also launched. But you can imagine many other types of media that fit the right circumstance and form and function for the moment for you to understand information, to be able to analyze it, make decisions with it, kind of do something with it.

So that’s kind of the mindset that I think we have when we’re thinking about the different—you might hear us talk about “transforming” information from one state to another. I think that’s a fine word. It’s a little bit technical, to be honest. It’s more like adapting to you and fitting you, I think. That’s really what we’re going for.
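One way to picture the accumulate-then-adapt idea in code, purely as a sketch: a notebook holds sources and, on request, frames them for a given output format. The Notebook class and instruction strings below are invented for illustration and are not NotebookLM’s actual architecture or API.

```python
# Hypothetical sketch only: a "notebook" that accumulates sources and adapts
# them into different output formats. Not NotebookLM's real design or API.

from dataclasses import dataclass, field


@dataclass
class Source:
    title: str
    text: str


@dataclass
class Notebook:
    sources: list = field(default_factory=list)  # information accumulated over time

    def add_source(self, source: Source) -> None:
        self.sources.append(source)

    def transform(self, output_format: str) -> str:
        """Frame the accumulated sources for a requested form. A real system
        would ground a generative model on the sources; here we only build
        the instruction such a call might use."""
        corpus = "\n\n".join(f"# {s.title}\n{s.text}" for s in self.sources)
        instructions = {
            "audio_overview": "Write a two-host conversational script covering:",
            "mind_map": "Produce a hierarchical outline of the key concepts in:",
            "comic_book": "Break the material into captioned panels based on:",
        }
        return f"{instructions.get(output_format, 'Summarize:')}\n\n{corpus}"


# The same material adapted three different ways.
nb = Notebook()
nb.add_source(Source("Dissertation ch. 1", "Wolf populations in Central Europe..."))
for fmt in ("audio_overview", "mind_map", "comic_book"):
    print(nb.transform(fmt).splitlines()[0])
```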

But in terms of just going back to your actual question around Audio Overviews and, you know, where it’s going, there is a huge amount, I think, of room left in that technology. So, you know, I enjoy audio overviews and I use them a fair amount. But I also—you know, every now and then I’ll be like, “That’s weird. You know, why do they say that?” Or “They’ve kind of lost the plot there. I didn’t quite get the right narrative.” Sometimes it’s like the uncanny valley or the illusion is broken, you know, when you’re listening to them. And, you know, while it might seem like there’s a small amount of work we might need to do to kind of fix that last step, there’s actually a ton of work that we’ve got to do, you know? And so there’s a lot of effort being placed into all of the various components that you’ll need to make the experience feel like something that, you know, where you suspend your disbelief more completely.

And alongside that, you know, there are many other different show types. We’ve kind of had one show type for a bit too long, I think, actually. And we’re bringing more out. So we’re actually working on some really cool things—a lot of them inspired by users, honestly.

So one of the things that we saw users do right back at the start, but you keep on seeing it, is users putting in their LinkedIn. You know, they’re putting in their LinkedIn. Well, why? Well, number one, it’s kind of fun to, like, hear people talk about you, but a lot of users are using it to get feedback, you know, like, to kind of understand from another person’s perspective who they may not have access to. You know, feedback is truly a gift. Like, real feedback is hard to find, you know? So, you know, how would somebody else look at me? How would somebody else talk about my strengths? And how might somebody else talk about areas to improve?

You know, this is something that we see users already using audio overviews to kind of access that sort of content or sort of information. We think we can make that easier for people, you know? So a lot of what we’re thinking about now are different show types that lean into some of the more viral successes that we’ve seen our users, you know, explore online, and also think about, you know, brand new formats as well that I think are going to be fun.

Sonya Huang: Okay, so we’re going to have Training Data, the comic strip?

Simon Tokumine: I’m not saying we’re definitely going to have it, but I mean, it’s—I think not everything has a story, you know? And so applying different adaptations will almost be sort of context dependent, I think.

Sonya Huang: Yeah.

Simon Tokumine: But oftentimes it does help, you know? So one of the things that we were looking at the other day was we were looking at a 150-page PhD dissertation on—it was invasive wolves, I think, in some part of Europe. And yeah, you could have looked at a mind map. You could have maybe listened to an Audio Overview if you had, like, 10, 15 minutes to spare. But actually getting a kind of a comic book rendition of that PhD was really helpful just to kind of understand the overall narrative within it.

Sonya Huang: Yeah.

Simon Tokumine: So, you know, we’re still working on things like that, but I think there’s a lot of opportunity there. And of course, you know, comic books are very similar to storyboards.

Sonya Huang: I was just thinking exactly that. Yeah.

Simon Tokumine: And that intersection is what Thomas is doing too, as well. So yeah, there’s a lot of interesting ways that I think Labs projects intersect, and we’ll continue to explore them.

Sonya Huang: So you can create a hero’s journey comic book of somebody’s LinkedIn career arc. [laughs]

Simon Tokumine: I mean, for an audience of one person and one person only, that’s probably going to be the most awesome movie that they’ll have ever seen. So maybe. Yeah, maybe.

Sonya Huang: That’s awesome. Really cool. Where do you see Notebook going from here?

Adapting to users

Simon Tokumine: Yeah. Well, like I said, I think our focus is, aside from a whole bunch of different adaptations, we’re really thinking about how we can be more useful to our users over their longer-running projects. And so both in the world of the knowledge worker, but also in the world of students. These are our kind of like core users, I think. The project is really an area where those users both need the most assistance, but it’s also the point of highest value, I think, for them, right? So if you’re in the world of work, the project is where value accumulates.

Sonya Huang: Yeah, the atomic unit of work.

Simon Tokumine: Yeah, right. It’s a real unit of work. We actually call them “units of knowledge,” but it’s a great way of putting it. And the same for a student as well. You know, the project, if it’s a project with a goal, passing a test, that’s a big deal, you know? Or if it’s an ongoing lifelong learning thing, that’s also really important as well. So I think really focusing on use cases in those domains is something we’re thinking a lot about.

I’ll say the other thing is, you know, I think one of the things I’m personally very excited about, you know, I’ve been in the consumer product space for many, many years, and one of the—I guess one of the things that we did at Google when we went kind of mobile first in the mobile-first era is we moved a lot of our desktop products to mobile. And if you look at those mobile products, many of them are the desktop products shrunk down to a small screen.

Sonya Huang: Yep.

Simon Tokumine: And that’s okay, you know? And I think because we were one of the first, because we built Android, a lot of our big products basically got mobilified at that point, and we found it hard to change from that point forwards. But I’ve always been really interested in thinking about, if you have a desktop experience, what is a companion mobile experience that doesn’t have to just be a carbon copy of the desktop experience? One that maybe leverages the form factor, the sensors, the fact that it’s with you at all times, to deliver an additive experience on top of the desktop experience?

So, you know, we’ve just launched the mobile experience after a fair amount of time in development, it’s fair to say. But what I’m really most excited about there is the opportunity to actually iterate on that kind of novel mobile experience going forward. For example, wouldn’t it be cool if, you know, maybe I’m in a discussion with some amazing, really smart people, and I’ve popped Notebook down. I’ve opened its native voice recorder, and it’s just able to record the conversation for me, and then I can transform that at a later date, and accumulate those recordings, and all this kind of stuff. That’s something that would probably be weird if I opened my laptop and pushed record, but for the mobile device, it’s the perfect opportunity.

Sonya Huang: Totally.

Simon Tokumine: Yeah.

Sonya Huang: Really cool. Thank you for sharing.

Simon Tokumine: Yeah, no worries.

Lightning round

Sonya Huang: Okay, we’re going to close it out with some predictions on AI as a whole. Please jump in. Hot takes welcome. Let’s see. Let’s start with what are your favorite Google Labs projects that we didn’t talk about today? What are the gems right now that you’re most excited about?

Simon Tokumine: The unreleased Google products that we’re not allowed to talk about?

Sonya Huang: [laughs] Not the unreleased, but you guys just announced, like, 50 things. There has to be others beyond the three we talked about today.

Simon Tokumine: Yeah. Yeah.

Thomas Iljic: I have one which is kind of still in this, like, video and image space, but I think the virtual try-on stuff that you presented, like, there’s a lot of exploration in it. I think that one to me is really nice because I think it meets a real, direct user need. It’s the strength of Google, obviously. We know we have all the inventory and we know how to connect this. And it’s just so fun to just see things on your—so I’m very excited about that one. I think this has like a …

Sonya Huang: That’s my favorite as well. That’s so funny. Okay.

Thomas Iljic: I think that one’s a good one.

Jaclyn Konzelmann: Stitch, I think, is really cool. To be able to just talk to the product and describe what design you want, and have it actually come out with that front-end design. I’d been using it a little in dogfood before it was launched, and so it’s just—I want to spend more time using it now that it’s actually live.

Sonya Huang: Really cool. What about you?

Simon Tokumine: Well, mine was going to be Stitch.

Jaclyn Konzelmann: [laughs]

Simon Tokumine: So I’m going to have to think.

Sonya Huang: Two votes for Stitch, one vote for shopping.

Simon Tokumine: Yeah.

Sonya Huang: I’m with you. Two votes for shopping.

Simon Tokumine: Yeah, there we go. There we go.

Sonya Huang: I guess what areas do you think will be hottest in the application space for AI broadly in 2025? Like, I think coding was, you know, maybe the breakout application in the last 12 months. What do you think will be the breakout application in the next 12 months?

Jaclyn Konzelmann: Video. [laughs]

Thomas Iljic: Yeah, I think there’s, like, something around, like, this remixable content.

Sonya Huang: Hmm.

Thomas Iljic: You know, you generate something, I take your thing, I just riff off of it.

Sonya Huang: Yeah.

Thomas Iljic: There’s something around this that I think is going to pop up somewhere. I hope it’s us, but that part feels really interesting. It’s kind of like, you know, Whisk is heading a bit that way. Veo obviously can power a lot of this in video. I think that’s going to be something this year.

Sonya Huang: As you look back at past predictions of what you thought was going to be interesting in AI, where have you guys been really right and where have you guys been really wrong?

Simon Tokumine: Let’s say where we were really wrong, all together. Three, two, one: timing.

Thomas Iljic: Timing.

Jaclyn Konzelmann: Timing. [laughs] I think there have been several examples where we definitely felt like we were onto something. And we were onto something; we were just too early into the space. And so it’s been fun to see, like, projects kind of go on pause or, you know, stop for a little bit, and then some of them are starting to come back around again at this point. And so sometimes we were just a little too early, but it gives us a jump start when the models and the capabilities are ready.

Sonya Huang: Good problem to have.

Jaclyn Konzelmann: Yeah.

Sonya Huang: What do you think you’ve been really right on in, like, sticking to your convictions on?

Thomas Iljic: I think this, at least for me, in my space: the show and tell piece. This idea that, like, you shouldn’t ask users to write two pages of text to describe, for example, an image. The idea is you should just be able to show and tell like you would with a friend or an artist that’s working with you. I think that has stuck, and it’s moving people away from prompting and towards instructing and relying on the intelligence that lives behind it. So I think that one. That one, I’m sticking to my guns, and I think it’s there to stay.

Simon Tokumine: Yeah. I mean, this is pretty obvious at this point, but when we all started in Labs, there was no Google LLM API. Google didn’t have a functional instruction-tuned language model or anything like that. And believe it or not, back then, I think the general consensus was that these were not really things that were easy to build a business around because of their cost. And I think one of the things that we’ve all done, actually, is we’ve stuck with the technology. And now it’s obvious, right? But in the early days it wasn’t. It certainly was not obvious.

Sonya Huang: Totally.

Simon Tokumine: So yeah. And we got that bit of timing right.

Jaclyn Konzelmann: Yeah.

Sonya Huang: Inference costs just riding that curve and just capabilities up, costs down. And what will you build, assuming that those curves continue?

Simon Tokumine: Yeah, exactly. In fact, when we joined, one of the traditions inside Labs, one that Josh started, actually, is to think ahead like that. A lot of the docs that we’d write were around, “Well, what happens in two years?” You know? And of course, that curve is something that I think inspired a lot of us.

Sonya Huang: Yeah.

Simon Tokumine: Yeah.

Sonya Huang: Fantastic. Thank you all so much for joining to share what you’re doing across the creative sphere, the, you know, computer use sphere and the—what do I call the notebook sphere? The podcast killer? [laughs]

Simon Tokumine: Let’s not say “podcast killer,” but yeah, we can say knowledge.

Sonya Huang: Knowledge creation, transformation space. It’s just really, really cool what you all are building, and you guys have such a cool job getting to kind of cook in the little test kitchen of Google. And thank you for giving a preview of some of the stuff that’s coming down the pipeline.

Jaclyn Konzelmann: Thanks for having us.

Sonya Huang: Thank you.

Mentioned in this episode

  • The Not-So-SuperVillains Episode 0.5: Potential Disaster: 2D animation pilot from Google Labs
  • Whisk: Image and video generation app for consumers
  • Flow: AI-powered filmmaking with the new Veo 3 model
  • Project Mariner: Research prototype exploring the future of human-agent interaction, starting with browsers
  • NotebookLM: Tool for understanding and engaging with complex information, including Audio Overviews and now a mobile app
  • Shop with AI Mode: Shopping app with a virtual try-on tool based on your own photos
  • Stitch: New prompt-based interface for designing UI for mobile and web applications
  • ControlNet paper: Outlined an architecture for adding conditional control to direct the outputs of image generation with diffusion models