Building the “App Store” for Robots: Hugging Face’s Thomas Wolf on Physical AI
Training Data: Ep62
Thomas Wolf, co-founder and Chief Science Officer of Hugging Face, explains how his company is applying the same community-driven approach that made transformers accessible to everyone to the emerging field of robotics. Thomas discusses LeRobot, Hugging Face’s ambitious project to democratize robotics through open-source tools, datasets, and affordable hardware. He shares his vision for turning millions of software developers into roboticists, the challenges of data scarcity in robotics versus language models, and why he believes we’re at the same inflection point for physical AI that we were for LLMs just a few years ago.
Summary
Thomas Wolf, co-founder and Chief Science Officer of Hugging Face, has brought the prescient vision behind Hugging Face’s early investment in transformers to robotics through the company’s LeRobot project. He emphasizes democratizing robotics development through open-source tools, diverse hardware approaches and community-driven innovation—mirroring the successful formula that made Hugging Face the largest open-source AI community.
Building communities unlocks exponential growth: Hugging Face’s success in robotics mirrors their transformer strategy—creating accessible tools that transform niche specialists into a broad horizontal community. Their robotics community has grown exponentially to roughly 10,000 developers, proving that simple Python-based tools can democratize complex fields and enable software developers to become roboticists.
Diverse form factors beat expensive humanoids: Rather than pursuing costly humanoids that could price out most users, Wolf advocates for a “galaxy of different form factors,” starting with affordable options like the $100 SO-100 robotic arm. This approach prioritizes accessibility and enables more experimentation, avoiding the elite-only scenario where only wealthy users can afford $100,000 humanoid robots.
Data diversity matters more than data volume: Unlike LLMs that benefit from massive internet-scale datasets, robotics requires diverse, multi-location data to achieve generalization. The key bottleneck isn’t just collecting more robotic task demonstrations, but ensuring sufficient environmental and contextual diversity so robots can adapt beyond their training environments.
Local deployment drives safety and reliability: Robotics demands local model execution more than other AI applications because physical robots losing internet connectivity could cause dangerous failures. This safety imperative makes open-source models particularly valuable in robotics, where running models “as close as possible to the hardware” prevents catastrophic scenarios.
Open science accelerates innovation beyond model sharing: True advancement requires teaching people to train models, not just providing pre-trained weights. Wolf’s background struggling to access Soviet superconducting research shaped his belief that sharing training methodologies, datasets and implementation details creates more value than releasing models alone—enabling others to build upon and improve the work.
Transcript
Thomas Wolf: Many, many startups are just already being built on top of LeRobot. Just, you know, they want to build something, they have this idea of a manual task they can automate, or they have an idea of something they could do in the physical world, and then they take LeRobot. They take already, like, the basic building blocks we’ve shipped, which is just a robotic—a very simple robotic arm, the SO-100, that we designed basically to be the cheapest robotic arm, at $100. And they’re already, like, trying to start businesses around this. At the bottom, you know, it’s the Hugging Face ethos, which is you bring all these platforms, all these basic building blocks for people to build really crazy things on top. Robotics is the same goal for us.
Sonya Huang: In this episode, we interview Thomas Wolf, cofounder and Chief Science Officer of Hugging Face, the largest open source community for AI. Thomas has consistently predicted the future, and he’s the reason Hugging Face invested heavily in transformers and language models, enabling the LLM wave a handful of years ago. Now he sees the same opportunity for robotics and physical AI, and is helping to shepherd Hugging Face’s LeRobot project, which brings together policy models, data sets and physical embodiments to help developers everywhere become roboticists across a diversity of use cases and form factors. We’re excited to chat with Thomas about the robotics explosion, the bottlenecks ahead for physical AI, the U.S. versus China in the open model race, and a lot more. Enjoy the show.
A spidey sense for robotics
Sonya Huang: Thomas, as Chief Science Officer at Hugging Face, you help the company invest in moonshots one or two years ahead of the field. You mentioned to me last time we chatted that, you know, you have a spidey sense that we’re at the same moment for robotics today that we were for transformers and language models a handful of years ago. Tell us what you’re seeing.
Thomas Wolf: Yeah. I think honestly, it started two years ago. I mean, we started our activities in robotics 18 months ago, and at that time there were a couple of breakthroughs out of labs like Stanford, where teams were starting to show robots that were able to tie knots, fold clothes, cook, you know, toss things in the air on a pan and catch them. And basically all of these things were done with very little data, but also with a good prospect of being able to leverage some of the world models and things like that that really benefit from internet-scale data.
So all of this kind of pointed to a short future where robotics was going to work, in a way. Like, hardware was already there, and in my opinion has been there for quite some time. But the missing brick was really software that could adapt, that could be dynamic, all of that. And we started to see that, and that’s why we started to work on LeRobot a little bit more than 18 months ago. And I think for us, with the success of LeRobot, seeing this huge community, the big bet was: can you build a big community in robotics as well? There were small communities of hobbyists, or people very seriously building robots for, like, you know, factory lines. But it was kind of a tiny vertical, in my opinion. The question was: could you move this tiny vertical to, like, a full horizontal thing? Just like nowadays, every software developer is almost an AI researcher. They all want to know how LLMs work, how you train them. And there’s this very smooth transition where the 100, 200 million software developers are becoming kind of AI aware. And I think there is a potential future transition where all of these people also become roboticists in a way, if you give them the tools.
And so that was kind of our goal. So we started with the software library, and the success of that kind of brought us to also try to go into hardware, which is a big question as well. And that’s why we acquired our first hardware company, Pollen Robotics. And we shipped our—at least we opened the orders for our first robots. That was mid-July, so that’s one month ago, and this went crazy.
What is LeRobot?
Sonya Huang: Can you tell us what LeRobot is?
Thomas Wolf: Of course, yeah. So LeRobot is our attempt at repeating the success of the Transformers library in the robotics field. The idea is to have kind of a central library that everyone would use, and that would bring together, in a very simple and accessible way, all the latest technology: all the latest algorithms that people use to train robots efficiently, all the latest datasets they use to train them, and also the connection to actuators, which is the hardware part of robotics. And it’s this intersection, this mix of three aspects: the policy models, the datasets and the hardware, which we try to combine in LeRobot.
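To make those pieces concrete, here is a minimal sketch of pulling one of the community datasets from the Hub with the LeRobot library. The `lerobot/pusht` dataset is a real public example; the exact module path and attribute names have shifted between LeRobot releases, so treat this as indicative rather than exact.

```python
# Minimal sketch: load a robotics dataset from the Hugging Face Hub via
# LeRobot. Module paths and attributes may differ between releases.
from lerobot.common.datasets.lerobot_dataset import LeRobotDataset

dataset = LeRobotDataset("lerobot/pusht")  # a public demonstration dataset
print(dataset.num_episodes, "episodes,", dataset.num_frames, "frames")

sample = dataset[0]  # one timestep: camera frames, robot state, action
print(list(sample.keys()))
```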
Pat Grady: How does the role of Hugging Face change in robotics? You know, for people building in the physical world, does Hugging Face play the same role or a different role than Hugging Face has played for people building in the digital world with LLMs?
Thomas Wolf: I mean, our goal is to play the same role, which at a very, very high level is building communities and bringing people into this idea that AI can be open source, and it’s not something you only consume, but something you can tweak, you can train, you can control, you can host where you want. And actually, hosting where you want is even more important in robotics, because in a future where you have robots everywhere, you kind of want a lot of these models to run locally, because if your robots lose connection to the WiFi or something and then run into the wall or maybe, I don’t know, run into your kids, it’s going to be much more dramatic than, you know, just an LLM hallucinating. So safety questions in robotics, I think, are a good reason you may want to really be able to not depend on a distant API, but have the models as close as possible to the hardware. So I would say our role is maybe even more important for safety and the future of robotics than it is for LLMs.
Sonya Huang: Could you say a word on the size of the community you have at LeRobot? How many people are building, how many people are contributing data sets and things like that?
Thomas Wolf: Yeah, for sure. I should have checked the latest number, actually, because it’s growing exponentially. So it’s several thousand people; I would say 6,000 to 10,000. One event we did a couple of months ago was a hackathon which was worldwide.
Sonya Huang: Love that.
Thomas Wolf: We had a hundred locations across six continents. So it’s still not a million people, for sure, but it’s well above several thousand. And the main indicator for us is that we can measure the number of datasets, for instance, on the hub. In all of these things, the number of community members or datasets, we see this exponential growth, which I think is a very good indication that we are on the right track. And you have to keep in mind that the hardware that’s available right now is still very much hobbyist hardware: 3D-printed arms, still wired everywhere. And that’s why, starting this summer, we wanted to bring much more mass-market hardware, I would say. Something that would interest not only the hackers and the people who are used to plugging cables everywhere, but also everyone’s families, something that looks much more polished.
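The “number of datasets on the hub” metric Thomas mentions is easy to approximate yourself. Below is a small sketch using the real `huggingface_hub` client; the assumption that community robotics datasets are discoverable via a “lerobot” tag is mine, not something stated in the episode.

```python
# Count LeRobot-style community datasets on the Hugging Face Hub.
# Assumes such datasets carry a "lerobot" tag; adjust the filter if not.
from huggingface_hub import HfApi

api = HfApi()
datasets = list(api.list_datasets(filter="lerobot"))
print(f"{len(datasets)} LeRobot-tagged datasets on the Hub")
```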
Sonya Huang: What’s the persona of developers in your LeRobot community? And I’m curious how it’s the same as or different from the people that have traditionally been building, you know, classical controls-based systems.
Thomas Wolf: I would say there are three types of persona. One is the traditional roboticist; they definitely want to use AI. A lot of them know how to build hardware, they know what they can use, but they’ve been frustrated by the limitations of the software stack, or the optimal control models, all of which are very limiting to what you can do. So these are people who have happily jumped on the bandwagon. And we see the same effect we saw with Transformers, which is many academic labs starting to use the robots because it’s a very nice entry point for all their students. And so this has been growing very strongly.
The second community is much more interesting, in my opinion. It’s people who were not really into robotics, but because they are into AI, and robotics looks like a physical manifestation of AI, they kind of want to go into robotics. This covers, I would say, software developers, but also people who are simply curious about robotics. And a good example, talking here: a lot of investors have actually bought the SO-100 arm just to understand physically what this robotics thing is, what it can do. And because it seems so accessible—you get the arm and the software, and it’s just Python code. And now, with a little bit of vibe coding, you can actually even tweak it or control it quite easily. You know, we see people who maybe are not purely technical, but who want to understand what’s happening in robotics, and they use this entry point, which is LeRobot.
Sonya Huang: So you can vibe code a robot?
Thomas Wolf: Yeah. That’s really my goal, yeah. So you can already do that a little bit, but for the new robots, I definitely want this to be, you know, one of the easiest ways to use them. I would love my kids to be able to vibe code behavior on the robots.
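For flavor, here is a toy sketch of what “vibe coding” a pick-and-place behavior on a cheap arm might look like. The class and method names (`SO100Arm`, `move_to`, `open_gripper`, `close_gripper`) are hypothetical stand-ins for illustration, not LeRobot’s actual control API.

```python
# Hypothetical sketch of scripting a behavior on a low-cost arm.
# None of these names are LeRobot's real API; they only illustrate
# how simple such a script could be.
import time

class SO100Arm:
    """Hypothetical wrapper around a low-cost 6-DoF hobbyist arm."""
    def move_to(self, joint_angles_deg: list[float]) -> None: ...
    def open_gripper(self) -> None: ...
    def close_gripper(self) -> None: ...

arm = SO100Arm()
arm.move_to([0, 45, 90, 0, 30, 0])   # reach toward the object
arm.close_gripper()                  # grasp it
time.sleep(0.5)                      # let the grip settle
arm.move_to([0, 0, 45, 0, 30, 0])    # lift
arm.open_gripper()                   # release
```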
The iPhone moment for robotics
Pat Grady: What sort of phase of maturity do you think we’re in for the robotics market writ large? You know, like, when will we have a ChatGPT moment in the world of robotics?
Thomas Wolf: Yeah, that’s the thing I’m looking for. Sometimes I call it the iPhone moment, maybe. What will be the first use case, the first moment where everyone, or a large fraction of the population, will think, “I want a robot,” in consumer terms? I mean, I think the enterprise market is quite complex. In some places there are already a lot of robots in certain industries—car manufacturing is the best example. Then there is a second part where robots are just starting to enter, and here there are a lot of challenges around reliability. Will these robots be reliable enough to be deployed in retail stores and basically be really useful?
But the third part I’m much more interested in is actually entertainment, and some of education, where I think these questions around, you know, “I need a $3,000 robot because I need reliability,” are less relevant. And so you can take a robot that’s really accessible—Reachy Mini, for instance, is priced at $300. That’s something that can definitely be an impulse buy. You buy it as a gift and you’re not sure if it’s going to work or not. But at this price, what we want to find out is whether there isn’t a lot of potential in entertainment, fun, education, learning AI through physical interaction, instead of just coding on a chatbot or on a screen.
And I think that’s something that has not been explored at all. I think there were a couple of tries. I mean, the MIT Media Lab, Cynthia Breazeal for instance. But in the past they were usually priced a bit high, I think above $1,000. And more importantly, I think the software that was there was very limited. So you would buy a robot that would be fun, but you had maybe five or ten behaviors, and once you’ve tried them all, that’s it, it’s finished.
And here, the goal for Reachy Mini is really to make it almost like a smartphone. It comes with a couple of behaviors, but because you can tweak it, and people can build new behaviors and share them and plug in all the new VLMs, speech models, chat models, the possibilities are actually kind of endless. It’s kind of an open door to rebuilding, you know, the iPhone’s App Store, basically. So that’s what I’m very excited about. And to be honest, this last part is still very much a bet, because nothing exists there. Like, there is no real proof. My hints are this exponential growth of the community, which makes it quite plausible that people will really like this.
Sonya Huang: So you see Reachy Mini as, like, you know, the reincarnation of the robot dogs of the ’90s—a way people can actually play and experiment and have these robot companions in their households.
Thomas Wolf: I mean, to be fair, this one is a big bet. But something I was discussing just yesterday on robotics at TechBBQ: an investor was telling me how he sees so many, many startups already being built on top of LeRobot. People who want to build something: they have this idea of a manual task they can automate, or they have an idea of something they could do in the physical world, and then they take LeRobot. They take the basic building blocks we’ve shipped, which is just a very simple robotic arm, the SO-100, that we designed basically to be the cheapest robotic arm, at $100. And they’re already trying to start businesses around this, to sell something around this. And Reachy Mini is also, in a way, designed for that. It’s a very simple white robot, kind of a white-label thing. And if you want to adapt it, and if you think, “Hey, I have a business idea around this, but I need a robot to interact with people—” I don’t know, in a hospital or something like that, you can take this and actually start to build your idea. And at the bottom, you know, it’s the Hugging Face ethos, which is you bring all these platforms, all these basic building blocks for people to build really crazy things on top. And robotics is the same goal for us.
The data bottleneck
Sonya Huang: Really cool. I want to talk about data as a bottleneck. I think one of the big differences between language and robotics is that you have trillions of tokens out on the public internet to train LLMs. That dynamic doesn’t exist in robotics. And actually, I think that’s where Hugging Face’s role in the ecosystem could be much more interesting, in terms of decentralized dataset curation and creation. Talk about what’s happening on the dataset side of LeRobot.
Thomas Wolf: Yeah, it’s super interesting, I think. So there are a couple of challenges in robotics, and the main challenge is the data. There is just not enough data. There are some ways to use video on the internet as training data, but it’s very limited. In some cases we may be able to use models, but in others, if you want to automate a task, there is no way around just recording someone, or possibly the robot itself, doing the task.
I think here there is one possibility and one limitation. The main limitation is that you can record a lot of tasks yourself, but usually what you will lack is diversity. You will basically be able to train a robot to do something very well in your room, where everything looks the same, but once you put it in the room next door, where maybe the walls are green instead of red, the robot has a lot of trouble generalizing. So this is the main limitation.
And so our idea with the hub was that everyone could record datasets, and if we managed to incentivize them to share the data, then we could maybe build a multi-location dataset that would be extremely diverse, and hopefully would also be very big. So I would say that’s a long-term goal, and we hope this can help. But another, more direct thing we try to do is to work directly with the actors in the community. So we released a couple of datasets, and we try to help them release some datasets of their own. We think one of the nice aspects of robotics is that a lot of people, in the end, want to sell the hardware, and so, even more than modern LLM companies, they can afford to share a bit of the software as open source if it lifts the whole field, because in the end that’s not really what they sell directly. So that’s what I’m trying to convince a lot of robotics companies to do. And surprisingly, it’s actually something a lot of them seem to be interested in.
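If you have recorded demonstrations locally, sharing them to the community pool Thomas describes amounts to pushing a dataset repo to the Hub. Here is a minimal sketch with the real `huggingface_hub` client; the folder path and repo name are hypothetical placeholders.

```python
# Sketch: publish a locally recorded demonstration dataset to the Hub.
# The repo name and local folder are hypothetical; assumes you are
# authenticated (e.g. via `huggingface-cli login`).
from huggingface_hub import HfApi

api = HfApi()
repo_id = "your-username/so100-kitchen-demos"  # hypothetical
api.create_repo(repo_id, repo_type="dataset", exist_ok=True)
api.upload_folder(
    folder_path="./recordings/so100_kitchen",  # hypothetical local path
    repo_id=repo_id,
    repo_type="dataset",
)
```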
Pat Grady: Hmm.
The why now for world models
Sonya Huang: Super interesting. You tweeted about world models the other day. I think you and I met the same world model founder. What’s happening in open-source world models, and how does that help or not help what’s going to happen in robotics?
Pat Grady: And can I ask on that too, is there a “why now” in world models at the moment? Because it feels like they’ve started to pop up recently.
Thomas Wolf: So what is interesting is that it feels like a couple of teams have actually been working on this independently for a few months, and they just happened to release right now, right? Because when you talk to all of them, they are not really copying each other. I guess one thing was the advent of really cool, really good image generation, and basically finally understanding how to fix the six-finger thing and get a more reliable, more coherent model for images, which naturally was transposed to video. And so we now see some really cool video models as well.
And this is just the next step. A lot of the founders I’ve talked to in this field also say that they were helped by the advance of open-source video generation and open-source image generation models. Basically, they take a video generation model, fine-tune it, and then train it to be able to react to some inputs. Which is also what we do in robotics. There are a lot of common points between these two things. And it seems to work quite well.
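In code, the recipe Thomas outlines (start from a pretrained video generator, then teach it to react to an action input) might look roughly like the schematic below. The backbone’s `encode`/`decode` methods stand in for a hypothetical pretrained video model; this is a sketch of the idea, not any particular lab’s implementation.

```python
# Schematic of action-conditioned fine-tuning of a pretrained video
# model. The backbone's encode/decode methods are hypothetical.
import torch
import torch.nn as nn

class ActionConditionedWorldModel(nn.Module):
    def __init__(self, backbone: nn.Module, action_dim: int, hidden_dim: int = 512):
        super().__init__()
        self.backbone = backbone  # pretrained video generator (hypothetical API)
        self.action_proj = nn.Linear(action_dim, hidden_dim)  # new: inject actions

    def forward(self, past_frames: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        # Encode the frame history, add the projected action as extra
        # conditioning, then decode the predicted next frame.
        features = self.backbone.encode(past_frames)
        features = features + self.action_proj(action).unsqueeze(1)
        return self.backbone.decode(features)

# Training stays ordinary next-frame prediction, except the action that
# was actually taken between frames is given as input, e.g.:
#   loss = F.mse_loss(model(frames[:, :-1], actions), frames[:, -1])
```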
So you start to have this very interesting, in my opinion totally new, experience where you actually have a film that’s controllable, both photorealistic and reacting in a very coherent way to the actions you input, which is either just moving around or asking it to add something: a rider, a castle, a car driving. And you see this thing just react very well.
And you have here, I think, a lot of potential applications, obviously in entertainment, actually some form of entertainment that might be totally new, something we’ve never seen, maybe the first time we create a new form of really virtual entertainment, but also a lot of applications in business, where you can have interactive things. And one of these downstream applications is generating more data for robots. I mean, there are just two ways to generate data: one is to record it in the real world, which I think is still very interesting, and the other is to simulate it. And surprisingly, on the simulation side, there has been some development, but there haven’t really been breakthroughs recently. So maybe this is the first breakthrough I’ve seen in simulated data generation in quite some time.
Humanoids and other form factors
Sonya Huang: Yeah, I was very excited to see even what DeepMind’s doing with Genie to train their embodied robots. Super exciting. Humanoids. Do you believe in—do you believe in humanoids as the kind of ultimate form factor?
Thomas Wolf: Yeah, big debates, big debates. What is sure is that I’m more excited about trying other form factors right now. I think there are two main problems with humanoids. The first one is that they’re always quite expensive, just because you need a lot of motors, and most of the price of a robot is the actuators; that’s always, like, 70 percent of the price tag. And so when you have 60 actuators, that’s just your bill. So it’s really hard to drive humanoids below the price of a car, and the price of a car is already quite a high bar. If you buy something that’s the price of a car, you expect to get a lot of value out of it, right? And so that’s why we’re exploring smaller robots, like just one arm or just a moving head, this type of thing.
There is some possibility that we can get cheaper humanoids at some point. K-Scale was trying to do that, Unitree has definitely been trying to cut the price, and there are a lot of companies aiming for it, but it’s going to be really hard, I think, to get this under $10,000 or so.
The nice thing about the humanoid, of course, is that once you’ve solved the humanoid, you’ve solved a lot of tasks at the same time. If you solve the humanoid, you can do everything a human can do, which is very exciting. And the main question is: do you need to solve the humanoid? On my side, I would like to see a galaxy of different form factors. I also think some of them are much cuter than a humanoid. For social adoption, I think the humanoid is also asking a lot from people. You’re heading straight into this kind of uncanny valley with something that looks a lot like you, moves a lot like you. So I thought this would be a big limit for social adoption. Now, to be honest, I’ve seen a lot of Unitree robots, and I don’t know about you, but you kind of ignore them at some point. So I’m also much more confident that people would just say, “Yeah, that’s just—” maybe we are too worried about the uncanny valley in robotics. And maybe at some point, once we’ve seen a couple of robots, people will just accept them very, very easily.
Sonya Huang: Okay, so we’re going to see the LeRobot humanoid soon?
Thomas Wolf: [laughs] I mean, the goal would be, if Reachy Mini and our small robots work really well, that at some point we’ll climb back up to the humanoid form factor, kind of progressively, as we’ve done before, bringing the community along with us.
Sonya Huang: As you imagine the world in 10 years, like, how many robots do you think there are among us? Do you think it’s, like, 80 percent of them are humanoids and then 20 percent are this long tail of this diversity of hardware and use cases, or how do you think the world plays out?
Thomas Wolf: Yeah, and I would love to see the second option, because I think that’s an option where we have much more robots in our life. What I would really not be super excited about is a future where robots are kind of an elite thing because they cost $100,000. And so basically, if you’re rich, you have three robots at home, and if you’re not, you don’t.
I mean, Hugging Face has always been about the big community as well. So we care about that. And for this reason, I’m much more excited to see a lot of form factors that are basically accessible to a lot of people, some cheaper, some more expensive, rather than just this single humanoid that costs a lot, where if you can buy it, that’s nice, and if you cannot, too bad for you. So I would say at Hugging Face, that’s the future we try to nudge, to push toward. I think it’s also much more fun, because otherwise you’re restricting yourself. Just like LLMs: if you only try to make them copy humans, that’s one thing, but if you think about what they can do that humans cannot, it’s also much more interesting in a way.
Sonya Huang: Do you think we’re heading towards a world of big foundation models that can kind of do everything, and then be adapted quickly to any new domain with just, like, you know, a few prompts? Or do you think that developers in your community are going to start from, like, a small base model, and then do a lot of their own data collection customization to adapt to their domains?
Thomas Wolf: Hmm. I think we’ll see more and more of both. As the field evolves, we start to see a really long tail. For instance, if we look at downloads on Hugging Face, we see very large state-of-the-art models being downloaded, which are usually too large to run on a local laptop. But some of the most downloaded models are also just the right size to run quickly on a laptop. So we really see these two modes.
And I think as the field matures, we’ll see this more and more: it’s not that you choose one or the other, it’s that depending on what you need, you may choose to run one locally or not. And I think GPT-5 with the router is a good example of this. Maybe the largest model, or the longest reasoning chain, is not the answer to everything, and you actually need to smartly select the one you want. That selection can sit behind a router, but it can also be local: you will run some models locally, and they might be extremely useful, and we know better and better how to train models that are actually extremely useful. But when you need something much more complex, when you need reflection over a very long time, then you will turn to much larger models.
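A toy illustration of that routing idea: keep a small open model local for easy requests and fall back to a larger hosted model for hard ones. The difficulty heuristic, the model choice (`HuggingFaceTB/SmolLM2-1.7B-Instruct`, a real small model on the Hub), and the remote-call stub are all assumptions for the sketch, not anyone’s production router.

```python
# Sketch: route easy prompts to a small local model, hard ones to a
# hosted frontier model. The heuristic and stub are illustrative only.
from transformers import pipeline

local_model = pipeline("text-generation", model="HuggingFaceTB/SmolLM2-1.7B-Instruct")

def call_remote_frontier_model(prompt: str) -> str:
    # Placeholder for a hosted-model API call of your choice.
    raise NotImplementedError("plug in your preferred hosted-model client")

def answer(prompt: str) -> str:
    # Crude difficulty proxy: long prompts or explicit proof requests
    # go to the big hosted model; everything else stays local.
    needs_big_model = len(prompt) > 500 or "prove" in prompt.lower()
    if needs_big_model:
        return call_remote_frontier_model(prompt)
    return local_model(prompt, max_new_tokens=128)[0]["generated_text"]
```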
Open vs closed source
Pat Grady: One of the narratives that’s been really popular over the last few years is this narrative of the battle between open and closed, you know, closed models versus open models. Who’s going to win? And just in the last few weeks, OpenAI is now present on Hugging Face. And so I’m curious what to make of that, and sort of what it might imply about the future of open versus closed or maybe how they work together.
Thomas Wolf: I mean, we’re super happy to welcome them back. They were there before. The first model I worked on, and the reason we switched from being a game company to an open-source platform, was GPT-1, which not a lot of people remember, but it was very funny because it was trained mostly on novels, and romance novels at that. So when you put two characters in the prompt, in the continuation they would always fall in love in some way.
Pat Grady: [laughs]
Thomas Wolf: I still miss that one a little bit. And then Google took this idea and trained it also on Wikipedia, which added a lot of world knowledge, and then it expanded to GPT and all of that. But at that time, they were very, very pro open source. And I think, just like in software, both solutions will coexist, and you can have a company that does both. Google has been an example for quite some time with the Gemma line and the Gemini line, with some interesting moments: sometimes I heard that one Gemma model was actually so good that it was better than the closed-source models, so they had to not open source it.
So the frontier is quite thin at the moment, in a way. And with the challenging new players—mostly in China, but I think we’re starting to see some new foundation model teams in the U.S. too, so we might see some challenge from the U.S. as well—the frontier will stay quite thin, I think, and we’ll see both with only a tiny difference in performance.
The main reason right now, to be honest, is that at this exact point in time, we’re not exactly in a cost-saving phase of AI. Which means that for a lot of actors, moving to open source because it saves costs is not the most important thing. Usually they move to open source right now because they want data privacy, or they want to be able to adapt the model. Maybe they have a new idea, like this action model, for instance: something that does not exist yet, and they want to build it. That’s usually what we see right now, a lot of new exploration happening in an open-source way.
What I do expect is that as we move to a more mature market, the cost, being able to run models on faster hardware or other types of hardware, and being able to own the model and the full stack of where the models run, will become more and more important. So I think, just like in software, in the long term open source is kind of a winning solution for many applications, for many use cases, but we’re in the turbulent phase where …
Pat Grady: Yeah.
Sonya Huang: How do you think Hugging Face’s role in the LLM ecosystem has evolved as these models have pushed at the frontier and there’s closed models? I remember back when it would be like, you could download the small BERT model on Hugging Face and run it locally, right? And that was a lot of the usage. How has your business evolved now that we’re going towards, as you mentioned, models that are too large to run on consumer hardware? And how do you see Hugging Face’s role evolving?
Thomas Wolf: I mean, surprisingly, I was doing these stats at the end of last year, and this BERT model is still used a lot. A surprisingly interesting aspect of open source is resiliency: once you have something that works, that really worked in production, you may not want to be forced to move to the new GPT, right? That was a little bit of the backlash around GPT-5: people actually wanted to keep using GPT-4o for many things. Maybe they fell in love with it, or they’d become friends with it; there were some Reddit posts around this. But also maybe their applications just worked really well and they didn’t want to redesign them.
And I think with open source, the long-term interest for us is also to provide this very stable base. You build something, you know it will keep existing, and you know you can keep it as a very stable base. And in general, I think our role in the community has progressively switched from pushing a lot of things ourselves, pushing our library, pushing our early products, to enabling the community more broadly.
So we now work a lot with many, many actors in the community. We work a lot with Llama.cpp, we work a lot with vLLM, we work a lot with all the big players, to see how this whole ecosystem can be very efficient and work really well. So when a model is released, you want to be able to use it directly in vLLM, directly in Llama.cpp. We try to have more and more of this meta-community-builder role, where we try to align all the players, keep them at the same pace, and help them move in the same direction. So in a way, we are much more focused on the community, on the Hub, than we were a couple of years ago.
Open source in China
Sonya Huang: Really cool. What do you think about what’s happening—you mentioned, you know, China’s had a lot of the open models recently. Like, why do you think that is happening, and what is the state of open model development in the West?
Thomas Wolf: Yeah, this is the most surprising thing that has happened in the last two years, right? The fact that China would become a champion of open source, who would have predicted that in the 2020s, right? So I actually visited two weeks ago to try to understand a bit better, on the ground, how it’s happening. And the thing is, it’s a very, very competitive market internally. There are a lot of teams there that are extremely good, and it reminded me in some ways of Silicon Valley. People are working extremely hard, and all of these model providers compete with each other.
One dimension on which they compete, which is surprising, is being the most open, the open source aspect. They’re extremely proud of being very open. And one of these companies that stopped being open, Zhipu, decided to stop open sourcing their models, and they saw an immediate backlash, I think mostly on hiring. People didn’t want to come work there anymore, and so they went back to open sourcing. So it’s quite strongly ingrained now, I would say. I would expect this to continue, and I would expect more teams to come, because we see that as well: in the presentation of GPT-5, a lot of the people actually did their studies in China, some of them at Tsinghua University, right? We know the teams here also partly have Chinese members. So they have extremely, extremely strong people, and they all really want to train the best model.
What I think is interesting is seeing the West come back to open source very recently, to be honest, just over the summer, right? With this call for open sourcing, OpenAI decided to come back. Now we’re just waiting for Anthropic to maybe open source their first model. So I think it’s time to ask them to participate. Yeah, I would say right now the situation for open source is pretty good, but it’s like the Jedi in Star Wars: it’s never over. We have to keep pushing this, we have to keep carrying our flag of openness.
Pat Grady: What’s driving the resurgence of open source in the West?
Thomas Wolf: Yeah, I think one thing is that when you have, in a way, nothing to lose, open source is always a good solution when you’re a new team. It can be, for instance, that you create a new company and you want to quickly rise to the top, so you open source your model, right? That’s the Mistral recipe. How can you very quickly become a great player? But for the Chinese, there’s also the fact that, in the West, almost nobody will use a Chinese API, so they don’t sell APIs in the West anyway. So in a way, they have nothing to lose in the Western market by open sourcing their models there.
So I think there is this dynamic at play, and the consequence is also that when nobody is open sourcing, it’s like a market: there is an interest for someone to take the room, right? To say, “We’re going to be the open source player.” Meta was that open source player when everyone else had kind of stopped open sourcing. And I feel like there will always be this dynamic: when some people stop open sourcing, there is a gap to become the new top open source actor, and someone will want to fill that void.
Sonya Huang: Thomas, you mentioned that Western companies won’t use a Chinese model over a Chinese API. What about Chinese open models, when it’s the open weights hosted on U.S. servers? Are Western companies actually willing to use those? Is there still hesitance to do that, and is it well founded or not?
Thomas Wolf: I don’t see that a lot, to be honest. I mean, it’s a good question. I try to do regular polls; I ask a lot of people regularly, you know, what do you think about that? Because it can certainly be a concern, right? When DeepSeek came out, there was a very nice R1 1776 model from Perplexity, for instance. The thing is, in many business cases, I don’t think people really notice anything. So I think there is more of a general appetite for a better way to understand the safety of a model. I would say it’s quite general: people are a little bit worried about having a model that may behave strangely in some cases.
And so I think this is a general thing that a lot of companies have been asking: can you guarantee this model will always behave well? Which we know is really hard, because even with GPT, sometimes you ask for the number of Rs in “strawberry,” and it just behaves badly and you’re like, “Why? You’re very smart, you should be able to know that.” So I think this is a general thing that is needed soon, and there are a couple of teams working on that, for sure.
Passion for open science
Pat Grady: Can we talk about open science?
Thomas Wolf: Yeah, we build LLMs like humans, but what if, you know, an AI model could see infrared, could see some radiation we cannot? These are things that humans cannot do, so it’s already superhuman. And for science, that’s actually super interesting. A lot of AI models for science are already superhuman in a way, because they can see modalities or predict things that are just inaccessible to a human. And I think it’s good ground for thinking outside of the human limitations of what we can do.
Pat Grady: And you’ve been pretty passionate about open science for a while. So can you just say a word about what is open science, what role does Hugging Face have to play, and where does your passion for it come from?
Thomas Wolf: For me, it started a very long time ago. Before Hugging Face I was a lawyer, but before being a lawyer, I was a researcher in physics, working on superconducting materials. And surprisingly, in superconducting materials, a lot of the great research had been done by the Soviets back in the Soviet Union. The Soviet researchers had a very different way of inventing theory than the Western world. They had some really great ideas and some really interesting results, but I had to track these inventions and theories down in the Soviet JETP Letters. And some of them were even still in Russian.
And so from that time I got this idea that, damn, accessing knowledge is hard! And if I can make it easier, that’s going to unlock a lot of really cool stuff. If I could just find where this equation comes from and really be able to read the article, that would be amazing. And so when I joined computer science, I discovered arXiv, I discovered open source, and I was like, “This is really cool. Everything’s just free. Everyone shares things written in English, everyone can read it. It’s even free, you don’t have to buy the publication.”
And I was very excited about that, until I started trying to reproduce a DeepMind paper and discovered there was a limit: people publish what they want to publish, but they don’t really give you all the tricks of the trade, right? So when you try to reproduce it, you discover that it just doesn’t work. And so open science, for me, is this extension: it’s nice to give open models to people so they can build things on top, but it’s even better to explain how to train a model. It’s nice to give someone a fish to feed them; it’s even better to teach them to fish. And that’s basically what we want to do.
We think in the very long term, AI is going to be such a fundamental technology that basically it should be just like physics, should be something everyone could learn by reading a book. Like, if you want to learn today about general relativity, you can read a book and you can know about it, right? You don’t have to pay to get access. I mean, you buy the book or you find it, but that’s basically free access.
I think with AI, all the recipes to train an intelligent object or artifact should also be something everyone can know. That’s a very long-term thing, but the very short-term thing is that if we teach people how to train great models, then they bring great models to the hub, and then we have much more great content to offer. So in a way it’s also just content: if you provide a great model, it’s nice.
And so one example of what we do for that is writing very long blog posts, some of which even become books. We just published a book this summer on how to train on a thousand GPUs: how to balance the load and how to do all of this parallelism. Another very long blog post we wrote was about how to make a very high-quality dataset. We made a dataset to pretrain models, called FineWeb, and it’s used in a lot of recent models; the Qwen models, for instance, say they used FineWeb. And then we also wrote up how we built this dataset, how we filtered it, and what is important to understand when you want to build great data for training models. So all of this goes together, and for us it’s a way to bring better open-source AI models to Hugging Face.
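FineWeb itself is a public dataset on the Hub, so it’s easy to peek at. Here is a minimal sketch with the `datasets` library; the `sample-10BT` config is one of the published small samples, and streaming avoids downloading the full corpus.

```python
# Sketch: stream a few documents from the FineWeb pretraining dataset.
from datasets import load_dataset

fineweb = load_dataset(
    "HuggingFaceFW/fineweb",
    name="sample-10BT",   # a small published sample of the corpus
    split="train",
    streaming=True,       # iterate without downloading everything
)
for doc in fineweb.take(3):
    print(doc["url"], doc["text"][:80])
```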
Sonya Huang: I want to go back to your physics and superconducting comments. Like, it feels like a lot of the AGI labs believe that AI actually disrupting science is not that far out. There’s been some exciting discoveries—well, I think there’s been exciting evidence so far in math, and then maybe extending into physics, material science. Like, do you think we’re going to see an inflection point in scientific discovery from these models? And what do you think open source’s role will be in driving that?
Thomas Wolf: As always, it’s nice that there is some hype here, because it drives people. But I think sometimes we overestimate what’s happening. Math is a good example, right? There was this idea that AI producing a new proof of some math theorem is like inventing new science. As a scientist myself, I think that’s really the wrong way to view science. The reason is that I was a bad scientist, so I can speak to that. I was a very good student: when you give me a problem, I’m always pretty sure I can find the proof, because I know the problem has a solution. I just have to fill the gap, grab a couple of things I know, and combine them together.
And when I became a researcher, I discovered that I was a pretty bad researcher, because what I was not able to do was ask the right question. If somebody asked me, “Can you demonstrate this theorem?” I could do it. But if someone said, “Okay, what is interesting to explore now in math?” I had no idea, basically.
And so in science, the main thing you need to do—I am talking about big, big breakthroughs, right?—is ask the right question. You need to find a way to ask a question nobody has asked before, a question that will open a whole new field of research. And that’s basically what a Nobel Prize is: a Nobel Prize typically goes to someone who opened a new field of research because they asked the right question. Maybe, you know, maybe the speed of light should be the constant, and let’s explore what that means. And it means we can create general relativity, and then we can derive black holes out of it.
And I think LLMs right now are still extremely bad at this, at this kind of tasteful way of asking the right question. Which doesn’t mean we cannot do really cool stuff with them, but the way I see them nowadays is really more as very useful helpers. Once a human researcher says this is something interesting to study, you can use them to multiply the predictions you can make by ten, a hundred or a thousand. You can use them to quickly do a full survey of what has been done in the past on this molecule, this protein. You can use them to ask, “Okay, what would be the most logical way to test this hypothesis?” But I still see this as an accelerator and an assistant of scientific research. What I would love to see is an AI that would say, “Hey, I have an idea on how to go faster than light.” But for this, you cannot just write down the answer for how to go faster than light. You have to ask the right question: what should we change in today’s theory? What should we reconsider to invent something that’s as groundbreaking as that?
Asking the right questions
Pat Grady: To your point on asking the right questions, what do you think are the interesting questions in the world of AI right now, or maybe the questions that people are not asking, that they should be asking?
Thomas Wolf: I mean, this is one question, I think, and it’s related to something we talk about a lot, which is sycophancy, this tendency of AI models to always agree with you. I think a good researcher is actually a good example of a person who disagrees with a lot of people. My former professor was a Nobel Prize winner, and he was not very friendly in how he would disagree—but I think that’s part of it. You have to be extremely opinionated. So finding a way to push these models to have, in a way, a stronger opinion, or a taste in their opinions, will, I think, be key for science. Of course this will be based on deep learning and LLMs, but it may involve other ways to train them, other ways to think about them. I think that’s one of the big questions. There are a couple of people exploring it, but not a lot.
10 years from now
Sonya Huang: Okay, when you see the world in 10 years, like, what is Hugging Face’s role in it? How much of your community do you think is building with LLMs, with robotics? Will there be more stuff that people will be doing? I know it’s hard to think in 10-year time spans, but what do you think the world looks like in 10 years?
Thomas Wolf: Yeah, 10 years is very, very different. What I would love to see in 10 years is a world where basically everyone feels like they can build with AI, where they’re not just consuming AI but feel like they can be an actor in this thing. A little bit like the shift from the era when media was generated and created for us to the current era where everyone can actually create media, and we saw that this created a whole new generation of YouTubers and influencers and people making extremely interesting content.
And I would love AI being the same, which is a very big community like the software developer community where everyone can create things with AI and they feel like it’s just another tool in their box. They can code stuff, but they can also train a model and they can mess with—maybe adapt the model. So the nice thing about that is I’m a big believer in basically the creativity and natural invention of just the community. I think it’s something that’s very beautiful to witness. So in 10 years, I hope that people are not just consuming, you know, AI content and not doing anything, but they’re actually exerting their creativity to build really nice things with a lot of AI tools around them.
To be honest, I think that’s kind of what we are building right now, so I’m quite optimistic. This is going to change a lot of things for society in general, because a lot of jobs will just be different.
Sonya Huang: It’s a beautiful vision. Thomas, thank you so much for joining us today. We really enjoyed this chat.
Thomas Wolf: Thanks. It was a pleasure.
Mentioned in this episode
- SO-100: A 3D-printed robotic arm that Hugging Face has open-sourced so people can download and fabricate it for around $100
- LeRobot: Hugging Face hub for models, datasets, and tools for real-world robotics in PyTorch
- Reachy Mini: Cute little robot made by Pollen Robotics (recently acquired by Hugging Face) that runs on LeRobot with prices starting at $299
- TechBBQ: Startup event in Copenhagen that Thomas spoke at this year
- Genie 3: World model from Google DeepMind
- GPT-1: OpenAI’s first GPT, which was open sourced and is available on Hugging Face
- Llama.cpp and vLLM: Open source libraries for performing inference and serving LLMs
- R1 1776: Open source model from Perplexity based on DeepSeek R1
- JETP: Soviet era Journal of Experimental and Theoretical Physics that Thomas tried to access as a physics researcher
- The Ultra-Scale Playbook: Book published by Hugging Face on training LLMs on large GPU clusters
- FineWeb: Hugging Face’s large, high-quality pretraining dataset, along with a guide to building such datasets