The sample efficiency black hole - by Dwarkesh Patel
Dwarkesh Podcast
"We see these AIs as a galaxy glittering with capabilities, but at their center, invisible to the naked eye, holding all the constellations together, is an unimaginably massive black hole of data."
Dwarkesh Patel
Jun 08, 2026
One definition of intelligence is sample efficiency: how much data you need to see in a given domain before you can operate in it fluently and competently. It's not clear that we've actually made much progress on training sample efficiency over the last few years. Rather, it seems that we've mostly widened and improved the data distribution.

The main way AIs have been getting better is by adding more and better data, and by scaling the compute used to generate that data in the first place. Obviously RL is the main way this has happened. You can think of RL as a kind of synthetic data generation: you dump a lot of compute against a verifier in order to find the "good" data. Then you train your model to predict these correct rollouts, much in the same way that you might train it to predict the next word in internet text.

For this process to work, the model must already assign at least some prior probability to the correct solution, which is why you also need mind-stretching amounts of human expert trajectories in every single field and skill you want the model to be competent at.

It's hard to overstate how task-specific and bespoke this human expert data is. If you want some intuition, go read the job descriptions on Mercor's or Surge's websites. There are listings for Word specialists who convert legacy documents into polished Word files, legal experts who write realistic M&A due diligence documents or securities filings, management consultants who write up template market research, and dozens of other narrowly defined categories.

And it's not only that the data has to be so domain-specific; there also has to be so much of it! Each skill corresponds to at least hundreds of human experts generating example completions, writing rubrics, and explaining their chains of thought.
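The RL-as-data-generation loop described above can be sketched in a few lines of Python. This is a minimal toy under loudly stated assumptions, not how any lab actually trains: `toy_policy`, `verifier`, and `harvest` are hypothetical names, the "model" is just a weighted coin that answers an addition problem correctly with probability `p_correct`, and the final step (training on the kept rollouts) is omitted. It is closer to plain rejection sampling than to GRPO, but it shows the two points in the text: compute is spent sampling many rollouts per task so that a verifier can filter out the "good" data, and if the model puts zero prior probability on the correct answer, no amount of sampling yields anything to train on.

```python
import random
from dataclasses import dataclass


@dataclass
class Task:
    prompt: str
    answer: int  # ground truth used by the verifier (the toy policy also reads it, only to simulate right/wrong)


def toy_policy(task: Task, p_correct: float, rng: random.Random) -> int:
    """Hypothetical stand-in for a model: correct with probability p_correct."""
    if rng.random() < p_correct:
        return task.answer
    return task.answer + rng.randint(1, 9)  # otherwise, some wrong answer


def verifier(task: Task, completion: int) -> bool:
    """Programmatic check, standing in for unit tests or an exact-match grader."""
    return completion == task.answer


def harvest(tasks: list[Task], p_correct: float, rollouts_per_task: int,
            seed: int = 0) -> list[tuple[str, int]]:
    """Spend compute on many rollouts per task; keep only verified-correct ones."""
    rng = random.Random(seed)
    dataset = []
    for task in tasks:
        for _ in range(rollouts_per_task):
            completion = toy_policy(task, p_correct, rng)
            if verifier(task, completion):
                # These (prompt, completion) pairs would become training targets,
                # predicted much like next-token text (the training step is omitted).
                dataset.append((task.prompt, completion))
    return dataset


tasks = [Task(f"{a} + {b} = ?", a + b) for a in range(10) for b in range(10)]

if __name__ == "__main__":
    for p in (0.0, 0.05, 0.5):
        kept = harvest(tasks, p, rollouts_per_task=64)
        print(f"p_correct={p}: kept {len(kept)} of {len(tasks) * 64} rollouts")
```

At `p_correct=0.0` the harvest is empty; at `0.05` roughly one rollout in twenty survives, which is why a low prior forces so many samples per task.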
There's a reason that the data industry producing these expert labels (and the RL environments in which their meticulously catalogued skills can congeal) is earning billions a year in revenue, soon deca-billions.

Imagine if it took a couple of decades' worth of courses, hundreds of concurrent professors, and millions of practice tasks for you to learn how to polish a Word file. Even the difference in task count understates the gap: the models also have to grind through each of their far more numerous tasks far harder. Whereas a human student might practice a textbook problem once or twice, GRPO has the model generate hundreds to thousands of rollouts per task. We are building a kind of Frankenstein's monster, with a billion grafts of carefully constructed examples sewn together.

Epoch recently reported that open models lag state-of-the-art closed models by only 4 months. I think the reason it is relatively easy for open-source labs and previous laggards to catch up to within months of the frontier is that data is the real driver of progress. And data can be easily distilled from public APIs, whereas hyperparameters, training tricks, and architectural micro-optimizations cannot. If the latter were driving most of the progress, catching up would be harder than we observe it to be.

It is easy to forget how much data these models are trained on, and how much more it is than what we humans see in our lifetimes. We see these AIs as a galaxy glittering with capabilities, but at their center, invisible to the naked eye, holding all the constellations together, is an unimaginably massive black hole of data.

Intermission: Comparing human and AI sample efficiency
If a person hears and sees on average ~2,000 words an hour, then from birth to adulthood they'll encounter ~200 million tokens. By contrast, frontier models are trained on somewhere between tens and hundreds of trillions of tokens. That is a difference of roughly 50,000x to 500,000x, approaching a million-fold at the upper end.
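The arithmetic behind these figures, as a quick back-of-the-envelope sketch. The 2,000 words per hour comes from the text; the 16 waking hours a day, the 18 years to adulthood, and the one-token-per-word conversion are assumptions chosen to land near the ~200 million figure (real tokenizers typically produce somewhat more than one token per English word, which would narrow the gap slightly).

```python
# Back-of-the-envelope comparison of lifetime human language exposure
# versus frontier-model training corpora. All inputs are assumptions.

WORDS_PER_HOUR = 2_000      # from the text: ~2,000 words heard/seen per hour
WAKING_HOURS_PER_DAY = 16   # assumption: exposure only while awake
YEARS_TO_ADULTHOOD = 18     # assumption: "adulthood" taken as age 18
TOKENS_PER_WORD = 1.0       # simplification: treat one word as one token


def lifetime_tokens() -> float:
    """Approximate tokens a person encounters from birth to adulthood."""
    hours = WAKING_HOURS_PER_DAY * 365 * YEARS_TO_ADULTHOOD
    return WORDS_PER_HOUR * hours * TOKENS_PER_WORD


def fold_difference(model_tokens: float) -> float:
    """How many times more tokens a model is trained on than a person sees."""
    return model_tokens / lifetime_tokens()


if __name__ == "__main__":
    print(f"human: ~{lifetime_tokens() / 1e6:.0f} million tokens")
    for model_tokens in (10e12, 100e12):  # "tens to hundreds of trillions"
        print(f"{model_tokens / 1e12:.0f}T tokens -> ~{fold_difference(model_tokens):,.0f}x")
```

Under these assumptions a person encounters about 210 million tokens, and the ratio comes out at roughly 48,000x for a 10-trillion-token corpus and 476,000x for a 100-trillion-token one.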
A person can learn to teleoperate any random humanoid or robot arm within hours. The reason robotics isn't already a deca-trillion dollar industry, with an endless army of Unitree G1s doing all kinds of useful work in the world, is that our AIs learn so much less efficiently than humans: even the millions of hours of demonstrations we've collected are not enough to let them perform complex, open-ended tasks.
A teenager can learn to drive a car with about 20 hours of practice. Even if you include their ~16 years of accumulated physical intuition as relevant training data, that is at least 3-4 orders of magnitude less than the amount of data Waymo and Tesla have needed to train their self-driving car models.
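The driving comparison can be made concrete the same way, again as a rough sketch. The source gives no figures for how much data Waymo or Tesla have used, so this only computes what the stated gap implies, assuming the self-driving data can be measured in equivalent hours of experience (itself an assumption). It generously counts the 20 hours of practice plus every waking hour of the first 16 years, assuming 16 waking hours a day, then scales by 10^3 and 10^4.

```python
# What a "3-4 orders of magnitude" gap implies for the driving comparison.
# Inputs are assumptions; no Waymo or Tesla data figures are used.

PRACTICE_HOURS = 20          # from the text: ~20 hours behind the wheel
YEARS_OF_INTUITION = 16      # from the text: ~16 years of physical intuition
WAKING_HOURS_PER_DAY = 16    # assumption: only waking hours count


def human_experience_hours() -> int:
    """Driving practice plus all waking hours up to age 16, generously counted."""
    return PRACTICE_HOURS + YEARS_OF_INTUITION * 365 * WAKING_HOURS_PER_DAY


def implied_machine_hours(orders_of_magnitude: int) -> int:
    """Experience-equivalent hours implied by an N-order-of-magnitude gap."""
    return human_experience_hours() * 10 ** orders_of_magnitude


if __name__ == "__main__":
    print(f"human: ~{human_experience_hours():,} hours")
    for oom in (3, 4):
        print(f"{oom} OOM gap -> at least ~{implied_machine_hours(oom):,} hours")
```

Even this generous human total is only about 93,000 hours, so the claimed gap corresponds to roughly 93 million to 935 million hours of experience-equivalent driving data.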
I want to address some common objections to this kind of comparison:
Many billions of years of evolution is our pre-training, so it's...