@adlrocha - Form Before Data: The Real Bottleneck for Physical AI
@adlrocha Beyond The Code
The Tesla flywheel and why we won’t get the Humanoid butler anytime soon<br>adlrocha<br>Jun 21, 2026
A reader messaged me last week with a question about a topic that has been in my backlog for a few months now: AI and the physical world. The request was the following: “Can you elaborate on the rate of adoption of AI for the physical world? We see [it] operating almost entirely in the digital realm. The Tesla FSD vehicles are examples of AI moving in the physical world. We are also beginning to see other machines such as humanoid robots move through space by interpreting the visual field. But these examples are still very uncommon.”<br>He’s right, and although these examples are “still very uncommon”, the field is making progress fast. We have AI that writes code, drafts contracts, and passes the bar; we have a handful of cars and factory robots; and we have almost nothing in between. But my feeling is that the gap isn’t intelligence. I think the models and the foundational technology are there. What we are missing is the right “body” and “senses” for the model to make sense of the world, and the data needed to teach it how to navigate it.<br>That’s the thesis I want to make the case for in this post. Tesla cracked self-driving first not because its models were the smartest (until quite recently it relied on traditional visual pattern-recognition models rather than end-to-end deep learning), but because the car was already the right shape to act in the world for that specific task. It rolls, it steers, it has somewhere to put cameras. The form of the robot and the actions it had to perform in the physical environment were already well-defined. Everything physical AI does next is a search for that same fit: the right form for each task, and the intelligence to drive it through a messy, imprecise, badly-lit world that no simulation fully captures.
Why the car came first
A car is a strange thing to call an autonomous robot, but that’s essentially what a self-driving car is: a machine that senses its environment and acts in it. And it turns out to be an unusually “easy” (big quotes) robot. It moves in two dimensions. It has four contact points with the world and they never change. It can’t fall over, it can’t drop anything, and the rules of the road are written down. Compare that to a hand picking up an egg, where success depends on grip force you have to feel rather than see, and you start to understand why driving fell first. The car was already the right shape for the job, and the operations it could perform and its core goals were well-defined. Just as LLMs cracked coding first because there was an objective feedback loop to optimise, the car was (in retrospect) the obvious first target for AI in the physical world.<br>Everyone (I hope) who owns a car knows how to drive it. Tesla managed to ship an attractive EV that people would buy and drive, collectively generating the real-world data required to eventually teach an artificial brain how to drive one of these robots with wheels autonomously. Access to all this raw data from the physical world, across virtually every kind of scenario, environment and location, is what has enabled Tesla to finally crack FSD.<br>Tesla’s fleet has now passed 10 billion miles driven on FSD, adding roughly a million miles a day. Since FSD v12 the system has been a single end-to-end neural network, vision only, with the old hand-written rules torn out. It learned to drive by watching the fleet drive.<br>Notice the order of operations. Tesla didn’t sell cars and then bolt on autonomy as a side project. The car was the data-collection programme. Every vehicle on the road has eight cameras recording how real people handle real roads in bad weather, and that stream is what trains the next model. They built the perfect sandbox environment and data flywheel to train their self-driving models.
Waymo, with better sensors (Tesla only uses visual sensors) and a smaller fleet, has so far spent years unable to out-engineer Tesla’s simple advantage.<br>One reason Tesla could build this flywheel is that the “body”, “environment”, and “rules” for these robots were pretty well-defined. You cannot collect ten billion miles of driving data without ten million things shaped like cars already driving around. Form is the precondition for data, not the other way round. That is the move every physical-AI company is now trying to repeat, and it is much harder when the task is folding laundry instead of staying in a lane.<br>If we treat Tesla cars as the first instance of autonomous physical robots, I think there are a lot of lessons that we can extract and immediately apply to the field of robotics.
The two things a body still can’t do
If form is the precondition, the obvious question is why we don’t already have the right...