Randomize, Identify, or Dream – Perception in Robotics

Atoms to Algorithms

Jaimin · Wednesday, June 3, 2026 · Learning

A robot policy trained in simulation for a Unitree G1 humanoid does not know that the bushing on the real robot’s left ankle has lost two-tenths of a millimeter to wear after twelve hundred hours of operation. The classical answer is to randomize: train under a simulator that wobbles its own physics enough that the worn-bushing condition is one sample inside the training distribution. The policy never learns to depend on any particular value of friction or mass, so when reality hands it a slightly different value, it treats the difference as one more draw from what it has already seen. This is the trick that has carried robot learning since 2017, and in 2026 it is in the middle of a three-layer upgrade. Last issue showed that policies have a data-scaling law to lean on. The question today is where the rest of the sim-to-real gap gets closed.

Three mechanisms are converging in the production stack. Randomize the broad envelope. Identify the actual parameters from a short window of real-robot data. And, for the part of the gap physics engines get visibly wrong, drop the physics engine and let a neural simulator hallucinate the next frame from 44,000 hours of human video.

How it actually works

Domain randomization is the lazy theorem. You list the parts of your simulator you do not trust (friction at every joint, the exact mass of every link, motor torque constants, sensor noise floors, lighting, camera angles) and sample them from a distribution wide enough that reality lands inside the spread. The policy learns the behavior that survives this variation. On a real robot, the real parameter vector is just one more draw from the training distribution. The argument is not that the policy figures out the parameters, only that it never relied on them. NVIDIA’s Isaac Lab 2.3, released in November 2025, packages this as a configuration object you attach to any training environment. You can spin up thousands of parallel simulated robots on a single GPU, each with a slightly different friction profile, payload mass, motor gain, and camera setup.
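A minimal sketch of the randomize step in plain Python. This is not Isaac Lab’s configuration API; it only shows the idea: every parallel environment receives its own parameter vector, drawn uniformly from ranges meant to bracket the real robot. The parameter names, ranges, and units below are hypothetical placeholders.

```python
"""Minimal domain-randomization sketch (illustrative, not Isaac Lab's API).

Each parallel environment gets its own physics parameters, drawn from ranges
wide enough that the real robot's (unknown) values should fall inside them.
All parameter names and ranges are hypothetical.
"""
import random
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical ranges for the parameters we do not trust, as (low, high).
RANDOMIZATION_RANGES: Dict[str, Tuple[float, float]] = {
    "joint_friction": (0.5, 1.5),     # multiplier on nominal joint friction
    "link_mass_scale": (0.85, 1.15),  # multiplier on nominal link masses
    "motor_gain_scale": (0.9, 1.1),   # multiplier on motor torque constants
    "imu_noise_std": (0.0, 0.05),     # additive gyro noise std, rad/s
    "payload_kg": (0.0, 3.0),         # extra mass attached to the torso
}


@dataclass
class PhysicsParams:
    joint_friction: float
    link_mass_scale: float
    motor_gain_scale: float
    imu_noise_std: float
    payload_kg: float


def sample_params(rng: random.Random) -> PhysicsParams:
    """Draw one parameter vector uniformly from the randomization ranges."""
    values = {name: rng.uniform(lo, hi) for name, (lo, hi) in RANDOMIZATION_RANGES.items()}
    return PhysicsParams(**values)


def randomize_envs(num_envs: int, seed: int = 0) -> List[PhysicsParams]:
    """Give every parallel environment its own independent draw."""
    rng = random.Random(seed)
    return [sample_params(rng) for _ in range(num_envs)]


def contains(params: PhysicsParams) -> bool:
    """True if a parameter vector lies inside the training envelope."""
    return all(
        lo <= getattr(params, name) <= hi
        for name, (lo, hi) in RANDOMIZATION_RANGES.items()
    )
```

In a training loop the draw would typically be refreshed at each episode reset. The `contains` check makes the paragraph’s premise explicit: the method only works if the real robot’s parameter vector, for example `PhysicsParams(1.2, 1.03, 0.97, 0.02, 1.0)`, lands inside the envelope the policy was trained on.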

Randomization is a blunt instrument. Every dimension of variation the policy has to absorb is paid for in peak performance, the way an athlete who trains in mountains, oceans, and deserts is worse on any one terrain than a local specialist. So in 2026 the second layer is system identification, the targeted fix. Run the candidate policy on the real robot for a few minutes, collect (state, action, next-state) samples, and tune the simulator’s parameters so its predictions match reality. SPI-Active, a 2025 paper from Carnegie Mellon and NVIDIA, treats this as an active exploration problem. The robot deliberately probes the corners of state space where simulator predictions diverge most from reality. The reported gain is 42 to 63 percent lower locomotion error than passive identification, validated on the Unitree Go2 and G1.
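The identification step, reduced to its simplest case, can be sketched as follows. The assumptions are strong and hypothetical: a single joint whose velocity follows an Euler-integrated model `v_next = v + dt * (k * tau - b * v)`, where `k` (torque gain over inertia) and `b` (viscous damping) are the parameters we do not trust. Because this toy model is linear in `(k, b)`, ordinary least squares over logged (state, action, next-state) triples recovers them. This is plain passive least squares, not SPI-Active’s sampling-based method or its active exploration.

```python
"""Minimal system-identification sketch for one joint (hypothetical model).

Assumed dynamics, Euler-integrated at period DT:
    v_next = v + DT * (k * tau - b * v)
k (torque gain over inertia) and b (viscous damping) are the parameters we
do not trust. The model is linear in (k, b), so ordinary least squares on
logged (state, action, next-state) transitions recovers them.
"""
import math
from typing import List, Tuple

Transition = Tuple[float, float, float]  # (velocity, torque command, next velocity)
DT = 0.01  # control period in seconds (assumed)


def simulate_step(v: float, tau: float, k: float, b: float, dt: float = DT) -> float:
    """One Euler step of the assumed joint model."""
    return v + dt * (k * tau - b * v)


def identify(transitions: List[Transition], dt: float = DT) -> Tuple[float, float]:
    """Fit (k, b) so the model's one-step predictions match the logged next states.

    Each transition gives one equation (v_next - v) / dt = k * tau + b * (-v).
    Stacking them as y = X @ [k, b], solve the 2x2 normal equations
    (X^T X) p = X^T y with Cramer's rule.
    """
    s11 = s12 = s22 = r1 = r2 = 0.0
    for v, tau, v_next in transitions:
        x1, x2, y = tau, -v, (v_next - v) / dt
        s11 += x1 * x1
        s12 += x1 * x2
        s22 += x2 * x2
        r1 += x1 * y
        r2 += x2 * y
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-9:
        # Poorly excited data: the log cannot separate k from b.
        raise ValueError("transitions do not excite both parameters")
    k = (r1 * s22 - s12 * r2) / det
    b = (s11 * r2 - s12 * r1) / det
    return k, b


if __name__ == "__main__":
    # Stand-in for a short real-robot log: a "true" joint we pretend not to know.
    true_k, true_b = 2.0, 0.5
    v, log = 0.0, []
    for i in range(200):
        tau = 5.0 * math.sin(0.3 * i)  # hypothetical excitation signal
        v_next = simulate_step(v, tau, true_k, true_b)
        log.append((v, tau, v_next))
        v = v_next
    k_hat, b_hat = identify(log)
    print(f"k = {k_hat:.3f}, b = {b_hat:.3f}")
```

The `det` guard is where the choice of probing motions matters. If the logged torques barely excite the joint, the normal equations become ill-conditioned and the fit is unreliable; for a linear model with Gaussian noise, `X^T X` is proportional to the Fisher information, which is the quantity an active-exploration scheme tries to make large.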

The newer cross-cutting trick is to treat the simulator itself as a randomization knob. PolySim, an October 2025 paper, trains a humanoid policy across multiple physics engines in parallel: MuJoCo, Isaac Gym, Genesis, and Brax run alongside each other and disagree about contact, friction, and articulation in different ways. The policy that survives the disagreement gap deploys zero-shot on the real Unitree G1, with no real-robot fine-tuning required. Any one simulator is wrong in known ways, but the agreement set across four of them is closer to physics than any one on its own.

The visual half of the gap closes through a different route. The 2024 answer was photorealistic rendering plus random texture, lighting, and pose. The 2026 answer is Gaussian Splatting, a way of reconstructing a real scene out of millions of fuzzy 3D blobs that render in real time. SplatSim swaps the meshes inside a physics simulator for splat reconstructions of real scenes and reports an 86 percent success rate on zero-shot transfer. RoboSplat reports 88 percent one-shot transfer from a single real demonstration, beating prior baselines that required hundreds of demonstrations.

The largest question is whether you need a physics engine at all. NVIDIA’s DreamDojo, released in February 2026, is a world model trained on roughly 45,000 hours of egocentric human video. Give it an action signal and it decodes the next RGB frame plus sensor state directly in pixel space. There is no MuJoCo running underneath it, no rigid-body solver, no graphics engine. The reported correlation between DreamDojo’s simulated success rate and the real-world success rate is 0.995 on a fruit-packing benchmark. The distilled real-time version runs at almost eleven frames per second. NVIDIA’s framing is that the parameterized domain-randomization stack is “Simulation 1.0” and DreamDojo is “Simulation 2.0,” where the simulator’s job is not to enforce Newton’s laws but to imagine the next...
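The “Simulation 2.0” idea is easiest to see as an interface change. A minimal sketch, assuming a hypothetical `WorldModel` with a single `predict_next(observation, action)` method (not DreamDojo’s actual API): the evaluation loop never calls a physics solver, only the model. The toy model below is a hand-written update rule standing in for a trained network, and the observation is a two-number state rather than an RGB frame.

```python
"""Sketch: evaluating a policy inside an action-conditioned world model.

The "simulator" is any object that maps (observation, action) to the next
observation. `WorldModel` and `ToyLearnedModel` are hypothetical stand-ins,
not DreamDojo's interface; a real world model would decode image frames
with a neural network instead of applying a hand-written update.
"""
from typing import Callable, List, Protocol, Sequence

Observation = List[float]  # here [position, velocity]; an RGB frame in practice
Policy = Callable[[Observation], float]


class WorldModel(Protocol):
    def predict_next(self, obs: Observation, action: float) -> Observation: ...


class ToyLearnedModel:
    """Placeholder for a trained network: a fixed update rule with step dt."""

    def __init__(self, dt: float = 0.1) -> None:
        self.dt = dt

    def predict_next(self, obs: Observation, action: float) -> Observation:
        pos, vel = obs
        vel = vel + self.dt * action
        return [pos + self.dt * vel, vel]


def dream_rollout(model: WorldModel, policy: Policy,
                  obs: Observation, horizon: int) -> List[Observation]:
    """Roll the policy forward entirely inside the model; no physics solver is called."""
    trajectory = [obs]
    for _ in range(horizon):
        obs = model.predict_next(obs, policy(obs))
        trajectory.append(obs)
    return trajectory


def dream_success_rate(model: WorldModel, policy: Policy,
                       starts: Sequence[Observation], horizon: int,
                       target: float, tol: float) -> float:
    """Fraction of imagined episodes whose final position is within tol of target."""
    successes = 0
    for start in starts:
        final = dream_rollout(model, policy, list(start), horizon)[-1]
        if abs(final[0] - target) < tol:
            successes += 1
    return successes / len(starts)
```

For example, a hypothetical PD policy `lambda obs: 4.0 * (1.0 - obs[0]) - 4.0 * obs[1]` reaches the target position 1.0 from every start in the toy model over 200 steps, while a do-nothing policy never does. Because the evaluator only calls `predict_next`, swapping in a learned video model leaves the loop unchanged; whether the imagined success rate tracks the real one still has to be checked against real rollouts, which is what the reported 0.995 correlation speaks to.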
