When the Error Lives in the Body Frame – Perception in Robotics


Atoms to Algorithms


Friday, May 29, 2026 · Perception, the closing argument

Jaimin May 29, 2026

A Cassie biped lifts its right foot and plants it 40 cm forward; once planted, the contact point at the heel does not move. Inside the estimator, a Right-Invariant Extended Kalman Filter treats that planted foot as a measurement. The body’s joint encoders report “the heel sits at this position relative to the pelvis,” and the world says “the heel has not moved.” Together, those two statements pin the pelvis pose in space. The underlying trick is that the filter’s state (orientation, velocity, position, plus a stack of contact-foot positions) lives on a structured object called a Lie group, and the error between the true state and the estimate lives on the same object. Linearizing along that structure, rather than along an arbitrary flat tangent space, is the difference between a filter that stays consistent across a one-legged hop and a filter that does not.
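The geometry of that contact constraint can be sketched in a few lines. This is a minimal illustration, not the filter's actual update step: it assumes the pelvis orientation R is already known, the function and variable names are hypothetical, and the numbers (other than the 40 cm step) are made up. If the encoders place the heel at h relative to the pelvis and the world places it at d, the pelvis must sit at p = d - R h.

```python
import math

def rot_z(yaw):
    """Rotation about the vertical axis by `yaw` radians."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def mat_vec(R, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(R[i][j] * v[j] for j in range(3)) for i in range(3)]

def pelvis_position_from_contact(R, heel_in_body, heel_in_world):
    """Solve d = p + R h for the pelvis position p (hypothetical helper)."""
    Rh = mat_vec(R, heel_in_body)
    return [d - r for d, r in zip(heel_in_world, Rh)]

# Illustrative values only: pelvis yawed 90 degrees, heel 40 cm ahead of
# and 90 cm below the pelvis, heel planted at (1, 2, 0) in the world.
R = rot_z(math.pi / 2)
heel_in_body = [0.4, 0.0, -0.9]
heel_in_world = [1.0, 2.0, 0.0]
pelvis = pelvis_position_from_contact(R, heel_in_body, heel_in_world)
print(pelvis)
```

In the real filter, orientation is uncertain too, so the same relation is used as a measurement model that corrects R, v, and p jointly rather than being solved in closed form like this.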

This week we walked through the perception layer in order. Monday: 4D imaging radar and the Doppler voxel. Tuesday: a camera inside a fingertip turning touch into a sub-millimeter image. Wednesday: factor-graph SLAM and the loop closure that snaps drift back to centimeters. Thursday: the IMU preintegration trick that lets a 1000 Hz inertial stream live inside a 10 Hz pose graph. Today is the closing argument. The sensors are good. The graph is fast. What ties them together when the platform is a robot with feet?

How it actually works

A walking robot’s state is three things at once: which way it is facing, how fast it is moving, and where it is in the world. Stack those into a single 5x5 matrix and the matrix is an element of a mathematical object called SE_2(3), the group of “double direct isometries,” which extends the rotation-plus-translation group SE(3) with a velocity column. When a foot is in contact with the ground, the contact-point position joins the same matrix as an extra column, and the object becomes SE_{K+2}(3), where K is the number of feet currently planted.
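To make the stacking concrete, here is a small sketch of that matrix embedding in plain Python. The helper names (`make_state`, `inverse`, `matmul`) and all numeric values are assumptions for illustration, not taken from any particular library. Rotation fills the top-left 3x3 block, velocity and position are the next two columns, and each planted foot appends one more column, so K contacts give a (5+K)x(5+K) matrix.

```python
import math

def rot_z(yaw):
    """Rotation about the vertical axis by `yaw` radians."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def make_state(R, v, p, contacts=()):
    """Embed (R, v, p, d_1..d_K) in a (5+K)x(5+K) matrix."""
    cols = [v, p, *contacts]
    n = 3 + len(cols)
    # Start from the identity, then fill the top three rows.
    X = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(3):
        X[i][:3] = R[i]
        for k, c in enumerate(cols):
            X[i][3 + k] = c[i]
    return X

def matmul(A, B):
    """Product of two square matrices of equal size."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def inverse(X):
    """Closed-form group inverse: R -> R^T, each column c -> -R^T c."""
    n = len(X)
    Rt = [[X[j][i] for j in range(3)] for i in range(3)]
    cols = [[-sum(Rt[i][j] * X[j][3 + k] for j in range(3)) for i in range(3)]
            for k in range(n - 3)]
    return make_state(Rt, cols[0], cols[1], cols[2:])

# Illustrative values only.
R, v, p = rot_z(0.3), [0.5, 0.0, 0.0], [1.0, 2.0, 0.9]
heel = [1.4, 2.0, 0.0]

X5 = make_state(R, v, p)           # no foot planted: 5x5
X6 = make_state(R, v, p, [heel])   # one foot planted: 6x6
print(len(X5), len(X6))
```

Because the product of two such matrices, and the inverse of one, keep the same block shape, the set is closed under the group operations. That closure is what lets an estimation error be represented as another matrix of the same kind.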

Why bother with the bigger structure? Because of a property the original invariant-filter theorists, Axel Barrau and Silvère Bonnabel, proved in 2017. When you write the robot’s inertial dynamics in body coordinates (in the frame that moves with the robot, rather than in the world), the dynamics have the same shape no matter where on the group the robot is. That sameness is what “invariant” means. It is also rare. Most nonlinear systems do not have it.

The payoff is that the error between the estimated pose and the true pose itself becomes a group element. When you write the equation for how the error evolves over time, the Jacobian (the matrix the filter uses to do its linearization) does not depend on the current estimate. The filter is linearizing around the geometry of the group instead of around a guess. Barrau and Bonnabel proved that, for such systems, the error expressed in the Lie algebra obeys an exactly linear differential equation (the log-linear property), and that under standard observability conditions the filter is a locally asymptotically stable observer around any trajectory. A vanilla quaternion-based EKF cannot make that guarantee, and the failure modes show up exactly where you would expect: high-acceleration maneuvers, large initialization errors, and the moment a foot lifts and another lands.

The contact-aided extension came from Ross Hartley, Maani Ghaffari, Ryan Eustice, and Jessy Grizzle at the University of Michigan in 2018 and 2020. They added the contact-foot positions to the state matrix and showed that the joint encoders give a clean measurement update inside the same invariant framework. When the foot lifts off the ground, its column gets marginalized out. When the other foot lands, a new column is initialized. The Cassie biped was the test platform; the math has since been adopted across quadrupeds and humanoids, and the paper has become the standard reference for contemporary legged-robot estimators.

New this week

The 2026 publications layer learning on top of the invariant kernel without breaking the symmetry argument underneath. An April arXiv release from the Liège and Mines Paris groups introduces an iterated form of the invariant filter (arXiv 2604.15449) and shows it outperforms the original on quadruped odometry in both accuracy and consistency. A March paper from the University of Delaware (arXiv 2603.18308) replaces the Gaussian noise assumption with a set-coverage statement, which fixes a common failure mode when learned components are trained on thin data. A January CES 2026 announcement from NVIDIA (Isaac GR00T N1.6 sim-to-real workflow) pairs whole-body reinforcement learning trained in Isaac Lab with a CUDA-accelerated visual mapping stack on Jetson Thor; the state estimator underneath is implicit, but the integration target is the production reference every legged estimator now writes against. (As I learn more here every day, I am becoming a fan of Jetson Thor; at some point I will have to do a full deep-dive post on it.)

What to notice

The visualization compares two...
