Velocity in Every Voxel – Perception in Robotics

Velocity in Every Voxel - by Jaimin - Atoms to Algorithms

Atoms to Algorithms


Velocity in Every Voxel
Monday, May 25, 2026 · Perception

Jaimin
May 25, 2026

A stereo camera pair can tell you a mug is sitting on the edge of a table. It cannot tell you the mug is sliding off, or how fast, until it has compared two frames and worked out the difference. A 4D imaging radar reports the velocity in the same shot that reports the position. Every point it returns carries range, azimuth, elevation, and a Doppler-derived radial velocity from a single burst of radio waves, with no frame-to-frame stitching required. Friday closed with a promise about the radar lane. Today's post walks through the trick that lets the radar see in four dimensions instead of three, and the company that has pushed it past the rest of the industry.

The three depth modalities covered in the last post (stereo, structured light, time of flight) all measure position. None of them measures velocity directly. To know whether the world is moving, an optical system has to detect, recognize, and track objects across frames, then subtract the old positions from the new. That works, mostly, until the lighting fails or the object is partially occluded for a frame or two, at which point the tracker forgets and starts over. Radar lives at a different wavelength, with different physics, and skips the entire tracking problem when the question is “how fast.” This is why the modality matters for any robot that has to share space with moving people, vehicles, or other robots.

How it actually works

Imaging radar works in two steps. The first step is a hardware trick called MIMO, short for multiple-input, multiple-output. A radar with N transmit antennas and M receive antennas, configured carefully so the transmitted signals do not interfere with each other, behaves like a radar with N times M virtual receive antennas. The math is the same triangulation logic that lets two ears localize a sound: the more spatially separated receivers you have, the sharper the angular resolution. Texas Instruments and NXP sell automotive-grade radar chips that arrange 4 transmitters and 4 receivers, for 16 virtual channels per chip, cascadable up to a few hundred. Arbe Robotics, an Israeli company listed on Nasdaq, builds a radar with 48 transmitters and 48 receivers, for 2,304 virtual channels in a single car-grade unit. The angular resolution this buys is about one degree in azimuth and a degree and a half in elevation, roughly a hundred times finer than the radar in a 2018 driver-assistance system.
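The N times M claim is easy to check by hand. What follows is a minimal sketch, assuming a one-dimensional antenna layout that is purely illustrative and not any vendor's actual geometry. The function name `virtual_array` and the positions are made up for the example. It relies on the standard far-field result that an echo sent from one antenna and received at another looks like it arrived at a single element located at the sum of the two positions.

```python
from itertools import product


def virtual_array(tx_positions, rx_positions):
    """Virtual element positions: one per (transmitter, receiver) pair.

    Positions are in units of half a wavelength along one axis. In the far
    field, an echo sent from tx and received at rx carries the same phase as
    one received by a single element at tx + rx, which is where the virtual
    element sits.
    """
    return sorted(tx + rx for tx, rx in product(tx_positions, rx_positions))


# Hypothetical layout, not any vendor's: 4 receivers half a wavelength apart,
# 4 transmitters spaced by the full width of the receive group.
rx = [0, 1, 2, 3]
tx = [0, 4, 8, 12]

virt = virtual_array(tx, rx)
print(len(virt))        # 16 virtual channels from 4 + 4 physical antennas
print(len(set(virt)))   # 16, so no two pairs land on the same position
print(48 * 48)          # 2304, the channel count quoted for the 48 x 48 unit
```

The spacing is what does the work here. If the transmitters in this toy layout sat at `[0, 1, 2, 3]` instead, there would still be 16 pairs but only 7 distinct positions, and the array would be no wider than before.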

The second step is a stack of three Fast Fourier Transforms run on the raw samples coming off the antennas. The first FFT pulls range out of the beat frequency of a single frequency-modulated chirp. The second FFT compares many chirps in a burst and pulls Doppler velocity out of the chirp-to-chirp phase progression, because anything moving toward or away from the radar shifts the phase of the returning echo by an amount proportional to its radial speed. The third FFT runs across the virtual antenna array and pulls angle (both azimuth and elevation) out of the small phase differences between antennas. The output is a four-dimensional cube. Every voxel in the cube has range, azimuth, elevation, and a sign and magnitude of velocity. The forklift reversing toward the robot is one voxel, with a negative range-rate, in the cube the radar produces every thirtieth of a second.

Beyond those two steps, a third change is underway in this lane. Traditional automotive radars run the FFT stack on a chip inside the radar unit, threshold the result, and send a sparse list of detection points to the central computer. That list throws away about ninety-nine percent of the underlying signal. NVIDIA’s centralized-radar architecture, demonstrated with the Chinese radar maker ChengTech at GTC 2026, instead streams the raw samples (about 540 megabytes per second across a five-radar car belt, compared to about five megabytes per second of point cloud) into the main computer’s memory, and runs the FFTs on a dedicated accelerator inside the NVIDIA chip. The dense range-Doppler-angle cube becomes available to the perception model in the same way a raw camera image is available, instead of being pre-filtered into a Canny-edge-style sparse output. The same pattern that Friday’s foundation stereo network applied to optical depth is now being applied to radar.
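The first two transforms of the FFT stack described above (range, then Doppler) can be sketched in a few lines. This is a toy under stated assumptions, not a production pipeline: the chirp parameters are invented for the example, the target is a single noise-free point, and a plain DFT written with `cmath` stands in for a real FFT so the code has no dependencies. The sign convention is also an assumption: an approaching target (negative range-rate) is taken to produce a positive Doppler shift. The third transform, across the virtual antennas, follows the same pattern and is left out.

```python
import cmath
import math

C = 3.0e8             # speed of light, m/s
FC = 77.0e9           # carrier frequency, Hz (automotive radar band)
BANDWIDTH = 1.0e9     # chirp sweep bandwidth, Hz (assumed)
N_SAMPLES = 32        # samples per chirp (assumed toy value)
N_CHIRPS = 64         # chirps per burst (assumed toy value)
T_CHIRP = 100e-6      # chirp repetition interval, s (assumed)

WAVELENGTH = C / FC
RANGE_RES = C / (2 * BANDWIDTH)                   # metres per range bin
VEL_RES = WAVELENGTH / (2 * N_CHIRPS * T_CHIRP)   # m/s per Doppler bin


def dft(x):
    """Plain discrete Fourier transform; a real system would use an FFT."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def simulate_burst(target_range, range_rate):
    """Ideal beat signal for one point target, indexed [chirp][sample]."""
    # Beat frequency in cycles per sample: grows with range.
    range_cycles = target_range / RANGE_RES / N_SAMPLES
    # Phase advance in cycles per chirp: grows with radial speed.
    doppler_cycles = -2 * range_rate * T_CHIRP / WAVELENGTH
    return [[cmath.exp(2j * math.pi * (range_cycles * n + doppler_cycles * m))
             for n in range(N_SAMPLES)]
            for m in range(N_CHIRPS)]


def range_doppler_map(burst):
    """Return magnitudes indexed [range_bin][doppler_bin]."""
    # First transform: along each chirp, beat frequency becomes range.
    range_profiles = [dft(chirp) for chirp in burst]
    # Second transform: along the chirps, for each range bin separately.
    per_range_bin = zip(*range_profiles)
    return [[abs(v) for v in dft(list(column))] for column in per_range_bin]


def strongest_detection(rd_map):
    """Return (range in m, range-rate in m/s) of the peak cell."""
    cells = ((r, d) for r in range(N_SAMPLES) for d in range(N_CHIRPS))
    r_bin, d_bin = max(cells, key=lambda rd: rd_map[rd[0]][rd[1]])
    if d_bin >= N_CHIRPS // 2:      # upper half of the bins wraps to negative
        d_bin -= N_CHIRPS
    return r_bin * RANGE_RES, -d_bin * VEL_RES


# A target 3 m away, closing at three Doppler bins' worth of speed.
burst = simulate_burst(3.0, -3 * VEL_RES)
rng, vel = strongest_detection(range_doppler_map(burst))
print(rng, vel)   # range in metres, then a negative range-rate (approaching)
```

The target is placed exactly on a bin centre so the peak is unambiguous. Real returns fall between bins and sit in noise, which is why production radars add windowing and a detection threshold on top of this.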

New this week

Arbe announced at CES 2026 that its 2,304-channel Phoenix radar plugs into the NVIDIA accelerated-compute pipeline as a production reference for AI-based perception. NVIDIA’s own developer blog walked through the new centralized-radar approach in detail, with a working demonstration on a DRIVE AGX Thor board at the company’s March 2026 conference. A pair of recent academic papers extend the same idea into mobile robotics specifically: one paper called DRO uses the Doppler shift in a single commodity radar chip to do robot self-motion...
