The Camera Inside the Fingertip - by Jaimin
Atoms to Algorithms
The Camera Inside the Fingertip<br>Tuesday, May 26, 2026 · Perception
Jaimin<br>May 26, 2026
A radar voxel can tell a robot a forklift is reversing at three meters per second. It cannot tell the robot that the carton it just picked up is starting to slip between two compliant pads, because the carton is inside the gripper, not out in front of the sensors. Yesterday’s issue closed with a promise about an elastomer skin, a tiny camera, and three colored LEDs. Today walks through the trick that turns a deforming piece of rubber into a sub-millimeter map of whatever the robot is touching, and through the curious fact that the highest-resolution touch sensor in robotics today is not really a touch sensor at all. It is a camera looking sideways at a piece of rubber.<br>(A reminder: a voxel, short for "volumetric pixel", is the 3D equivalent of a 2D pixel. While a pixel is a single point of color in a flat image, a voxel is a tiny 3D cube or block that holds information for a specific point in three-dimensional space.)<br>Friday gave the robot three optical depth modalities, and Monday added radar’s fourth dimension. All four measure the world from a distance. Today closes the distance. Vision-based tactile sensing replaces an array of pressure transducers with a high-resolution camera and a transparent skin, then uses the camera image to work out what the skin is touching. The technique sits inside more than a hundred research papers and a half-dozen commercial products as of mid-2026, and it is the modality every serious humanoid hand program now budgets for.<br>How it actually works
The bottom of a GelSight fingertip is a thin slab of clear silicone, a few millimeters thick, with a layer of reflective paint on the outer surface. The paint hides whatever is behind the gel: looking out from inside the finger, the camera sees only the back of the painted membrane, even though the gel itself is transparent. When the finger presses into an object, the painted outer surface deforms to match the shape pressed against it. Inside the rigid housing of the finger, a small color camera looks up at the inside of the gel, and a ring of LEDs in at least three different colors lights the gel from different positions around the camera.
This is where a forty-five-year-old trick from computer vision called photometric stereo enters. The idea is that if you photograph a surface lit by three known lights from three different angles, the shading at each pixel tells you the orientation of the surface at that pixel. The classical version of the trick took three separate photos. The GelSight version takes one photo with three colored lights, because the camera’s red, green, and blue channels do the separation for you. Press a coin into the gel and a calibrated lookup from pixel color to surface slope tells the software that the rim is steeper than the face, the date numerals dent the gel by tens of microns, and even the rotation of Lincoln’s head shows up as a tiny deflection. The earliest GelSight, out of Ted Adelson’s lab at MIT in 2009, resolved features about two microns wide. The commercial GelSight Mini, in production today, resolves features in the twenty-five to seventy-five micron range, which is more than an order of magnitude better than what the human fingertip can discriminate.
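For readers who want to see the arithmetic, here is a minimal single-pixel sketch of that inversion in plain Python. It is an idealized illustration, not GelSight's actual pipeline: it assumes a perfectly matte (Lambertian) coating, assumes each color channel sees only its own LED, and uses made-up LED directions spaced 120 degrees apart at 45 degrees of elevation. The names `led_direction`, `LIGHTS`, `shade`, and `normal_from_rgb` are hypothetical and invented for this example. Under those assumptions the three channel intensities are three linear equations in the three components of the surface normal, so one pixel of one photo is enough.

```python
import math


def led_direction(azimuth_deg, elevation_deg=45.0):
    """Unit vector pointing from the gel surface toward an LED.

    Assumed geometry for illustration only; a real sensor would get
    these directions (or a lookup table) from calibration.
    """
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    return (math.cos(el) * math.cos(az),
            math.cos(el) * math.sin(az),
            math.sin(el))


# Hypothetical red, green, and blue LED directions, one per color channel.
LIGHTS = [led_direction(0.0), led_direction(120.0), led_direction(240.0)]


def det3(m):
    """Determinant of a 3x3 matrix given as three rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m, b):
    """Solve the 3x3 linear system m x = b with Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("light directions are coplanar; system is singular")
    x = []
    for k in range(3):
        # Replace column k of m with b, then take the determinant ratio.
        mk = [[b[i] if j == k else m[i][j] for j in range(3)] for i in range(3)]
        x.append(det3(mk) / d)
    return x


def shade(normal, lights=LIGHTS, albedo=1.0):
    """Forward Lambertian model: one intensity per LED / color channel."""
    return [albedo * max(0.0, sum(l * n for l, n in zip(light, normal)))
            for light in lights]


def normal_from_rgb(rgb, lights=LIGHTS):
    """Invert rgb = albedo * (L n) to get the unit normal and the albedo."""
    g = solve3(lights, rgb)                  # g equals albedo * normal
    albedo = math.sqrt(sum(c * c for c in g))
    if albedo == 0.0:
        raise ValueError("pixel is completely dark; normal is undefined")
    return [c / albedo for c in g], albedo


if __name__ == "__main__":
    # A patch of gel tilted slightly away from flat (flat would be [0, 0, 1]).
    raw = [0.2, -0.1, 1.0]
    length = math.sqrt(sum(c * c for c in raw))
    true_normal = [c / length for c in raw]

    rgb = shade(true_normal, albedo=0.8)     # what the camera pixel records
    est_normal, est_albedo = normal_from_rgb(rgb)
    print("pixel rgb:", rgb)
    print("recovered normal:", est_normal, "albedo:", est_albedo)
```

Running the same solve at every pixel gives a field of surface normals, which can then be integrated into a height map. Real sensors typically swap the ideal linear model for a lookup calibrated against a known shape, because actual coatings and LEDs do not behave this cleanly.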
Three things follow from this design. First, the sensor is camera-priced, not transducer-priced. A 1080p camera and a 3D-printed plastic housing cost roughly fifteen dollars in parts. This is why Meta (then Facebook AI Research) could open-source DIGIT in 2020 and later partner with GelSight to manufacture it, and why the academic community now has more than thirty hardware variants in circulation. Second, the spatial resolution dominates everything else. A traditional capacitive or piezoresistive tactile array good enough for industrial gripper feedback runs at one taxel (a single tactile sensing element) per several millimeters and tops out around a few hundred taxels per finger. A GelSight Mini produces roughly thirty thousand effective tactile pixels per fingertip at a similar cost. Third, the output is an image, which means everything the field has learned about training neural networks on images applies directly. The same foundation-model leverage that arrived in optical depth (Friday) and in 4D imaging radar (Monday) is now arriving in tactile.<br>What this buys for manipulation is the part that matters. Press a GelSight finger against a piece of cardboard and you can read the texture and the slight ridge where the box flap meets the body. Press it against an egg and you can watch the contact patch grow as the grip closes, then see the shear pattern shift the moment the egg starts to slip. The slip shows up in the gel image tens of milliseconds before the egg actually moves, which is enough time for a closed-loop controller to tighten the grip before the egg falls.<br>New this week
A team out of Beijing’s GeWu-Lab released AnyTouch 2 in February, a tactile representation-learning framework that works across multiple sensor types and explicitly models physical force dynamics...