A Push on Tactile Data, and a Warning That the Benchmark Was Leaking

Weekly Physical AI Roundup. Jay Chia, Jul 02, 2026

Touch is the sense a robot needs most for contact-rich work and the one it has the least data for. RCT just showed that the tactile benchmark most groups report on has been leaking near-duplicate contacts between train and test, so a good chunk of the field’s reported generalization was memorization. It arrived in a week when several teams were separately working on where tactile data comes from at all.

Tactile’s data problem, all at once

RCT is a robot-collected touch-vision-language dataset, 29,279 contact frames pressed onto 122 industrial materials with DIGIT sensors, and its contribution is as much a warning as a dataset. On the widely used TVL/HCT split, a raw-pixel nearest-neighbor search recovers the correct test contact 98.3% of the time, which means the test sequences already sit in training. Once RCT enforces held-out-material splits, tactile-to-text recall on unseen materials drops to around 25%. Novel-material generalization is mostly unsolved, and the numbers that suggested otherwise were partly measuring leakage.
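
One way to read the leakage check: flatten each tactile frame to raw pixels, find its nearest training frame, and count how often that neighbor comes from the same contact. The sketch below is a minimal version under that reading, not RCT's actual evaluation code; `nn_leak_rate`, the contact-id labels, and the synthetic 8x8 frames are all illustrative assumptions.

```python
import numpy as np

def nn_leak_rate(train_x, train_ids, test_x, test_ids):
    """Share of test frames whose nearest training frame (raw-pixel L2)
    has the same contact id. A high value suggests train/test leakage."""
    tr = np.asarray(train_x, dtype=np.float64).reshape(len(train_x), -1)
    te = np.asarray(test_x, dtype=np.float64).reshape(len(test_x), -1)
    # Pairwise squared distances: ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2
    d2 = (te ** 2).sum(1)[:, None] - 2.0 * (te @ tr.T) + (tr ** 2).sum(1)[None, :]
    nearest = d2.argmin(axis=1)
    return float(np.mean(np.asarray(train_ids)[nearest] == np.asarray(test_ids)))

# Synthetic stand-in data (hypothetical): 20 contacts, 8x8 "frames".
rng = np.random.default_rng(0)
contacts = rng.random((20, 8, 8))
ids = np.arange(20)

# Leaky split: train and test are near-duplicates of the same contacts.
train = contacts + 0.01 * rng.standard_normal(contacts.shape)
test = contacts + 0.01 * rng.standard_normal(contacts.shape)
leaky_rate = nn_leak_rate(train, ids, test, ids)

# Held-out split: test contacts (ids 20..39) never appear in training.
unseen = rng.random((20, 8, 8))
clean_rate = nn_leak_rate(train, ids, unseen, ids + 20)

print(leaky_rate, clean_rate)
```

A probe like this needs no model at all, which is the point: if raw pixels alone recover the test contact, a high benchmark score cannot be told apart from memorization.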

The rest of the cluster is the field filling that data gap from different directions. RoboTacDex collects 6,000 dual-arm trajectories on a Unitree G1 with synchronized RGB, depth, and tactile across 19 tasks, with the dataset promised but not yet posted. TacGen skips collection and synthesizes tactile latents from RGB, treating touch as a physical evidence channel you can partly reconstruct from vision. UniTac goes after sensor fragmentation, training one model for tactile understanding and generation that transfers across sensor types, since data gathered on one sensor rarely carries to another.

A second, smaller thread puts touch into VLAs. UniTacVLA predicts future contact instead of only reading current contact and feeds that into the controller. TAP-VLA takes the cheaper route, overlaying tactile shear fields as vectors on the camera image so a vision-language-action model can use touch with no architecture change, reporting 78% on contact-rich tasks against under 50% for vision only.

RCT’s result is the one to sit with. As tactile data scales, the way the field has been scoring it doesn’t hold up, and it’s worth watching whether the held-out-material split becomes standard.
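
A held-out-material split can be sketched as a grouped split: pick whole materials for the test set so no material contributes frames to both sides. This is a minimal illustration, not RCT's published protocol; `held_out_material_split`, the `test_fraction` default, and the toy material labels are assumptions.

```python
import random
from collections import defaultdict

def held_out_material_split(materials, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) such that no material label
    appears on both sides of the split."""
    by_material = defaultdict(list)
    for i, m in enumerate(materials):
        by_material[m].append(i)

    # Shuffle material names, not frames, so whole materials move together.
    names = sorted(by_material)
    random.Random(seed).shuffle(names)
    n_test = max(1, round(len(names) * test_fraction))

    test_idx = [i for m in names[:n_test] for i in by_material[m]]
    train_idx = [i for m in names[n_test:] for i in by_material[m]]
    return train_idx, test_idx

# Toy labels (hypothetical): one material label per contact frame.
materials = ["steel", "steel", "foam", "foam", "rubber",
             "rubber", "felt", "felt", "cork", "cork"]
train_idx, test_idx = held_out_material_split(materials)

train_materials = {materials[i] for i in train_idx}
test_materials = {materials[i] for i in test_idx}
print(sorted(train_materials), sorted(test_materials))
```

scikit-learn's `GroupShuffleSplit` does the same kind of grouped split if that library is already in the stack.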

Research

The largest open teleop dataset is now loadable in LeRobot

ABC-130k is roughly 130,000 bimanual episodes and about 3,500 hours collected on the low-cost two-arm YAM rig, released Apache-2.0 by XDOF alongside open hardware, training code, and a sim recipe. Its card claims the largest open robot teleoperation dataset to date; the practical news is the re-publish into LeRobot v3 format this week (3.43 TB, AV1 video), so you can pull it into the standard stack instead of raw MCAP.
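
For a rough sense of scale, the card's figures imply the per-episode numbers below. This is back-of-envelope arithmetic on the rounded values quoted above, assuming decimal terabytes; the variable names are illustrative.

```python
episodes = 130_000   # "roughly 130,000 bimanual episodes"
hours = 3_500        # "about 3,500 hours"
size_tb = 3.43       # LeRobot v3 re-publish, decimal TB assumed

avg_episode_s = hours * 3600 / episodes        # about 97 seconds per episode
gb_per_hour = size_tb * 1000 / hours           # about 0.98 GB per recorded hour
mb_per_episode = size_tb * 1_000_000 / episodes  # about 26 MB per episode

print(f"{avg_episode_s:.0f} s/episode, "
      f"{gb_per_hour:.2f} GB/hour, "
      f"{mb_per_episode:.1f} MB/episode")
```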

R&B-EnCoRe: a VLA that filters its own reasoning

Stanford’s R&B-EnCoRe has a reasoning VLA generate candidate chains of thought, keeps only the ones that measurably improve its action prediction, and bootstraps from there, with no external rewards, verifiers, or human labels. It reports gains across manipulation, legged navigation, and driving at 1B to 30B parameters. Single-method numbers, but the code and weights are out. The paper is from February; the open release landed this week.

Z-1: an RL recipe for flow-based VLAs

Z-1 post-trains a π0.5-style flow VLA with task-wise GRPO, shared-prefix rollouts, and completion-aware reward calibration, reaching 80.6% average across 24 RoboCasa tasks, 13.2 points over supervised fine-tuning. A concrete, reproducible recipe on a public benchmark rather than a new architecture.

A first sim-to-real transfer for world-action models

This paper adapts Cosmos Policy, a video-diffusion model repurposed for control, using about 800 synthetic demos per task plus heavy domain randomization, then deploys zero-shot on a Franka with no real demonstrations. It claims the first successful sim-to-real transfer of a world-action model. The 35% average success is modest, so read it as an existence proof.

An open-depth benchmark inside LeRobot

The LeRobot org stood up a depth-estimation benchmark this week, with a results repo, a submissions repo, and an outdoor eval set. Depth quality is a quiet bottleneck for manipulation and navigation, and this is a community leaderboard anyone can submit to.

Quick hits

Human-as-Humanoid — pairs egocentric and exocentric human videos, converts them to humanoid actions by inverse kinematics, and trains upper-body policies that transfer to the real robot with no robot-specific demos, at 4.8 to 7.2x teleoperation throughput.

VLX-Go — a 0.6B vision-language model that emits short-horizon navigation waypoints instead of scene descriptions, small...
