KinetIQ Ascend: Toward 100% Reliable Manipulation and Superhuman Speed

Overview

At Humanoid, we build humanoid robots for real industrial work on production lines and in warehouses. A robot at a workstation repeats its task thousands of times a day and earns its place only if it almost never fails while keeping pace with the people doing the same job. This is why our target is 99.9% task success at human or superhuman speed.

Our robots learn manipulation end to end, mapping sensor readings directly to motion. Until now, we trained them the way the whole field does: by imitating human demonstrations. Imitation took us far, but it cannot carry anyone to industrial deployment: a model that copies demonstrations cannot exceed the demonstrator’s speed or quality, and it never learns the cost of failure. The last few percent of reliability, and every bit of superhuman speed, have to come from somewhere else.

Today we have line of sight to our reliability target. The path is reinforcement learning (RL): instead of copying demonstrations, we make the robot practice the task and learn from its own successes and failures, free of the demonstrator’s limits. Our approach, KinetIQ Ascend, extends KinetIQ, the AI framework behind our robots, with the ability to learn from trial and error. We run RL directly in the real world, on production tasks, around the clock, and we see the same predictable, compute-driven improvement that reshaped large language models: success rate climbs with training time, and our real-hardware runs are tracking the same trend that carries our simulation results to 100%. Everything we observe suggests that scaling robot time alone closes the remaining distance. To our knowledge, this is the first published demonstration of end-to-end, vision-based RL on production VLAs, trained on real bimanual humanoid hardware under true deployment conditions.

We demonstrate the power of our method on three production tasks from our portfolio. On industrial machine feeding, where the robot picks steel bearing rings out of a bin and places them on a conveyor, RL raised throughput by 42%. On item handover, where the robot picks an item from a cluttered tote and hands it to a person, throughput rose by 85% and success climbed from 80% to 98%, a tenfold drop in undesirable failures and a real step toward the reliability deployments demand. On bimanual tote handling, where the robot lifts a tote off a table with both arms, throughput more than doubled and success rose from 78% to 99%, a roughly twentyfold drop in failures. All results have been achieved with only days of robot time.
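The "tenfold" and "twentyfold" figures follow directly from the success rates quoted above: what shrinks is the failure rate, not the success rate. A minimal sketch of that arithmetic (the helper name `failure_reduction` is ours, introduced only for illustration):

```python
def failure_reduction(success_before_pct: int, success_after_pct: int) -> float:
    """Factor by which the failure rate shrinks, given success rates in percent."""
    fail_before = 100 - success_before_pct
    fail_after = 100 - success_after_pct
    if fail_after <= 0:
        raise ValueError("failure rate after training must be positive")
    return fail_before / fail_after


# Success rates taken from the text above.
handover = failure_reduction(80, 98)  # 20% failures -> 2% failures
tote = failure_reduction(78, 99)      # 22% failures -> 1% failures

print(handover)  # 10.0
print(tote)      # 22.0
```

The handover task works out to exactly 10x fewer failures, and the tote task to 22x, which matches "roughly twentyfold".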

Two findings exceeded our expectations. First, applying RL to only the hardest part of a task improves the whole task: after RL the policy preserved the ability to perform tasks it didn’t practice. Second, training on a single object improves the general skill: the picking model improved on objects it never practiced during RL. These findings mean the gains spread beyond what we train on, making real-world RL far more practical than we expected.

The rest of this post explains how we got these results: the infrastructure for around-the-clock training, the algorithmic problems RL hits on real VLAs and how we solved them, the issues with non-stationarity and exploration safety, and a deeper dive into the results.

Rationale

Industrial automation sets an unforgiving bar. To take over a station on a real production line, a robot has to repeat a task thousands of times a day, at human speed or faster, and almost never fail. The target we hold ourselves to is 99.9% success at human or superhuman speed, reached predictably enough to plan a deployment around it.
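To make the bar concrete, here is a rough sketch of how many failures per day different success rates imply. The cycle count of 2,000 per day is our assumption, chosen only to stand in for "thousands of times a day"; the post gives no exact figure, and the helper name is hypothetical.

```python
def expected_failures_per_day(cycles_per_day: int, success_rate: float) -> float:
    """Expected number of failed cycles per day at a given per-cycle success rate."""
    if not 0.0 <= success_rate <= 1.0:
        raise ValueError("success_rate must be between 0 and 1")
    return cycles_per_day * (1.0 - success_rate)


CYCLES_PER_DAY = 2000  # assumed value, not a figure from the post

for rate in (0.98, 0.99, 0.999):
    failures = expected_failures_per_day(CYCLES_PER_DAY, rate)
    print(f"{rate:.1%} success -> about {failures:.0f} failures per day")
```

Under this assumption, 98% success means roughly 40 interventions a day, 99% roughly 20, and 99.9% roughly 2, which is why the last fraction of a percent matters so much at a production station.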

We have bet from the start on end-to-end deep learning to clear that bar. Vision-Language-Action models (VLAs) are the strongest architecture for it today, and they sit at the core of our stack. The dominant way to train them is behavior cloning (BC): collect demonstrations, then train a policy to reproduce them. BC has produced strong results, but on its own it runs into ceilings that keep it short of industrial deployment.
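As a toy illustration of what "train a policy to reproduce demonstrations" means, the sketch below fits a one-dimensional linear policy to demonstration pairs by least squares. This is not our training code and bears no resemblance to a VLA; the demonstrator, its gain of 0.5, and all names are hypothetical. It only shows the shape of the objective.

```python
from statistics import mean


def fit_bc_policy(demos):
    """Least-squares fit of action = w * obs + b to (obs, action) pairs."""
    obs = [o for o, _ in demos]
    act = [a for _, a in demos]
    o_bar, a_bar = mean(obs), mean(act)
    var = sum((o - o_bar) ** 2 for o in obs)
    cov = sum((o - o_bar) * (a - a_bar) for o, a in demos)
    w = cov / var
    b = a_bar - w * o_bar
    return lambda o: w * o + b


# Hypothetical demonstrator: given distance-to-target d, it commands a step of 0.5 * d.
demos = [(d, 0.5 * d) for d in (0.0, 1.0, 2.0, 3.0, 4.0)]
policy = fit_bc_policy(demos)

# The cloned policy reproduces the demonstrator's gain and nothing more.
print(policy(2.0))
print(policy(10.0))
```

The loss here only measures distance to the demonstrated actions, so the best the fit can do is recover the demonstrator's own behavior. Nothing in the objective rewards finishing sooner or failing less often.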

First, speed and quality are capped by the demonstrator. A policy trained on teleoperated demonstrations inherits the speed and the quality of the data it was trained on. The naive fix for speed is to replay the learned behavior faster, but physics gets in the way. Actuators have finite dynamics: a gripper closes at some maximum speed, a joint accelerates at some maximum rate. Run the policy faster and its timing assumptions stop matching reality, so the arm begins to retract before the grasp has closed. In our previous post we introduced a data-side "sport mode" that speeds the policy up at training time to address some of these issues. It is a heuristic, though, with inherent limits, and it fails if pushed too far. To go faster and...
