Continual Harness: A reset-free self-improving harness for embodied agents

Continual Harness — Online Adaptation for Self-Improving Foundation Agents


BLUE — cleared

YELLOW LEGACY (hard) — cleared

CRYSTAL — 0 KO

Seth Karten*¹

Joel Zhang*²

Tersoo Upaa Jr¹

Ruirong Feng¹

Wenzhe Li¹

Chengshuai Shi¹

Chi Jin¹

Kiran Vodrahalli³

* Equal contribution. ¹ Princeton University · ² ARISE Foundation · ³ Google DeepMind


RESET-FREE SELF-IMPROVEMENT ◆ HUMAN-OUT-OF-THE-LOOP ◆ ONLINE PROCESS-REWARD CO-LEARNING ◆ POKÉMON RED · EMERALD · BLUE · YELLOW · CRYSTAL ◆ FRONTIER MODELS + GEMMA-4 OPEN-SOURCE STUDENTS

README

What is Continual Harness?

Coding harnesses such as Claude Code and OpenHands wrap foundation models with tools, memory, and planning, but no equivalent exists for embodied agents, which must make long-horizon decisions under partial observability. We first report our Gemini Plays Pokémon (GPP) experiments. With iterative human-in-the-loop harness refinement, GPP became the first AI system to complete Pokémon Blue, Yellow Legacy on hard mode, and Crystal, the last without losing a battle. In the hardest stages, the agent itself began iterating on its strategy through long-context memory, surfacing emergent self-improvement signals alongside human-in-the-loop refinement.

Continual Harness removes the human from this loop. It is a reset-free, self-improving harness for embodied agents that formalizes and automates what we observed. Starting from only a minimal environment interface, the agent alternates between acting and refining its own prompt, sub-agents, skills, and memory, drawing on any past trajectory data. Where existing prompt-optimization methods require episode resets, Continual Harness adapts online within a single run.
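To make the act/refine alternation concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the released implementation: the `Harness` fields simply mirror the four components named above, and `run_reset_free`, `env_step`, `act`, `refine`, and `refine_every` are hypothetical names chosen for this example. In a real system `act` and `refine` would be model calls; here they are toy stand-ins.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Transition = Tuple[str, str]  # (observation, action), both plain strings here


@dataclass
class Harness:
    """Hypothetical harness state: the four components the agent may rewrite."""
    prompt: str = "Press buttons to make progress."
    sub_agents: Dict[str, str] = field(default_factory=dict)
    skills: Dict[str, str] = field(default_factory=dict)
    memory: List[str] = field(default_factory=list)


def run_reset_free(
    env_step: Callable[[str], str],
    act: Callable[[Harness, str], str],
    refine: Callable[[Harness, List[Transition]], None],
    first_obs: str,
    total_steps: int,
    refine_every: int,
) -> Tuple[Harness, List[Transition]]:
    """Alternate acting and refining inside one continuous run."""
    harness = Harness()
    trajectory: List[Transition] = []
    obs = first_obs
    for t in range(1, total_steps + 1):
        action = act(harness, obs)
        trajectory.append((obs, action))
        obs = env_step(action)  # the environment is stepped, never reset
        if t % refine_every == 0:
            # Refinement sees the whole trajectory so far and edits the harness in place.
            refine(harness, trajectory)
    return harness, trajectory


def demo() -> Tuple[Harness, List[Transition], int]:
    """Toy usage: a counter stands in for the game, fixed rules for the model."""
    state = {"presses": 0}

    def env_step(action: str) -> str:
        state["presses"] += 1
        return f"frame-{state['presses']}"

    def act(harness: Harness, obs: str) -> str:
        return "A"

    def refine(harness: Harness, trajectory: List[Transition]) -> None:
        harness.memory.append(f"reviewed {len(trajectory)} steps")
        harness.prompt = f"Press buttons. Notes so far: {len(harness.memory)}"

    harness, trajectory = run_reset_free(env_step, act, refine, "frame-0", 6, 3)
    return harness, trajectory, state["presses"]
```

The point of the sketch is the control flow: refinement happens between environment steps of the same run, so anything the harness learns is applied to the state the agent is already in.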

On Pokémon Red and Emerald, across frontier models, Continual Harness started from scratch substantially reduces button-press cost relative to the minimalist baseline and recovers a majority of the gap to a hand-engineered expert harness, with gains that depend on model capability. We then close the loop with the model itself through an online process-reward co-learning loop: an open-source agent's rollouts through the refining harness are relabeled by a frontier teacher and used to update the model. This loop drives sustained in-game milestone progress on Pokémon Red without resetting the environment between training iterations.
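The shape of the co-learning loop can be sketched with a toy example. Everything below is an assumption made for illustration: `LineWorld` is a one-dimensional stand-in for the game, the "student" is a table of action preferences rather than a language model, the "teacher" is a hard-coded rule rather than a frontier model, and the additive preference update replaces whatever training procedure the paper actually uses. What the sketch preserves is the structure described above: rollout, per-step relabeling, model update, and no environment reset between iterations.

```python
import random
from typing import Dict, List, Tuple

Step = Tuple[int, str]            # (position before acting, button pressed)
Labeled = Tuple[int, str, float]  # same step plus a process reward


class LineWorld:
    """Hypothetical toy environment: progress means moving right."""

    def __init__(self) -> None:
        self.pos = 0

    def step(self, action: str) -> int:
        self.pos += 1 if action == "RIGHT" else -1
        return self.pos


def student_rollout(env: LineWorld, prefs: Dict[str, float],
                    n_steps: int, rng: random.Random) -> List[Step]:
    """Sample actions in proportion to the student's current preferences."""
    steps: List[Step] = []
    actions = list(prefs)
    for _ in range(n_steps):
        weights = [prefs[a] for a in actions]
        action = rng.choices(actions, weights=weights, k=1)[0]
        steps.append((env.pos, action))
        env.step(action)
    return steps


def teacher_relabel(steps: List[Step]) -> List[Labeled]:
    """Stand-in teacher: score every step, not just the final outcome."""
    return [(pos, a, 1.0 if a == "RIGHT" else -1.0) for pos, a in steps]


def update_student(prefs: Dict[str, float], labeled: List[Labeled],
                   lr: float = 0.2, floor: float = 0.05) -> None:
    """Toy update: nudge each action's preference by its process reward."""
    for _, action, reward in labeled:
        prefs[action] = max(floor, prefs[action] + lr * reward)


def co_learn(iterations: int, steps_per_iter: int,
             seed: int = 0) -> Tuple[Dict[str, float], List[int]]:
    rng = random.Random(seed)
    env = LineWorld()  # created once; never reset between iterations
    prefs = {"LEFT": 1.0, "RIGHT": 1.0}
    positions: List[int] = []
    for _ in range(iterations):
        steps = student_rollout(env, prefs, steps_per_iter, rng)
        update_student(prefs, teacher_relabel(steps))
        positions.append(env.pos)  # each iteration resumes where the last ended
    return prefs, positions
```

Calling `co_learn(5, 20)` returns the final preferences and the position after each iteration. Because the environment persists, each training iteration starts from wherever the previous one left the agent.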

DEMOS

The harness in motion.

A walking tour across Pokémon Red and Emerald — sub-agents, skills, online prompt optimization, long-context memory, gym battles, and bootstrapped auto-evolution. Clips are sped up for readability.

Sub-agent creation & delegation

Red. The harness spawns specialized sub-agents on the fly and delegates sub-tasks to them.

Skill creation & revision

Red. The agent writes a new skill, uses it, then revises it after observing the outcome.

Online prompt optimization

Red. The harness rewrites its own prompt mid-run, with no episode reset between iterations.

Memory unsticks a blocked route

Red. Long-context memory recognizes a previously failed path and routes around it.

Route 102 — battling with sub-agents & replanning

Red. Combat sub-agents handle wild encounters while the planner replans on partial observations.

Pewter Gym — Brock

Red. The harness battles its way through the first gym leader.

Cerulean Gym — Misty

Red. Type-aware combat sub-agents take down the Cascade Badge fight.

Vermilion Gym — Lt. Surge

Red. Clearing the switch puzzle and the Thunder Badge battle.

Fixing & using a navigation skill

Emerald. The agent repairs its own navigation skill, then uses the fixed version in the field.

Objective planning

Emerald. Decomposing a long-horizon goal into sub-objectives the harness can act on.

Online refinement — first pass

Emerald. An early self-improvement loop: a single refinement cycle applied to the harness mid-run.

Refining a battle sub-agent

Emerald. The harness refines a dedicated battling sub-agent over a long-horizon run.

Continual refinement after a defeat

Emerald. Fixing the navigation skill, losing to the rival, writing the defeat to memory, then changing policy to switch-train the whole team.

Bootstrapped continual run

Emerald. End-to-end: navigating to the gym, the Wally battle, entering Mauville and recalling memory from a previous run, solving switches, and a double battle.
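One way to picture the "memory unsticks a blocked route" behavior from the clips above is a path planner that treats remembered failures as obstacles. The sketch below is an assumption made for illustration, not the harness's actual navigation skill: `plan_route` and `failed_tiles` are hypothetical names, the map is a plain grid, and the search is an ordinary breadth-first search.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

Tile = Tuple[int, int]


def plan_route(width: int, height: int, start: Tile, goal: Tile,
               failed_tiles: Set[Tile]) -> Optional[List[Tile]]:
    """Breadth-first search that skips tiles memory has marked as failed."""
    queue = deque([start])
    parent: Dict[Tile, Optional[Tile]] = {start: None}
    while queue:
        cur = queue.popleft()
        if cur == goal:
            # Walk parent links back to the start, then reverse.
            path: List[Tile] = []
            node: Optional[Tile] = cur
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        x, y = cur
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nx, ny = nxt
            in_bounds = 0 <= nx < width and 0 <= ny < height
            if in_bounds and nxt not in parent and nxt not in failed_tiles:
                parent[nxt] = cur
                queue.append(nxt)
    return None  # no route avoids the remembered failures


# Without memory the planner takes the direct route.
direct = plan_route(3, 3, (0, 0), (2, 0), failed_tiles=set())

# After memory records that tile (1, 0) blocked the agent, it detours.
detour = plan_route(3, 3, (0, 0), (2, 0), failed_tiles={(1, 0)})
```

Here `direct` passes through `(1, 0)`, while `detour` goes around it through the second row. If every neighbor of the start is marked as failed, the function returns `None`, which a harness could treat as a cue to revisit its memory rather than retry the same path.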

📡 LIVE

Gemini Plays Pokémon — live stream.

The human-in-the-loop precursor that motivated Continual Harness. Iterative harness refinement, long-context memory, and emergent strategy in real time.


CITE

Citation

@article{karten2026continual, title={Continual Harness: Online Adaptation for...
