MiMo Code: Scaling Coding Agents to Long-Horizon Tasks

June 10, 2026

MiMo Code is a terminal-based coding agent built by Xiaomi's MiMo team on top of OpenCode and open-sourced under the MIT license. It is designed for long-horizon automated programming tasks, with a core focus on how to maintain decision quality and state continuity over dozens or even hundreds of execution steps. This article introduces the core technical design of MiMo Code through three themes: computation, memory, and evolution.

1. Design Motivation

The basic structure of a coding agent is to place a language model inside a runtime and call it in a loop: the model is responsible for reasoning and decision-making, while the runtime manages tools, persists state, and assembles the input for each round. The model itself is stateless: each call starts from a blank slate, and all continuity is provided by the runtime.

For short tasks (typically fewer than 10 turns), this structure works well: simply passing the full conversation history to the model is enough, because the history itself serves as adequate working memory. But as the number of task turns increases, two problems gradually emerge.

First, the context window will eventually be exhausted. No matter how large the window is, dozens of rounds of tool outputs, code snippets, and error logs will eventually fill it up. At that point, part of the history must be compressed or discarded. A common approach is to generate a summary to replace the discarded content. But simple compression continually reinforces nearby information while weakening distant information. This approach runs into an inherent dilemma similar to that of recurrent models such as Mamba: it has state, but cannot look back on demand. What we need is not better compression, but an explicit storage-and-retrieval mechanism that decides what information should be written into persistent structures, and when it should be recalled.

Second, even if the context window is large enough, a model's instruction-following ability declines as input length grows. Useful constraints and intentions are diluted by large volumes of tool output, making it increasingly difficult for the model to extract what it should do next.

We observed that the most prominent bottlenecks vary across different time scales: the quality of single-turn decisions within a session is mainly constrained by computation; the continuity of multi-turn tasks within a session is mainly constrained by state management; and improvement across sessions is mainly constrained by the mechanism for distilling experience. These three time scales correspond exactly to computation, memory, and evolution. MiMo Code is designed around these three themes.

Figure 1: MiMo Code harness main-loop state machine

2. Computation: Scaling Single-Turn Reasoning

When a task grows to dozens or even hundreds of steps, the error rate of each individual step compounds over time, while the agent often lacks external corrective signals during long-horizon execution. A direct response is to invest additional computation at different levels of granularity in exchange for reliability: reducing the probability of decision errors at the single-step level, preventing premature termination or directional drift at the task level, and reducing unnecessary back-and-forth overhead at the execution level.

2.1 Parallel Sampling and Selection (Max Mode)

Max Mode generates N candidate solutions in parallel at each turn (N is set to 5 by default). Each candidate independently completes reasoning and tool-call planning, but does not actually execute the plan. The same model is then used as a judge to compare the reasoning process and action plan of all candidates, selecting the best one for execution.

By default, temperature is set to 1, so five independent samples almost never produce identical results. If multiple candidates happen to converge, that itself indicates high confidence in that direction; when candidates differ significantly, having a low-temperature judge select the most robust plan is more reliable than depending on a single sample.

On SWE-Bench Pro, Max Mode improves performance by 10–20% compared with single sampling, at the cost of roughly 4–5 times the token consumption.

NOTE: Max Mode is currently an experimental feature and must be enabled manually through configuration.
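
The sample-then-judge pattern described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not MiMo Code's actual implementation: the names Candidate, max_mode_turn, sample, judge, and majority_judge are all hypothetical, and the judge here is a simple majority vote standing in for the low-temperature model call the article describes.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    """One sampled proposal: reasoning plus planned (not executed) tool calls."""
    reasoning: str
    plan: List[str]


def max_mode_turn(
    sample: Callable[[str, float], Candidate],
    judge: Callable[[str, List[Candidate]], int],
    context: str,
    n: int = 5,                # the article's default N
    temperature: float = 1.0,  # the article's default sampling temperature
) -> Candidate:
    """Sample n candidates in parallel, then return the one the judge selects.

    `sample` and `judge` are assumed wrappers around model calls; neither
    executes any tool. Only the returned candidate would be executed.
    """
    with ThreadPoolExecutor(max_workers=n) as pool:
        futures = [pool.submit(sample, context, temperature) for _ in range(n)]
        candidates = [f.result() for f in futures]

    index = judge(context, candidates)
    if not 0 <= index < len(candidates):
        raise ValueError(f"judge returned invalid index {index}")
    return candidates[index]


def majority_judge(context: str, candidates: List[Candidate]) -> int:
    """Stand-in judge (an assumption): pick the most frequently proposed plan.

    A real judge would be a model call comparing reasoning and plans; this
    version only captures the idea that convergence signals confidence.
    """
    counts = Counter(tuple(c.plan) for c in candidates)
    best_plan, _ = counts.most_common(1)[0]
    return next(i for i, c in enumerate(candidates) if tuple(c.plan) == best_plan)
```

Keeping sampling and judging as separate callables mirrors the split in the text: candidates are drawn at a high temperature for diversity, while the selection step can be configured independently, for example with a lower temperature.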

2.2 Independent Completion Verification (Goal)

Max Mode addresses "doing it right"; Goal addresses "finishing it." A common failure mode in long tasks is that, after seeing prior progress in later turns, the agent tends to prematurely declare that it is "done" or ask a question. This is especially dangerous in automated execution, because there is no human standing by to correct it or provide feedback.

The Goal mechanism works as follows: the user defines a natural-language stopping condition, such as "all tests pass and the code has been committed." Whenever...
