ADHD: Parallel Divergent Ideation for Coding Agents
Introduction
A modern LLM, prompted with "give me a few ways to do X", will almost always produce the same three answers a senior practitioner would. This is not a bug at the token level — those are the high-probability completions — but it is a failure at the task level whenever the user's purpose is to escape the high-probability answer. We call this failure mode premature convergence : the model evaluates as it generates, the early tokens anchor the late tokens, and the output is the centroid of the training distribution dressed up as a recommendation.
Premature convergence is most costly in exactly the regimes where ideation matters most: architecture decisions, API and SDK design, debugging fuzzy intermittent failures, refactor planning, naming, positioning, and any task whose deliverable is a set of viable options rather than a single answer. In these tasks the textbook answer is often the trap, and the interesting answer lives in what the original divergent-ideation skill calls "the awkward middle, past the first three".[1]
Existing inference-time methods address adjacent problems. Chain-of-Thought (CoT)[2] makes one head reason more slowly along one path, exposing the intermediate steps so the model does not skip them. Tree-of-Thought (ToT)[3] makes one head search over candidate next-steps with backtracking. Self-consistency sampling[4] draws multiple traces and majority-votes. Mixture-of-Agents [5] and multi-agent debate [6] sample multiple full responses and aggregate. All four are valuable, but all four optimise for correctness on a closed answer space. None of them is shaped right for the open-ended case where there is no ground truth, no test you can run on a partial, and the metric of interest is range of non-obvious viable options.
We propose ADHD : a method that produces such a range by structurally preventing the generator from converging during divergence, and only converging in a separate, posterior critic pass. ADHD borrows the tree structure of ToT but replaces its branching driver (next-step search) with vantage-point reframing, and replaces ToT's intermingled generator/evaluator with two strictly separated LLM calls. The result, on the evaluations we report below, is a method that wins clearly against a single-shot baseline on novelty, breadth, and trap detection — the dimensions premature convergence destroys.
CoT makes one head think slower. ToT makes one head search wider. ADHD makes many heads think differently, in parallel, then has a critic pick.
Related work
Single-trace methods
Chain-of-Thought [2] elicits intermediate reasoning by prompting (or fine-tuning) the model to "think step by step". It is decisively useful on multi-step problems with verifiable answers (arithmetic, symbolic reasoning) but it is a single linear trace: each step is conditioned on the previous, which is precisely the anchoring dynamic ADHD is designed to break. Self-Consistency [4] samples many CoT traces and majority-votes the final answer; it improves robustness but assumes a discrete correct answer, which ideation does not have.
Multi-branch search methods
Tree-of-Thought [3] generalises CoT to a tree of intermediate "thoughts" with explicit search (BFS or DFS) and an evaluator function that scores partial states. ToT is the closest neighbour of ADHD, and ADHD can be described as a ToT variant. The differences are not cosmetic: (i) ToT's branches share a single conversational context so anchoring still occurs across steps, (ii) ToT's branching driver is next-step variation (try numeric value x vs y), which produces nearby ideas rather than structurally different ones, and (iii) ToT typically interleaves generator and evaluator within the same model call.
Multi-agent and aggregation methods
Multi-Agent Debate [6] has multiple instances critique each other across rounds; this can improve factuality but converges aggressively toward consensus, which is the opposite of what ideation needs. Mixture-of-Agents [5] stacks layers of LLMs that read each other's outputs; it improves quality on benchmarks but, again, the per-layer aggregation step is designed to converge. ReAct [7] interleaves reasoning with tool use, which is orthogonal to the ideation question we address.
Method-acting and persona prompting
A separate strand of work assigns the model a role — "you are an expert X" — to bias output style or domain knowledge. ADHD's cognitive frames superficially resemble this but differ in intent: frames are not chosen for expertise but for structural distortion. The "10-year-old" frame is not asked to be correct; it is asked to ignore convention. The "speedrunner" frame is not asked to be authoritative; it is asked to look for glitches. Frames are vantage-point operators, not credentials.
Source: the Divergent Ideation skill
ADHD operationalises a written skill on divergent ideation[1] that prescribes a divergence/convergence loop with...