Learning Multi-Agent Coordination via Sheaf-ADMM
We introduce Sheaf-ADMM, a different way to build a neural network based on the notion of multi-agent consensus. The framework is built on the intersection of sheaf theory and ADMM for distributed consensus.
Authors
Jeffrey Seely*
Sakana AI
Bartłomiej Cupiał*
Sakana AI, U. Warsaw, AKCES NCBR
Llion Jones
Sakana AI
Published
July 2026
Limited-view agents negotiating a global answer.
Introduction
AI systems are increasingly composed of many interacting agents rather than a single monolithic model. In current practice, multi-agent systems are typically centralized: for example, an orchestrator delegates and assigns subtasks. In many systems of interest, however, no such central coordinator exists. The nodes of a sensor network, the ants of a colony, or the neurons of a nervous system each observe a small part of the environment and communicate only with their neighbors, and coherent global behavior arises from these local interactions alone.
We wish to study the mechanisms of collective coordination. Our approach is to focus on the problem of distributed consensus — how multiple agents with individual views of data agree on a global state — and to look for inspiration in existing fields that have studied distributed consensus from different angles.
In distributed optimization, the alternating direction method of multipliers (ADMM) splits a global problem into per-agent subproblems; each agent solves its subproblem, reconciles its solution with neighbors, and repeats until the system reaches global consensus. The algorithm is rooted in the theory of convex optimization, but each step admits an elegant interpretation in terms of multi-agent coordination.
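The solve, reconcile, repeat loop described above can be sketched on a toy problem. This is a minimal illustration, not the article's model: it assumes each agent's subproblem is the quadratic $f_i(x) = \tfrac{1}{2}\|x - a_i\|^2$ with hypothetical private targets `a`, uses the scaled-dual form of consensus ADMM, and reconciles by plain averaging over all agents rather than neighbor-only communication. The function name `consensus_admm` and the parameters `rho` and `rounds` are illustrative choices.

```python
import numpy as np

def consensus_admm(a, rho=1.0, rounds=100):
    """Toy global-consensus ADMM for f_i(x) = 0.5 * ||x - a[i]||^2.

    a: (n_agents, d) array of private targets (hypothetical toy data).
    Returns per-agent solutions x, the consensus variable z, and scaled duals u.
    """
    n, d = a.shape
    x = np.zeros((n, d))
    z = np.zeros(d)
    u = np.zeros((n, d))
    for _ in range(rounds):
        # Local step: each agent minimizes f_i(x) + (rho / 2) * ||x - z + u_i||^2,
        # which has a closed form for this quadratic objective.
        x = (a + rho * (z - u)) / (1.0 + rho)
        # Reconciliation step: average the agents' proposals
        # (an assumption here; a decentralized variant would only use neighbors).
        z = (x + u).mean(axis=0)
        # Dual step: accumulate each agent's remaining disagreement with z.
        u = u + x - z
    return x, z, u

a = np.array([[0.0, 1.0], [2.0, 3.0], [4.0, -1.0]])
x, z, u = consensus_admm(a)
```

For this particular objective the agents settle on the mean of their targets, so every row of `x` and the vector `z` approach `a.mean(axis=0)`. The dual variable `u` records how far each agent had to move from its own preference to reach agreement.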
A complementary question is what neighboring agents must agree on. Full consensus is often too restrictive; an alternative is to ask agents to agree on only linear projections of their state. Incidentally, this coincides precisely with an object from applied algebraic topology: a network sheaf, which offers topological tools for studying distributed systems, such as harmonic states, topological obstructions to coordination, and more.
ADMM and sheaves offer complementary framings for local-to-global coordination. We develop Sheaf-ADMM, which uses both in a learnable system. ADMM supplies coordination and negotiation dynamics, and the sheaf structure supplies the notions of inter-agent agreement. In Sheaf-ADMM, coordination evolves by running an ADMM solver across the latent space of hundreds of communicating agents. No agent sees enough of the input to solve the task on its own. The global solution emerges only from the agents' local negotiation.
We establish the method in simple settings: image classification, multi-agent Sudoku, and maze pathfinding. By focusing on simple tasks, we are able to isolate these coordination mechanisms clearly, and make them amenable to investigation.
Paper: arxiv.org/abs/2605.31005
Code: github.com/SakanaAI/sheaf-admm
Sheaves
We first introduce the sheaf component of the framework. Typical message-passing neural networks (MPNNs) use arbitrary learnable nonlinear maps (e.g. MLPs) to pass messages between agents (i.e. between nodes of a communication graph). Alternatively, a network sheaf gives a simple, linear, and, importantly, highly interpretable implementation of message passing.
The sheaf consensus condition. Two neighboring agents each hold a private state in $\mathbb{R}^2$. Restriction maps $F_{ij}, F_{ji}$ project both into a shared 1-D public communication channel. Consensus is reached when the projections agree, $F_{ij}x_i = F_{ji}x_j$.
In a sheaf, each agent $i$ holds a private state vector $x_i\in\mathbb{R}^d$ (a decision, or a latent representation). Two neighboring agents try to reach consensus — not by agreeing on their entire state vector, but only on a learned linear projection, $F_{ij} x_i = F_{ji} x_j$. Gradient descent on $\|F_{ij} x_i - F_{ji} x_j\|^2$ yields a message passing update:
$$x_i \leftarrow x_i - \eta\, F_{ij}^\top (F_{ij} x_i - F_{ji} x_j)$$
for each agent. Importantly, this update requires only knowledge of local state and pairwise prediction error ($F_{ij}x_i-F_{ji}x_j$) of adjacent agents. When cast across hundreds of agents with local connectivity, this amounts to a decentralized algorithm — sheaf diffusion — that iteratively updates each agent's private state vector to reach global consensus (defined as pairwise $F_{ij}x_i=F_{ji}x_j$ for all communicating agents).
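The update above can be sketched in a few lines of NumPy. This is a toy illustration under stated assumptions, not the article's code: the restriction maps are random rather than learned, the graph is a hypothetical four-agent path, and the names `sheaf_diffusion`, `disagreement`, `edges`, and `F` are made up for the example. Any constant factor from differentiating the squared norm is absorbed into `eta`, and the step size is chosen conservatively (one over the sum of squared entries of all restriction maps) so that the total disagreement cannot increase.

```python
import numpy as np

def disagreement(x, edges, F):
    """Total squared disagreement sum ||F_ij x_i - F_ji x_j||^2 over all edges."""
    return sum(
        float(np.sum((F[(i, j)] @ x[i] - F[(j, i)] @ x[j]) ** 2))
        for i, j in edges
    )

def sheaf_diffusion(x, edges, F, eta, steps):
    """Run `steps` rounds of the local update on every edge at once.

    x: (n_agents, d) private states; F[(i, j)]: (k, d) map from agent i
    into the channel it shares with agent j.
    """
    x = x.copy()
    for _ in range(steps):
        grad = np.zeros_like(x)
        for i, j in edges:
            # Pairwise prediction error, measured in the shared channel.
            err = F[(i, j)] @ x[i] - F[(j, i)] @ x[j]
            # Each endpoint pulls the error back through its own map.
            grad[i] += F[(i, j)].T @ err
            grad[j] -= F[(j, i)].T @ err
        x -= eta * grad
    return x

rng = np.random.default_rng(0)
n_agents, d, k = 4, 2, 1            # 2-D private states, 1-D shared channels
edges = [(0, 1), (1, 2), (2, 3)]    # hypothetical path graph
F = {}
for i, j in edges:
    F[(i, j)] = rng.normal(size=(k, d))
    F[(j, i)] = rng.normal(size=(k, d))

# Conservative step size (an assumption of this sketch, not a tuned value).
eta = 1.0 / sum(float(np.sum(M ** 2)) for M in F.values())

x0 = rng.normal(size=(n_agents, d))
x1 = sheaf_diffusion(x0, edges, F, eta, steps=200)
before, after = disagreement(x0, edges, F), disagreement(x1, edges, F)
```

Comparing `before` and `after` shows the disagreement shrinking while each agent only ever touches its own state and the errors on its incident edges. The states themselves need not become equal, since agreement is only required after projection through the restriction maps.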
Sheaf-ADMM
The Sheaf-ADMM architecture. A shared encoder maps each local view to a small convex problem; the ADMM layer alternates local optimization ($x$), consensus via sheaf diffusion ($z$), and dual accumulation ($u$) for $K$ rounds; a shared decoder turns the final states into local predictions, fused into the global answer. Every step is differentiable.
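The three alternating steps named in the caption can be sketched on a toy problem. This is not the authors' implementation, and several pieces are assumptions made for illustration: the local convex problem is a fixed quadratic $\tfrac{1}{2}\|x - a_i\|^2$ instead of one produced by a learned encoder, the restriction maps are hand-picked so that neighbors must agree only on their first coordinate, the $z$-step runs a fixed number of diffusion steps, and there is no decoder. The names `sheaf_admm_toy`, `diffuse`, `rho`, `eta`, `rounds`, and `inner` are hypothetical.

```python
import numpy as np

def diffuse(v, edges, F, eta, steps):
    """A few sheaf-diffusion steps that pull v toward edge-wise agreement."""
    v = v.copy()
    for _ in range(steps):
        grad = np.zeros_like(v)
        for i, j in edges:
            err = F[(i, j)] @ v[i] - F[(j, i)] @ v[j]
            grad[i] += F[(i, j)].T @ err
            grad[j] -= F[(j, i)].T @ err
        v -= eta * grad
    return v

def sheaf_admm_toy(a, edges, F, rho=1.0, eta=0.4, rounds=60, inner=30):
    """Alternate local optimization (x), sheaf consensus (z), dual accumulation (u)."""
    x = np.zeros_like(a)
    z = np.zeros_like(a)
    u = np.zeros_like(a)
    for _ in range(rounds):
        # x-step: closed-form minimizer of 0.5 * ||x - a_i||^2 + (rho / 2) * ||x - z_i + u_i||^2.
        x = (a + rho * (z - u)) / (1.0 + rho)
        # z-step: approximate projection onto states that agree across every edge.
        z = diffuse(x + u, edges, F, eta, inner)
        # u-step: accumulate the gap between local solutions and the consensus states.
        u = u + x - z
    return x, z, u

edges = [(0, 1), (1, 2)]                 # hypothetical three-agent path
share_first = np.array([[1.0, 0.0]])     # neighbors compare only coordinate 0
F = {(0, 1): share_first, (1, 0): share_first,
     (1, 2): share_first, (2, 1): share_first}
a = np.array([[0.0, 5.0], [3.0, 6.0], [6.0, 7.0]])   # private targets (toy data)
x, z, u = sheaf_admm_toy(a, edges, F)
```

In this toy setup the agents negotiate a common first coordinate (the mean of their first targets) while each keeps its own second coordinate, which illustrates how agreeing on a projection is weaker than full consensus. Because every step is built from differentiable array operations, the same loop could be written in an autodiff framework and unrolled for training, though that is beyond this sketch.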
Sheaf...