Repositories Are Human/Agent Knowledge Factories – Sigplan Blog


PL Perspectives

Perspectives on computing and technology from and for those with an interest in programming languages.

Repositories Are Human/Agent Knowledge Factories

by Peli de Halleux, Don Syme, Ben Zorn on Apr 21, 2026 | Tags: agents, AI, software development, software engineering

We argue that source repositories are no longer just containers for human-authored code: they are containers and generators of knowledge supporting the interaction of humans with AI agents. Taking this perspective demands that we rethink how we structure projects, write documentation, and, more broadly, approach the entire software development process. We draw on experience building a verification framework where ~90% of code, documentation, and tests were authored by AI agents under human direction, and distill the structural patterns that made this possible.

Moving Beyond “AI-Assisted Coding”

The dominant model for AI in software development is the assistant: a human writes most of the code and occasionally asks a chatbot for help with a function, a regex, or a debugging session. Tools like GitHub Copilot, Cursor, and ChatGPT have made this workflow highly productive.

But a second model is emerging. In agent-driven development, AI agents are not assistants; they are the primary authors and maintainers of code, documentation, and tests. The human’s role shifts from writer to reviewer, from implementer to technical team lead. The agent reads the repository, understands its conventions, generates code that conforms to them, runs the test suite, and submits a pull request. The human sets the direction, reviews the diff, intervenes when things go awry, and is responsible for 360-degree oversight. Andrej Karpathy has described a similar trajectory in his “Software 3.0” vision, where natural language prompts and AI agents become the new programming interface, and the LLM functions as an operating system around which applications must be designed. Our framing of the repository as an interface for such an AI contributor is a natural extension of that vision.

This shift changes what a repository needs to be. When a human onboards to a project, they absorb conventions through code review, pair programming, Slack conversations, and institutional memory. An agent may not have access to these channels and may not be aware of institutional knowledge. As a result, it is important to capture such information directly in files in the repository. If a convention isn’t written down in a place the agent can find and parse, the agent won’t have enough context.

This observation, that implicit project knowledge should be explicit and agent-accessible, has implications for how we structure repositories. Repositories have always stored processes, code, data, and documentation. But now the content must be retargeted to support agent consumption. It is, in essence, a knowledge representation problem.

The Knowledge Gap Problem

Consider what happens when an AI agent is asked to add a new component to an unfamiliar codebase. The agent needs to answer a cascade of questions:

What naming conventions does this project use?

Where do new components go in the directory structure?

What’s the error-handling pattern?

What tests need to be written, and where?

A human developer answers these questions through a combination of README skimming, code browsing, and asking teammates. An agent answers them by reading whatever documentation and instruction files are available in its context window. The gap between what the agent needs to know and what the repository explicitly tells it is the knowledge gap, and it can lead to agents taking incorrect actions or writing bad code.
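One way to make the knowledge gap concrete is to treat it as a set difference between the topics an agent needs answered and the topics the repository's instruction files actually cover. The sketch below is illustrative only: the `REQUIRED_TOPICS` set, the topic names, and the heading-based detection are assumptions made for this example, not part of AVA or of any particular agent tool.

```python
import re

# Hypothetical topics an agent needs answered, mirroring the questions above.
REQUIRED_TOPICS = {
    "naming conventions",
    "directory structure",
    "error handling",
    "testing",
}


def documented_topics(instruction_text: str) -> set[str]:
    """Collect lower-cased heading titles from a Markdown-style instruction file."""
    headings = re.findall(
        r"^#+[ \t]+(.+?)[ \t]*$", instruction_text, flags=re.MULTILINE
    )
    return {h.lower() for h in headings}


def knowledge_gap(instruction_text: str) -> set[str]:
    """Return the required topics that the instruction text never addresses."""
    return REQUIRED_TOPICS - documented_topics(instruction_text)


# A small, made-up instruction file that covers only two of the four topics.
example = """\
# Naming conventions
Classes use PascalCase.

## Testing
Every component has a test file under tests/.
"""

print(sorted(knowledge_gap(example)))
```

A real check would need something more forgiving than exact heading matches, but even this crude version turns "the agent lacks context" into a list of specific topics someone has to write down.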

In our experience building AVA (Agent Verification Architecture), a pipeline for verifying AI agents that was itself built almost entirely by AI agents, we encountered this knowledge gap repeatedly in the early stages. Agents would generate code that was functionally correct but conventionally wrong: right behavior, wrong file location, wrong class naming pattern, wrong error-handling idiom. Each such error required a correction cycle, which in turn required the human to articulate the convention they had been carrying implicitly.

The solution was not to write better prompts. It was to write better repositories.
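As a hedged illustration of what a "better repository" could look like, conventions of the kind mentioned above (file location, class naming) can be recorded as data and checked mechanically. Everything named here is hypothetical and chosen only for the example: the `Convention` record, the `components/` directory, and the `Verifier` suffix are assumptions, not AVA's actual layout.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Convention:
    """One explicit, machine-checkable project rule (hypothetical schema)."""
    name: str
    path_pattern: str   # regex the file path must fully match
    class_pattern: str  # regex the class name must fully match


# Assumed rule for illustration: components live in components/
# and their class names end in "Verifier".
COMPONENT_RULE = Convention(
    name="component",
    path_pattern=r"components/[a-z_]+\.py",
    class_pattern=r"[A-Z][A-Za-z0-9]*Verifier",
)


def violations(rule: Convention, path: str, class_name: str) -> list[str]:
    """Return a human-readable description of every rule the change breaks."""
    problems = []
    if not re.fullmatch(rule.path_pattern, path):
        problems.append(f"{path}: expected path matching {rule.path_pattern}")
    if not re.fullmatch(rule.class_pattern, class_name):
        problems.append(f"{class_name}: expected name matching {rule.class_pattern}")
    return problems


# Possibly correct in behavior, but in the wrong directory with the wrong suffix.
print(violations(COMPONENT_RULE, "src/util/checker.py", "CheckerImpl"))
# Conforms to both rules, so nothing is reported.
print(violations(COMPONENT_RULE, "components/trace_checker.py", "TraceVerifier"))
```

Once a convention is written this way, the same file can be read by an agent before it writes code and by a check that runs afterward, so a correction cycle no longer depends on a human restating the rule.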

Structure for Machines, Review for Humans

The core principle we arrived at is deceptively simple: optimize the repository for alignment with AI agents while keeping it human-reviewable. This inverts the traditional priority. Most projects document for humans first and hope machines can figure it out. Agent-driven projects must document for machines first, knowing that humans can still read structured text perfectly well.

What does this look like in practice?  Here are some approaches based on our experiences:

Dimension | Human-Optimized | Agent-Optimized

Conventions | Tribal knowledge, code...
