One repo clone, shared forever


One repo clone, shared forever: how Falconer empowers agents and workflows with S3 Files | Falconer Notes

At Falconer we build agents that answer the kinds of questions that come up constantly in engineering teams: what does this code do, what changed recently, what did we decide in that meeting last week? Most of those questions we can answer from indexed documents and recent activity. But some questions are different.

“Who introduced this bug, and what were they trying to fix?” “What actually changed between the v1.2 and v1.3 releases?” These aren’t questions about recent context — they’re requests for time travel through code version history. Answering them well means giving the agent real git access, not a filtered API view. Getting there turned out to be a surprisingly fun infrastructure problem, and we shipped the full solution in about six weeks. Here’s what we built, why we built it that way, and what we learned along the way.

EBS and the GitHub API both fall short

Before this project, Falconer’s background pipeline ran on ECS tasks backed by Elastic Block Store (EBS) volumes. Every time a job needed to process a customer’s codebase, it cloned the repo fresh and discarded the clone afterwards. Fast, simple, and wasteful.
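For contrast with what follows, the clone-per-job pattern can be sketched in a few lines of Python. This is a hypothetical illustration, not Falconer's pipeline code: `process_fresh_clone` and its injectable `run` parameter are names invented here, and the real jobs ran as ECS tasks on EBS rather than in a local temporary directory.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def run_git(argv: Sequence[str]) -> None:
    # check=True raises CalledProcessError, so a failed clone fails the job.
    subprocess.run(list(argv), check=True)


def process_fresh_clone(
    repo_url: str,
    job: Callable[[Path], T],
    run: Callable[[Sequence[str]], None] = run_git,
) -> T:
    """Clone into a throwaway directory, run one job on it, then delete it."""
    with tempfile.TemporaryDirectory(prefix="job-") as tmp:
        checkout = Path(tmp) / "repo"
        run(["git", "clone", "--quiet", repo_url, str(checkout)])
        result = job(checkout)
    # Leaving the with-block deletes the clone; the next job clones again.
    return result
```

Every call pays for a full clone, and nothing survives to the next job, which is the waste described above.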

Exposing git tools to Falcon agents is not straightforward. When a user asks “what has changed in our repo this week?”, a cold git clone on a fresh EBS volume can take ten seconds for a small repo and many minutes for a large one. That latency is unacceptable in a conversational context. EBS volumes are also bound to a single instance, so you either pin customer traffic to specific workers or you’re constantly managing EBS snapshots and copies.
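Pinning, the first of those two options, might look like the sketch below. It assumes, hypothetically, that each org is assigned to a worker by a stable hash modulo the fleet size; `pin_worker` and `orgs_moved` are illustrative names. With a uniform hash, growing the fleet from three to four workers reassigns roughly three quarters of the orgs, and each reassigned org lands on a worker whose EBS volume has no clone for it yet.

```python
import hashlib
from typing import Iterable


def pin_worker(org_id: str, worker_count: int) -> int:
    """Assign an org to a worker index using a stable hash.

    hashlib is used instead of hash(), which is salted per process.
    """
    digest = hashlib.sha256(org_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % worker_count


def orgs_moved(org_ids: Iterable[str], old_count: int, new_count: int) -> int:
    """Count orgs whose pinned worker changes when the fleet is resized."""
    return sum(pin_worker(o, old_count) != pin_worker(o, new_count) for o in org_ids)


orgs = [f"org-{i}" for i in range(1000)]
moved = orgs_moved(orgs, old_count=3, new_count=4)
# Each moved org needs a fresh clone on its new worker's EBS volume.
print(f"{moved} of {len(orgs)} orgs change workers when scaling from 3 to 4")
```

Consistent hashing would move fewer orgs per resize, but every move would still mean a cold clone on the new volume.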

The natural next question was whether we even needed a clone at all. The GitHub APIs cover a lot of ground, and proxying git operations through them would avoid the storage problem entirely. It works for simple cases, but falls apart quickly under real agent workloads. Rate limits bite when an agent issues a burst of exploratory queries in a single conversation — and with a shared installation token the budget drains fast. More fundamentally, the GitHub API does not expose the full git surface: there is no equivalent of git blame with line-range precision, no git log -S pickaxe search, no git diff between arbitrary refs with rename detection. The agent would be working with a constrained, mediated view of history rather than the real thing.
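With a real clone, the operations listed above are ordinary git invocations. A minimal sketch, assuming the agent shells out to the `git` CLI against a local or mounted clone; the helper names and the example path are hypothetical, while the flags (`blame -L` and `--porcelain`, `log -S`, `diff -M`) are standard git options.

```python
import subprocess
from typing import List


def blame_lines(repo: str, path: str, start: int, end: int) -> List[str]:
    # -L start,end limits blame to a line range; --porcelain is machine-readable.
    return ["git", "-C", repo, "blame", "--porcelain", "-L", f"{start},{end}", "--", path]


def pickaxe(repo: str, needle: str) -> List[str]:
    # -S<string> lists commits that change the number of occurrences of needle.
    return ["git", "-C", repo, "log", "--oneline", f"-S{needle}"]


def diff_refs(repo: str, old_ref: str, new_ref: str) -> List[str]:
    # -M turns on rename detection; --stat summarises the changes per file.
    return ["git", "-C", repo, "diff", "-M", "--stat", old_ref, new_ref]


def run(argv: List[str]) -> str:
    # Run a read-only git command and return its stdout; raises on failure.
    return subprocess.run(argv, check=True, capture_output=True, text=True).stdout
```

For example, `run(diff_refs("/repos/acme/app", "v1.2", "v1.3"))` would summarise what changed between the two release tags, with renamed files reported as renames rather than as a delete plus an add.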

As a startup of fewer than ten people, we also had a hard constraint: whatever we built had to be low-maintenance and essentially run itself. What we needed was a shared, persistent filesystem that the ingest service could write to and the UI service could read from without owning a clone of its own. The ingest service would do the heavy lifting once, and every subsequent read, whether from a background job or a live agent conversation, would hit the same pre-populated tree. Because many containers across both services had to mount it at the same time, it had to be multi-mountable, which in practice meant a filesystem that implements the Network File System (NFS) protocol.
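One way the ingest side of this could work is sketched below, under stated assumptions: repositories live under a hypothetical `{root}/{org_id}/{repo_name}` layout (the article only shows `/repos/{orgId}/...`), the ingest service keeps a `--mirror` clone so every branch and tag is available, and refreshes are incremental fetches. `sync_command` is an illustrative name, not Falconer's API.

```python
from pathlib import Path
from typing import List


def sync_command(root: str, org_id: str, repo_name: str, url: str) -> List[str]:
    """Return the git command the ingest service would run for one repo.

    Hypothetical layout: {root}/{org_id}/{repo_name}, holding a mirror clone
    (all branches and tags, no working tree) on the shared NFS mount.
    """
    target = Path(root) / org_id / repo_name
    if (target / "HEAD").is_file():
        # Already cloned: fetch only new objects; --prune drops deleted refs.
        return ["git", "-C", str(target), "fetch", "--prune", "origin"]
    # First sight of this repo: pay for one full clone, once.
    return ["git", "clone", "--mirror", url, str(target)]
```

Readers never clone: a UI container would run commands such as `git -C /repos/acme/app log v1.3` against the same mirror.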

Figure: From per-container EBS clones to a single shared NFS filesystem. Panel “Single Mount Block Storage”: each ingest container reads and writes its own EBS volume (EBS volumes 1, 2 and 3, each holding /repos/{orgId}/...). Panel “Multi-Mount File System”: one shared file system (/repos/{orgId}/...) that the ingest containers write to and the UI (agent) containers read from.

S3 Files vs. Elastic File System

Our first stop was Amazon EFS. It supports NFS v4 and works well with ECS containers. Then a friend at AWS told us about S3 Files, a newly launched AWS offering that presents an NFS v4.1 interface over an S3 bucket. We benchmarked EFS against S3 Files on clone time, find, and ripgrep over a 28,000-file, 343 MB Next.js repository. These are deliberately punishing access patterns, far more aggressive than typical git operations, but we wanted to understand the worst case. EFS and S3 Files landed within 20 percent of each other on every test, and within about 5 percent on three of the five. That is not a coincidence: S3 Files is not a FUSE layer over S3 object APIs. It is a real NFS server that uses EFS as a high-performance caching layer, with S3 as the durable backing store and source of truth. Your data never leaves S3; EFS just accelerates access to the active working set.
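The relationship described here, a fast cache in front of an authoritative store, is the classic read-through pattern. The toy model below illustrates only that idea; it is a hypothetical sketch and does not reflect how S3 Files is actually implemented (writes, eviction, and any size-based handling are left out).

```python
from typing import Callable, Dict


class ReadThroughCache:
    """Toy read-through cache in front of an authoritative backing store.

    Illustrative only: no writes, eviction, or size-based bypass.
    """

    def __init__(self, fetch_from_backing: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_backing
        self._cache: Dict[str, bytes] = {}
        self.backing_reads = 0

    def read(self, key: str) -> bytes:
        if key not in self._cache:
            # Cold read: go to the backing store and keep a copy.
            self.backing_reads += 1
            self._cache[key] = self._fetch(key)
        # Either way, return the cached copy; later reads skip the backing store.
        return self._cache[key]


store = {"src/index.js": b"'use strict';"}
cache = ReadThroughCache(store.__getitem__)
cache.read("src/index.js")
cache.read("src/index.js")
print(cache.backing_reads)  # 1
```

Only the first read reaches the backing store, which is the same cold/warm distinction the benchmarks below measure.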

Operation                 EBS      EFS       S3 Files
git clone (343 MB)        10.5s    9m 59s    9m 33s
find *.js (cold)          0.51s    58.5s     47.2s
find *.tsx (warm)         0.44s    13.3s     13.1s
rg "function" (cold)      2.3s     2m 01s    1m 55s
rg "use strict" (warm)    1.5s     32.8s     37.2s

The cold/warm gap also makes sense once you know this: file metadata and smaller files are lazily loaded into the EFS cache on first access, keeping subsequent reads fast. Large reads (≥1 MiB) bypass the...
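The size of the EFS vs. S3 Files gap in each row can be checked directly from the table. This sketch parses the table's duration strings (the `9m 59s` / `58.5s` format is taken from the table; `to_seconds` is an illustrative helper) and reports each gap relative to the slower of the two services.

```python
from typing import Dict, Tuple


def to_seconds(duration: str) -> float:
    """Parse durations written like '9m 59s', '2m 01s' or '58.5s'."""
    total = 0.0
    for part in duration.split():
        if part.endswith("m"):
            total += 60 * float(part[:-1])
        elif part.endswith("s"):
            total += float(part[:-1])
        else:
            raise ValueError(f"unrecognised duration: {duration!r}")
    return total


# (EFS, S3 Files) timings copied from the benchmark table.
rows: Dict[str, Tuple[str, str]] = {
    "git clone (343 MB)": ("9m 59s", "9m 33s"),
    "find *.js (cold)": ("58.5s", "47.2s"),
    "find *.tsx (warm)": ("13.3s", "13.1s"),
    'rg "function" (cold)': ("2m 01s", "1m 55s"),
    'rg "use strict" (warm)': ("32.8s", "37.2s"),
}

gaps: Dict[str, float] = {}
for op, (efs, s3_files) in rows.items():
    a, b = to_seconds(efs), to_seconds(s3_files)
    # Gap relative to the slower of the two, as a percentage.
    gaps[op] = abs(a - b) / max(a, b) * 100
    print(f"{op:24} EFS {a:6.1f}s  S3 Files {b:6.1f}s  gap {gaps[op]:4.1f}%")
```

Measured this way, the gaps range from about 1.5 percent (warm find) to about 19 percent (cold find); git clone and cold ripgrep are both within about 5 percent.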
