JIT Provisioning on Cloudflare Containers


Cold Start to First Response: JIT Provisioning on Cloudflare Containers

2026-06-01 | Waystones Team | 6 min read

Waystones Cloud migrated from Fly.io microVMs to Cloudflare Containers about a month ago. The Fly.io architecture had one fundamental advantage: machines knew who they were at boot. Environment variables carried the tenant config. The machine woke up with an identity.

Cloudflare Containers don't work that way. They boot as generic, blank images. Identity arrives with the first HTTP request. This is what we built to handle that, and what broke along the way.

The 5KB Wall and the Blank Slate Problem

Our first attempt at provisioning was straightforward: inject the pygeoapi YAML config and R2 credentials as environment variables at container start. Cloudflare enforced a hard 5KB limit on environment variables. Our config — collections, layer definitions, credentials, metadata — blew past that immediately. The container was killed before it served a single request.

The deeper problem was architectural. Even if the limit didn't exist, Cloudflare Containers boot as a shared base image. They don't know which tenant they belong to until traffic arrives. Env vars at start time solve the wrong problem.

Fix: JIT header provisioning. The Cloudflare Worker handling inbound traffic packs the pygeoapi YAML config as base64 into X-Waystones-Config-B64 and the R2 credentials as JSON into X-Waystones-Config. These headers arrive with every request. The container provisions itself on first contact.
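As a rough illustration of the header round trip, here is a minimal Python sketch. The production Worker is presumably not written in Python; `pack_headers`, `wsgi_key`, and `to_environ` are hypothetical helpers, and the credential fields in any example call are made up. What is standard is the WSGI/CGI convention that a request header such as `X-Foo-Bar` shows up in `environ` as `HTTP_X_FOO_BAR`, which is why the interceptor reads `HTTP_X_WAYSTONES_CONFIG_B64`.

```python
import base64
import json


def pack_headers(yaml_text: str, credentials: dict) -> dict:
    """Build the two provisioning headers (hypothetical helper)."""
    return {
        # Base64 keeps the multi-line YAML safe inside a single header value.
        "X-Waystones-Config-B64": base64.b64encode(
            yaml_text.encode("utf-8")
        ).decode("ascii"),
        # Credentials travel as compact, single-line JSON.
        "X-Waystones-Config": json.dumps(credentials, separators=(",", ":")),
    }


def wsgi_key(header_name: str) -> str:
    """Map an HTTP header name to its WSGI environ key (CGI convention)."""
    return "HTTP_" + header_name.upper().replace("-", "_")


def to_environ(headers: dict) -> dict:
    """Simulate what a WSGI server does with incoming request headers."""
    return {wsgi_key(name): value for name, value in headers.items()}
```

Decoding `environ["HTTP_X_WAYSTONES_CONFIG_B64"]` with `base64.b64decode` recovers the original YAML bytes unchanged.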

The WSGI interceptor sits in front of pygeoapi's Flask app and catches this:

    def application(environ, start_response):
        global _CONFIG_LOADED, _pygeoapi_app

        if not _CONFIG_LOADED:
            with _lock:
                if not _CONFIG_LOADED:
                    raw_config = environ.get("HTTP_X_WAYSTONES_CONFIG")
                    if raw_config:
                        _inject_machine_env(raw_config)

                    b64 = environ.get("HTTP_X_WAYSTONES_CONFIG_B64")
                    if b64:
                        config_bytes = base64.b64decode(b64)
                        tmp = CONFIG_PATH + ".tmp"
                        with open(tmp, "wb") as f:
                            f.write(config_bytes)
                        os.replace(tmp, CONFIG_PATH)
                        open(_TENANT_FLAG, "w").close()
                        ...

os.replace() is an atomic rename. The tenant flag file signals to other processes that real config is on disk. Flask never sees a partial write.
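The write-then-rename pattern can be sketched in isolation as follows. `atomic_write` is a hypothetical helper, not the interceptor's own code. It differs in one deliberate way: it uses `tempfile.mkstemp` to give each writer its own temporary file in the target directory, on the assumption that two processes sharing one fixed `.tmp` name could truncate each other's half-written file before the rename.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Write data to path so readers see the old file or the new one, never a mix."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same filesystem as the target,
    # otherwise os.replace cannot be a plain rename.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before publishing
        os.replace(tmp_path, path)  # atomic swap on POSIX filesystems
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Calling `atomic_write(CONFIG_PATH, config_bytes)` would stand in for the open/write/replace sequence; whether the extra `fsync` is worth its cost on a container's ephemeral disk is a judgment call.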

The Worker Coordination Problem

Gunicorn runs two workers — separate OS processes, isolated memory, shared disk. A browser opening the map fires multiple concurrent requests. Worker 1 catches the first, starts writing config. Worker 2 catches the second a millisecond later.

The threading lock inside each worker guards against thread-level races within that process only. It does nothing across workers: each process has its own copy of the lock, so both can enter the initialization path independently.

This is handled by three things in combination:

os.replace() — atomic at the filesystem level. Both workers writing identical config bytes is harmless.

_TENANT_FLAG — once written by whichever worker gets there first, the other worker's next check sees it and skips the header path entirely, loading from disk instead.

_CONFIG_LOADED + _lock — double-checked locking within each worker ensures Flask loads exactly once per process.

The second worker path in the interceptor:

    elif os.path.exists(CONFIG_PATH) and os.path.exists(_TENANT_FLAG):
        print(f"[waystones_wsgi] Using existing config at {CONFIG_PATH}", flush=True)
        _ensure_openapi_ready()

No sleep timers. No polling. The flag file is the cross-process signal.

The OpenAPI Catch-22

pygeoapi requires an OpenAPI document to serve its collections index. Generating that document requires the config. The config doesn't exist until the first request. The first request needs the collections index to succeed.

The interceptor resolves this synchronously before handing off to Flask:

    def _ensure_openapi_ready() -> None:
        if not _is_stub_openapi():
            return

        # Fast path: pull from R2 cache
        subprocess.run(["python3", "/cache_openapi.py", "--download-only"], ...)

        if not _is_stub_openapi():
            return

        # Slow path: generate synchronously
        tmp = OPENAPI_PATH + ".tmp"
        with open(tmp, "w") as f:
            subprocess.run(["pygeoapi", "openapi", "generate", CONFIG_PATH], stdout=f, check=True)
        os.replace(tmp, OPENAPI_PATH)

Fast path: the Waystones Cloud backend pre-generates the OpenAPI document from model.json at deploy time — a ~2ms TypeScript function — and uploads it to R2. The interceptor downloads it in ~100ms and skips Python generation entirely.

Slow path: generation runs synchronously and blocks the first request. This only happens on a cache miss. Once generated, a background task uploads it to R2 so the next cold start hits the cache. Atomic rename again — Flask either sees the old stub or the complete document.
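The control flow of the two paths can be separated from the subprocess calls, which makes it easy to exercise. In this sketch `ensure_document` and its three callables are hypothetical. Treating a failed download as a cache miss is an assumption, not something the interceptor is known to do.

```python
from typing import Callable


def ensure_document(
    is_stub: Callable[[], bool],
    download_cached: Callable[[], None],
    generate: Callable[[], None],
) -> str:
    """Make sure a real document exists; report which path produced it.

    Returns "ready" (nothing to do), "cache" (fast path) or "generated" (slow path).
    """
    if not is_stub():
        return "ready"

    try:
        download_cached()  # fast path: fetch the pre-generated document
    except Exception:
        pass  # assumption: a failed download is just a cache miss

    if not is_stub():
        return "cache"

    generate()  # slow path: blocks the caller until generation finishes
    return "generated"
```

Passing the steps in as callables keeps the fallback order testable without R2 or pygeoapi installed; in the real interceptor they would wrap the two `subprocess.run` calls.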

The GDAL Blackhole

QGIS Server is a separate container, fronted by its own Python proxy. Same JIT pattern: X-Waystones-Qgis-B64 carries the QGIS project file as base64, X-Waystones-Config carries the R2...

