Our Kubernetes Operator Didn't Scale, So We Rebuilt It



Published on Thursday, June 25, 2026

Security is often at odds with convenience, but the human brain prefers convenience (and makes mistakes, even with the best of intentions). Most identity security tools reconcile this by making security as convenient as possible.

Any friction increases the risk that people build convenient workarounds or sidestep security tools and thus soften the organization’s security posture.

This is why, once we realized our Kubernetes operator struggled at scale, we rearchitected it to improve both performance and developer experience.

Why we built a Kubernetes operator

One of the maxims of good secrets management is centralization. Uniting all secrets provides one place to store, manage, and audit secrets across your infrastructure.

Centralization requires syncing secrets into every type of infrastructure and deployment model. This means delivering each secret from the secret store, through the user’s infrastructure, to the service that consumes it. Ideally, this happens without workarounds or custom logic, which create security gaps and place a maintenance burden on users.

Our first Kubernetes operator extended native secret syncs into distributed deployments. It worked, but didn’t scale well. As resources in a cluster or deployment proliferated, its memory footprint ballooned and performance degraded.

This is why we redesigned our operator with a new reference-based architecture.

Where our first Kubernetes operator faltered

Syncing secrets into Kubernetes at scale required our operator to do a few things:

Connect and authenticate to Infisical: know where the API is hosted and how to connect and authenticate to it.

Find the correct secrets in Infisical: know the correct secret path within Infisical (project, environment, folder, and so on).

Enable pods and deployments to use them: Reconcile the secrets into Kubernetes-native secret objects so deployments and pods can get secrets into the correct environments.

Our initial design checked those boxes, but faltered as workloads increased.
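To make the three responsibilities concrete, here is a minimal Python sketch of a single reconcile pass. It is an illustration under stated assumptions, not the operator's actual code: `fetch_secrets` is a hypothetical callable standing in for the connect, authenticate, and lookup steps, and the `resource` dictionary layout is invented for this example. The one detail taken from Kubernetes itself is that values under a Secret's `data` field are base64-encoded.

```python
import base64


def build_managed_secret(name, namespace, secrets):
    """Render fetched key/value pairs as a Kubernetes Secret manifest (plain dict)."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        # Kubernetes expects every value under `data` to be base64-encoded.
        "data": {
            key: base64.b64encode(value.encode("utf-8")).decode("ascii")
            for key, value in secrets.items()
        },
    }


def reconcile(resource, fetch_secrets):
    """One reconcile pass for one resource.

    `fetch_secrets` is a hypothetical stand-in for steps 1 and 2:
    connect and authenticate to the API, then read the scoped secrets.
    Step 3 renders them into a Kubernetes-native Secret object.
    """
    secrets = fetch_secrets(
        resource["host"], resource["credentials"], resource["scope"]
    )
    target = resource["target"]
    return build_managed_secret(target["name"], target["namespace"], secrets)
```

A real operator would then apply the returned manifest through the Kubernetes API; that step is left out here to keep the sketch self-contained.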

Why v1’s architecture struggled at scale

In v1, users wrote monolithic custom InfisicalSecret resources that pointed at Infisical secrets. Each InfisicalSecret resource on the v1alpha1 API contained:

The address of the Infisical instance

The authentication credentials

The scope to pull from

The managed Kubernetes secret to write to

A resource looked like this:

apiVersion: secrets.infisical.com/v1alpha1
kind: InfisicalSecret
metadata:
  name: service-a-secrets
spec:
  hostAPI: https://app.infisical.com/api
  authentication:
    universalAuth:
      credentialsRef:
        secretName: universal-auth-credentials
        secretNamespace: default
      secretsScope:
        projectSlug: my-project
        envSlug: prod
        secretsPath: "/service-a"
  managedSecretReference:
    secretName: service-a-managed
    secretNamespace: default

Scalability suffered because each resource replicated its own authentication and connection settings. That architecture works on persistent infrastructure: a VPS or VM is one identity that only reauthenticates on restart or config changes. Kubernetes clusters, however, contain dozens or hundreds of these resources and frequently redeploy and restart. Because each resource carried its own auth and connection, each held its own independent client. This created three problems:

Resources consumed outsized memory because each one held its own client in memory. At scale, this caused out-of-memory (OOM) issues that required raising Helm memory limits. Each time pods went OOM, every resource reconciled and authenticated at the same time.

Restarts produced a burst of simultaneous authentication calls, which ran into rate limits. The operator would succeed after backoffs and retries, but the retries delayed clusters from reaching a steady state.

Engineering teams had to do extra work. Rotating a machine identity or changing the Infisical host meant editing the authentication block on every single resource.
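The first two problems can be illustrated with a small Python simulation. This is a sketch, not the operator's code: `CountingAPI` and `PerResourceClient` are hypothetical names, and the fake API only counts login calls instead of talking to a real server.

```python
class CountingAPI:
    """Stand-in for the remote API; it only counts login calls."""

    def __init__(self):
        self.auth_calls = 0

    def login(self, credentials):
        self.auth_calls += 1
        return f"token-for-{credentials}"


class PerResourceClient:
    """v1-style client: owned by one resource, authenticates when created."""

    def __init__(self, api, credentials):
        self.token = api.login(credentials)


def restart_v1(api, resources):
    """After a restart, every resource rebuilds its own client at once."""
    return [PerResourceClient(api, r["credentials"]) for r in resources]


if __name__ == "__main__":
    api = CountingAPI()
    # 200 resources that all use the same machine identity.
    resources = [{"credentials": "machine-identity-a"} for _ in range(200)]
    clients = restart_v1(api, resources)
    print(len(clients), "live clients,", api.auth_calls, "login calls")
```

Even though all 200 resources share one identity, the per-resource model keeps 200 clients in memory and issues 200 logins in one burst, which is the pattern that collides with rate limits.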

The fundamental issue was the overloaded CRD architecture, not missing logic. We evaluated event handlers, jitter, and other logic. Those might have helped, but they would have added more complexity to an already overloaded CRD.
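For readers unfamiliar with jitter, the sketch below shows one common form, exponential backoff with "full jitter". The post does not say which variant was evaluated, so the function name, the `base` and `cap` defaults, and the strategy itself are assumptions for illustration.

```python
import random


def backoff_with_jitter(attempt, base=0.5, cap=30.0, rng=random):
    """Return a retry delay in seconds for a 0-based attempt number.

    "Full jitter": pick uniformly between 0 and the capped exponential
    delay, so clients that fail together do not retry together.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0, ceiling)


if __name__ == "__main__":
    for attempt in range(5):
        print(attempt, round(backoff_with_jitter(attempt), 3))
```

Jitter spreads retries out in time, but every resource still owns a client and still has to authenticate, so it treats the symptom rather than the replication itself.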

We could only solve the underlying issue with a new architecture.

How reference-based architecture fixed the replication

The new design separates connection, authentication, and sync. Secrets reference authentication and connection resources as shared objects. This fixes the performance issues and improves the developer experience.
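A rough Python sketch of the idea follows. The field names `connectionRef` and `authRef`, the `SharedClientCache` class, and the fake API are all hypothetical and chosen for illustration; the post does not show the new CRD schema in this section. The point is only that sync resources holding the same references resolve to the same authenticated client.

```python
class FakeInfisicalAPI:
    """Stand-in for the remote API; it only counts login calls."""

    def __init__(self):
        self.auth_calls = 0

    def login(self, auth_ref):
        self.auth_calls += 1
        return f"token-for-{auth_ref}"


class SharedClientCache:
    """Resolve (connection, auth) references to one shared session each."""

    def __init__(self, api):
        self._api = api
        self._sessions = {}

    def session_for(self, connection_ref, auth_ref):
        key = (connection_ref, auth_ref)
        # Authenticate only the first time this pair of references is seen.
        if key not in self._sessions:
            self._sessions[key] = self._api.login(auth_ref)
        return self._sessions[key]


def reconcile_all(cache, sync_resources):
    """Each sync resource names shared objects instead of embedding them."""
    return [
        cache.session_for(r["connectionRef"], r["authRef"])
        for r in sync_resources
    ]


if __name__ == "__main__":
    api = FakeInfisicalAPI()
    cache = SharedClientCache(api)
    resources = [
        {"connectionRef": "infisical-cloud", "authRef": "team-identity"}
        for _ in range(200)
    ]
    reconcile_all(cache, resources)
    print(api.auth_calls, "login call for", len(resources), "resources")
```

Under these assumptions, 200 sync resources cost one login and one session in memory, and rotating a machine identity means editing the single shared auth object rather than every resource.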

We modeled the new architecture roughly on External Secrets Operator’s (ESO) resource split, which separates provider, store, and externalsecret CRDs. Infisical integrates with ESO, but we build our own operator for two reasons:

ESO has previously paused development. It has since resumed, but it’s not...
