I've Been Shipping 'Multi-Tenant' Wrong for a Decade | Adrià Cidre<br>I’ve been writing “multi-tenant” applications since around 2014. Each time, I told myself this one would be done properly. Each time, “properly” meant a tenant_id column on every table, a base repository that injected WHERE tenant_id = $1, and a strongly worded paragraph in the onboarding doc.
That is not multi-tenancy. That is tenant awareness, enforced by discipline and code review.
This is the first project where I think I’ve actually done it right. The thing that changed is not my discipline. It is where the contract lives.
What I used to call multi-tenant
The pattern I shipped, over and over:
A tenant_id column on every business table.
A base repository or ORM scope that automatically adds WHERE tenant_id = :current.
A request middleware that sets :current from the session.
A pull request template asking you to confirm new queries include the scope.
I have, at various points, written base repository decorators, ORM scopes, query interceptors, and SQL preprocessors that all tried to make this enforcement automatic. Each one was forty to four hundred lines of cleverness, and each one had a handful of places that bypassed it within the first six months: usually a raw SQL query somewhere, sometimes a primary-key lookup that the ORM treated as obviously-safe, occasionally a background job that loaded a record “by ID, because the ID is unguessable anyway.” Always an admin script.
It works until it doesn’t. The cases where it broke for me, sorted by how embarrassing the post-mortem was:
A new raw SQL query written outside the repository. Forgot the WHERE. Shipped.
A JOIN to a table that has its own tenant_id and a different alias. Scope applied to one side, not the other.
A background job that loaded a record by primary key, ignoring scope.
A reporting query that aggregated across all tenants on purpose, then got copy-pasted into a per-tenant endpoint a year later.
An admin script. Always an admin script.
An ORM find_by_id that bypassed the scope because the framework treated primary-key lookups as obviously safe.
In every case the application was tenant-aware. It knew about tenants. It usually filtered by tenant. But the database itself did not care. If a single query forgot, the rows came back wrong.
Every multi-tenant SaaS that has not moved the check into the database is one careless query away from a cross-tenant leak. I am not going to name the codebases.
Moving the contract into the database
PostgreSQL has Row-Level Security. It is not new. It has been in core since 9.5, which is 2016. I had read about it. I had not used it on a real project until now.
The shape of it is small. For every tenant-scoped table:
ALTER TABLE orders ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON orders<br>USING (tenant_id = current_setting('app.current_tenant_id')::uuid);<br>That is the policy. From now on, every SELECT, UPDATE, DELETE against orders is scoped by the database to rows where tenant_id matches the value of the session variable app.current_tenant_id.
The application is no longer responsible for filtering. The application is responsible for setting one variable correctly per request. That is a much smaller surface to get wrong.
The request middleware, in Go, looks like this:
tx, err := pool.Begin(r.Context())<br>if err != nil { /* ... */ }
_, err = tx.Exec(r.Context(),<br>"SELECT set_config('app.current_tenant_id', $1, true)",<br>tenantID,<br>Two notes on what is not obvious there:
The query runs inside a transaction. set_config(..., true) is the parameterized equivalent of SET LOCAL, and SET LOCAL is transaction-scoped. If I set it on the connection instead, the value would leak to the next request that picks up the same pooled connection. I do not want to find that bug in production.
tenantID comes from a verified JWT claim, validated as a UUID before it hits the database. I did not want the UUID check to be in SQL.
The handler that runs inside this middleware uses the transaction directly. It does not write WHERE tenant_id = ? anywhere. That clause is enforced one layer down, by the database, on every query, with no way for the handler to opt out.
That is the headline. The rest of this post is the four things I got wrong before I got it right.
The four things I missed
1. The table owner bypasses RLS by default
I enabled RLS, wrote my policy, ran the test, and it passed. Then I ran a second test where the connecting user was the database owner, which is what my application used, and the policy did nothing. Every row came back.
PostgreSQL has a quiet rule: the role that owns the table is exempt from the table’s RLS policies unless you say otherwise. The fix is one line per table:
ALTER TABLE orders FORCE ROW LEVEL SECURITY;<br>I have a whole migration that does nothing but apply FORCE to a dozen tables I missed on the first pass.
2. Superusers bypass RLS even with FORCE
FORCE makes the owner respect the policy. It...