Optimization tales with CockroachDB: the slow password reset flow

broken_broken_1 pts0 comments

Optimization tales with CockroachDB: the slow password reset flow

Philippe Gaultier

Dark/Light

Body of work

Tags

Resume

⏴ Back to all articles

Published on 2026-06-29. Last modified on 2026-06-30.

Optimization tales with CockroachDB: the slow password reset flow

SQL Optimization CockroachDB

Table of contents

The context

Investigation

Statistics

Query execution

Why?

First optimization: provide 'via'

Second optimization: single region

Third optimization: only query needed columns

Final results

Things I have tried to little effect

What about other databases?

Future avenues

Conclusion

It all started one morning. I opened Slack as usual to start my working day, only to find a message from the CTO:

Hello this query reads like 700k rows

To which engineers replied:

The plan looks ok IMHO, indices are being used:

[copy pasted production plan]

And the CTO answering:

This query should not read 700k rows, but like 10 or something.

And spoiler alert, he was dead right. I started to look into it and this turned out much more interesting than just 'it was a full table scan, I added the right index and moved on'.

As always, the code, and fix, are open-source!

This query actually runs against all 4 databases we support (SQLite, PostgreSQL, MySQL, CockroachDB), but here I will focus on CockroachDB because that is the database that we run in production and because it has important differences, which make this investigation interesting.

This is part 1, there will be more SQL optimization stories in the same vein.

The context

The software is Kratos, a widely used authentication and identity management service. Users of this software are humans. Humans often register with an email and password (Kratos also supports passwordless schemes such as passkeys, WebAuthn, etc, but the proverbial email+password approach remains very much in use). Humans also tend to forget their password. That's why Kratos like any identity management service worth its salt, supports password recovery.

The user enters one of their addresses (email, phone number, etc), a list of their masked addresses is shown to them, they pick one, and a recovery link or code is sent to them on that address. Using that link or code, they can setup a new password. Pretty standard:

The table looks like this (showing only relevant fields):

Sql

CREATE TABLE public.identity_recovery_addresses (<br>id UUID NOT NULL,<br>via VARCHAR(16) NOT NULL, -- 'email' or 'sms'<br>value VARCHAR(400) NOT NULL, -- 'foo@bar.com' or '+49123456789'<br>identity_id UUID NOT NULL, -- the identity (i.e. account) id<br>nid UUID NULL, -- the tenant id

CONSTRAINT identity_recovery_addresses_pkey PRIMARY KEY (identity_id ASC, id ASC),<br>UNIQUE INDEX identity_recovery_addresses_status_via_uq_idx (nid ASC, via ASC, value ASC),<br>UNIQUE INDEX identity_recovery_addresses_id_key (id ASC)<br>);

This is done with one SQL query (slightly simplified from the real one):

Sql

SELECT *<br>FROM identity_recovery_addresses AS a<br>JOIN identity_recovery_addresses AS b<br>ON a.identity_id = b.identity_id<br>AND a.nid = b.nid<br>WHERE b.value = ?<br>AND a.nid = ?<br>LIMIT 10

Kratos supports multi-tenancy, so each tenant has an id called nid, each row stores nid, and each query clause contains WHERE nid = ? to isolate each tenant. But this is a non factor: we know the tenant id from the start since each tenant has its own subdomain(s), so for this query, the nid is effectively a constant.

The approach is relatively straightforward with a self-join, that can be understood as two queries:

Given the tenant id (nid) and the provided address, for example foo@bar.com, find the identity (i.e. the user account) for it.

Now that we have the identity id, find all addresses for that identity (ON a.identity_id = b.identity_id):

Sql

SELECT *<br>FROM identity_recovery_addresses AS a<br>JOIN identity_recovery_addresses AS b<br>ON a.identity_id = b.identity_id<br>AND a.nid = b.nid<br>WHERE b.value = 'foo@bar.com'<br>AND a.nid = '000e377a-062c-45b1-961c-1b28d682df6a'<br>LIMIT 10

Return the list of addresses for that identity, up to 10, because we do not expect a user to have more than a handful.

Now, Kratos can show the list of masked addresses e.g. +15234****56 if it's a phone number, or foo@****.com if it's an email address. The masking logic is pretty smart so accidental information disclosure is avoided. Kratos also pretends to send the recovery link/code to a non-existing address, so that it's not possible for an attacker to probe a website for certain addresses. The last point can actually have real life consequences in certain countries for certain websites, e.g. LGBT ones.

(Always remember that your code can impact lives).

Anyways, I am the one who actually wrote this query some months ago and I remember confirming in production that the query plan was sensical.

Now, the query is impacting the application and the database with its bad performance. It's not clear to me if:

the performance was always subpar and no one noticed, or

some...

query optimization password identity_id identity cockroachdb

Related Articles