Optimization tales with CockroachDB: the slow logout

broken_broken_1 pts0 comments

Optimization tales with CockroachDB: the slow logout

Philippe Gaultier

Dark/Light

Body of work

Tags

Resume

⏴ Back to all articles

Published on 2026-06-30. Last modified on 2026-06-30.

Optimization tales with CockroachDB: the slow logout

SQL Optimization CockroachDB

Table of contents

The context

Investigation

But why?

First optimization: conditional write

Second optimization: read committed

One last hurdle: does the row even exist?

Final results

Conclusion

Quick question: What do you do when there is downtime at work? Read the news, tidy your inbox... Hunt for slow SQL queries?

I'm in the last bucket. Emboldened by my recent success in speeding up the password reset flow, where too many rows were scanned, I stumbled upon a query that looked so simple, yet was very slow and did thousands of retries, for an endpoint that is very heavily used... This piqued my interest.

As always, the work is open-source!

This is part 2 of my optimization adventures with CockroachDB. See part 1.

The context

150K retries, 500 ms statement time:

Contrary to part 1, this time, rows scanned and CPU time are completely fine. But there are way too many retries.

This query (and many variations of it, all suffering from the same issue), are executed when logging out a user. A user is logged out by setting active to false on their session row in the sessions table:

Sql

UPDATE sessions SET active = false WHERE token = ?

This is such a simple query. It uses the right index. And yet it misbehaves. Time to investigate!

Investigation

CockroachDB gives us a nice warning on the same page:

Plaintext

Error Code: 40001

Error Message: TransactionRetryWithProtoRefreshError: ReadWithinUncertaintyIntervalError: read at time 1781712863.936725360,0 encountered previous write with future timestamp 1781712863.957518522,0 within uncertainty interval 't

If it looks like gibberish to you... Know that it did to me initially. Let's unpack it slowly:

TransactionRetryWithProtoRefreshError: the transaction was aborted by the server and the client was instructed to retry. We'll see why in a second. This is completely expected and normal behavior in CockroachDB in the default isolation level (Serializable).

ReadWithinUncertaintyIntervalError: read [...] encountered previous write: We tried to read the row containing the session token, in order to write to it, but we encounter a value with a higher (meaning: more recent) timestamp. So we conservatively have to restart from the top: re-read the fresh row data.

But why?

In a nutshell: we do an unconditional write to the row. It might not look like it because there is a WHERE clause. But this WHERE clause is actually only used to find the one row to update (using the session token). Once we have found the row, we write active = false to it every time. Even if active is already false! We do not check active at all. So two or more concurrent writes will all compete on the same row.

Due to the implicit transaction wrapping our update, using the default isolation level SERIALIZABLE (the strictest), we fall victim to transaction contention.

Quoting the docs:

By default under SERIALIZABLE isolation, transactions that operate on the same index key values (specifically, that operate on the same column family for a given index key) are strictly serialized. To maintain this isolation, SERIALIZABLE transactions refresh their reads at commit time to verify that the values they read were not subsequently updated by other, concurrent transactions. If read refreshing is unsuccessful, then the transaction must be retried.

But this is wasted work, because the first write that succeeds is enough: once we have marked the row as active = false, no code ever toggles active back to true, this is a final state. So all subsequent writes to this row should be no-ops, instead of retrying a number of times, and finally succeeding, having achieved nothing!

And then is heard no more: it is a tale told by an idiot, full of sound and fury, signifying nothing.

Macbeth, Act 5, Scene 5

First optimization: conditional write

So, let's make the write conditional: we'll only write to the row if active is true:

Sql

UPDATE sessions SET active = false WHERE active = true AND token = ?

There is no semantic change, and yet, this means way fewer writes (up to 1, now) and fewer retries. Why? Because transaction conflicts in CockroachDB happen in two main cases, write-write and write-read:

CockroachDB's transactions allow the following types of conflicts that involve running into a write intent:

Write-write, where two transactions create write intents or acquire a lock on the same key.

Write-read, when a read encounters an existing write intent with a timestamp less than its own.

We now avoid the write-write scenario, yay! We still have the other case (write-read) that can happen:

Transaction A starts

Transaction B starts

Transaction A writes to the row

Transaction A commits

Transaction B reads from the...

write read active transaction cockroachdb optimization

Related Articles