Hunting a 16-year-old SQLite bug with TLA+: is dqlite affected?<br>| Ubuntu
Your submission was sent successfully!<br>Close
Thank you for contacting us. A member of our team will be in touch shortly.<br>Close
You have successfully unsubscribed!<br>Close
Thank you for signing up for our newsletter!
In these regular emails you will find the latest updates about<br>Ubuntu and upcoming events where you can meet our team. e.preventDefault()">Close
Your preferences have been successfully updated. Close notification
Please try again or<br>file a bug report.
Close
Blog
This article was written by Marco Manino and Alberto Carretero, dqlite team at Canonical.
1. Anatomy of a SQLite bug
This month SQLite published a new version with a fix to a long-standing bug in the way that the Write Ahead Log (WAL) is checkpointed that leads to the corruption of the database.
The important aspect of this bug is not its real-world impact (which is very low) but how long it has been in the repository, how difficult it was to find it, and how difficult it was to reproduce it. Indeed the bug has been present since 2010, for 16 years! Also, the crucial question for us, the dqlite team, is: can dqlite be affected by this?
In order to find out, we first need to be able to understand the exact sequence of steps that leads to database corruption. To do that, we will be using TLA+ to model SQLite’s behavior and quickly find a trace that allows us to reason about the bug. Then, we will create a different model that describes how dqlite uses sqlite and we will check whether the bug can happen.
2. Small introduction to WAL and checkpoints in SQLite
SQLite uses WAL mode to allow readers to not be blocked by writers. The way it achieves that is by writing to a special staging area called the Write Ahead Log (WAL). Writers can append to the end of the WAL and readers can ignore the new data until it is stable. Eventually, the staging area is moved to the database; this is called a checkpoint. To prevent the WAL from growing indefinitely, a writer will attempt to “reset” it – i.e. overwriting it – if the previous checkpoint was able to move all the pages. If you are curious to learn more you can find a very clear description in the official documentation.
SQLite orchestrates changes to the WAL using locks and shared memory. For our use-case it is enough to think about writing and checkpointing; as such, we only care about two locks:
The checkpoint lock (CKPT_LOCK) which is taken before running a checkpoint to prevent multiple from happening at the same time
The write lock (WRITE_LOCK) which is taken before appending new pages to the WAL
The shared memory contains information needed to orchestrate writers, checkpointers, and readers, together with a data structure to index pages in the WAL for read performance. As we said, readers are not involved; as such, the only interesting fields are:
walSalt which contains a counter that is incremented each time the WAL is reset
mxFrame which contains the length of the WAL
nBackfill which contains the amount of pages that have been already checkpointed, that is, [nBackfill+1, mxFrame] has not been copied to the database at this point
3. Modeling the bug in TLA+
Part of the difficulty of writing TLA+ is deciding what to model and what not to model. We would like to come up with the simplest possible spec that is still faithful to reality and which we can use to extract useful insights from the model.
The first thing to model is our database and WAL as described in the previous section:
\* files.<br>VARIABLE wal<br>VARIABLE db
\* wal-index variables:<br>VARIABLE nBackfill<br>VARIABLE mxFrame<br>\* We will only capture the sequential part of the salt.<br>VARIABLE walSalt
Init ≜<br>\* wal and db are initially empty.<br>∧ wal = ⟨⟩<br>∧ db = {}<br>∧ nBackfill = 0<br>∧ mxFrame = 0<br>∧ walSalt = 0<br>Since generating the data is out of the scope of the model, we can take a simpler approach. We can model each page of data as a single unique number, then the WAL can be a sequence of such numbers and the database a set of them. In particular, checkpointing will move pages from the sequence in the WAL to the set of the database in the order they were appended to the WAL. To generate a unique number it is enough to use an always increasing counter.
We need to model the two actions that interact and produce the bug: appending and checkpointing. Let’s start by looking at the C code in SQLite that appends pages to the WAL to define our TLA+ action:
SQLite code responsible for appending frames to the WAL<br>static int walFrames(<br>Wal *pWal, /* Wal handle to write to */<br>int szPage, /* Database page-size in bytes */<br>PgHdr *pList, /* List of dirty pages to write */<br>Pgno nTruncate, /* Database size after this commit */<br>int isCommit, /* True if this is a commit */<br>int sync_flags /* Flags to pass to OsSync() (or 0) */<br>){<br>int rc; /* Used to catch return codes */<br>u32 iFrame; /* Next frame address */<br>PgHdr *p; /* Iterator to run through pList with. */<br>PgHdr...