Looking Forward to Postgres 19: Checksums for All

Shaun Thomas|July 3, 2026

Data checksums are one of those Postgres features that, when they are doing their job, are easily forgotten. They sit quietly in the header of every data page as a small integer fingerprint, forever waiting to thwart the threat of cosmic rays or errant hardware failures. Most clusters run from cradle to grave and never trip a single one.

For years, the decision to enable them was etched in stone at database initialization. It wasn't until version 12 that Postgres introduced the pg_checksums utility to change it. And even then, doing so is a fully offline affair, grinding through every page on disk and incurring a long outage window. That's a fairly painful ordeal for a basic safeguard that wasn't even enabled by default until version 18.

So why go through all the trouble in the first place? Do we really need data checksums in our Postgres cluster? The short answer is "yes". The longer answer explains why Postgres 19 continues to improve the checksum system by adding an online conversion capability.

Bit Rot Never Sleeps

Let's start with what a checksum actually defends against. Postgres is very good at protecting data from itself. Crash recovery, the write-ahead log, full page writes: all of that machinery exists to make sure a power failure mid-write doesn't leave a torn, half-updated page behind.

But Postgres can’t help when the hardware itself lies about the data. Even with ECC RAM, that happens more often than you might expect. Cosmic rays can flip a bit in a memory cell. Failing drives may return stale sectors. Storage controllers could acknowledge writes that never made it to a platter. Any piece of the hardware, including the motherboard, CPU, and RAM, is suspect.

In every one of these cases, Postgres asks for a page and the OS cheerfully returns something without any validation of its contents. The data is just wrong, and nothing in the normal read path would ever know. A data checksum closes that gap.
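
The general principle can be sketched in a few lines of shell. This is only a toy illustration: the standard cksum utility (a CRC) stands in for the page checksum and is not the algorithm Postgres uses, and the temporary file is a hypothetical stand-in for an 8 kB data page.

```shell
#!/bin/sh
# Toy illustration only: cksum stands in for a page checksum.
# This is NOT the algorithm Postgres uses.
set -eu

page=$(mktemp)
trap 'rm -f "$page"' EXIT

# Build a fake 8 kB "page" of zero bytes.
dd if=/dev/zero of="$page" bs=8192 count=1 2>/dev/null

# Record the fingerprint at "write" time.
stored=$(cksum < "$page" | cut -d' ' -f1)

# Simulate silent corruption: overwrite a single byte at offset 4000.
printf '\001' | dd of="$page" bs=1 seek=4000 conv=notrunc 2>/dev/null

# Recompute at "read" time and compare against the stored value.
computed=$(cksum < "$page" | cut -d' ' -f1)

if [ "$stored" != "$computed" ]; then
    result="corruption detected"
else
    result="page ok"
fi
echo "$result"
```

Without the stored fingerprint, the altered file looks like any other 8 kB of bytes. With it, a single changed byte is enough to make the comparison fail.
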
When checksums are enabled, every page written to disk carries a 16-bit checksum in the header computed from its contents. When that page is later read back into shared buffers, Postgres recomputes the checksum and compares. If the stored value and the computed value disagree, the page has changed underneath the database without the database's knowledge, and Postgres raises an error instead of returning garbage:

    ERROR: invalid page in block 4711 of relation base/16384/24576

That error is the difference between discovering corruption upon interacting with an affected page, and discovering it months later when the rot has propagated into replicas and backups. The full details live in the Data Checksums chapter of the documentation, and the short version is this: checksums hang a cowbell onto previously silent corruption.

But wait, there's more!

Be Kind, Rewind

The second, sneakier motivation for enabling checksums is for the sake of pg_rewind. When running a high-availability Postgres cluster, a switchover or failover event means the old primary node eventually gets repurposed as a replica. The easiest and fastest way to do this is to compare the state of the new and old primary, and "rewind" differences until they're compatible. But pg_rewind has a prerequisite:

    "pg_rewind requires that the target server either has the wal_log_hints option enabled in postgresql.conf or data checksums enabled when the cluster was initialized with initdb (the default). full_page_writes must also be set to on, but is enabled by default."

So checksums are one of two ways to satisfy pg_rewind. The other is a parameter called wal_log_hints. As a subject, hint bits are basically boring Postgres bookkeeping meant for optimization. The important part is that, as an "optimization" which can change during a read, they are not logged to the WAL by default. But as we all know, checksums are highly dependent on page contents.
Any change to a page's bytes, including a flipped hint bit, also affects the checksum. So when checksums are on, Postgres has no choice but to WAL-log the full page, even for a humble hint bit update, so that recovery and replication stay consistent. Both roads lead to the same destination, which is why either one satisfies pg_rewind.

It’s still easier to simply enable wal_log_hints if we only want rewind functionality, but checksums get us additional safeguards. With that in mind, how do we actually turn them on? Perhaps that process can explain why more DBAs avoid checksums than we would otherwise expect.

Born at Creation

Checksums have been available since Postgres 9.3, but only as a flag to initdb at the moment of cluster creation. Past versions of Postgres worked like this:

    initdb --data-checksums -D /var/lib/postgresql/data

Omit the flag, and the cluster remained checksum-free for life. That initial decision baked checksums into the data directory before we ever wrote a byte. Postgres 18...
