The only scalable delete in Postgres is DROP TABLE



Tom Pang | June 11, 2026

Counterintuitively, large DELETEs add work to the database.

From experience, we can state this plainly: the most scalable Postgres data-deletion strategies revolve around dropping entire tables.

Individual row DELETEs are fine at a small scale. However, big batch DELETE operations don't immediately free up physical disk space, they add write and replication overhead, and they are ultimately a poor fit for large-scale row cleanup.

If your application needs to delete large amounts of data, even very rarely, we recommend moving towards schema designs that let you express that as a DROP TABLE or a TRUNCATE.

Let's see why by looking at how DELETE works in Postgres.

Deletes hurt

When rows mutate, Postgres can maintain multiple versions of the same row, so that different transactions can see row values as of the time they were queried. This is Postgres' implementation of "Multi-Version Concurrency Control" (MVCC) and a core principle of its design.

Postgres makes an intentional tradeoff here. It stores modified and deleted rows alongside current ones, relying on each tuple's transaction IDs and visibility checks to skip over "dead tuples." Later on, a vacuum process comes along and says, "Hey, these bytes in this heap page are now free, you can overwrite them."
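The dead-tuple bookkeeping described above can be observed through Postgres' statistics views. What follows is a minimal sketch, assuming a hypothetical table named `events` with a `created_at` column (neither appears in the original post). The counters are estimates and may lag slightly behind the DELETE.

```sql
-- Hypothetical table and column, used only for illustration.
DELETE FROM events
WHERE created_at < now() - interval '90 days';

-- The deleted rows are not physically gone yet: they are counted as
-- dead tuples until a vacuum pass marks their space as reusable.
SELECT relname,
       n_live_tup,       -- estimated number of live rows
       n_dead_tup,       -- estimated number of dead rows awaiting vacuum
       last_autovacuum   -- when autovacuum last processed this table
FROM pg_stat_user_tables
WHERE relname = 'events';
```

A large `n_dead_tup` right after a batch DELETE indicates rows that vacuum has yet to process.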

Deletes also need to be fully replicated; they are still writes, which means large-scale DELETEs can affect other writers in your application and make them wait for the DELETE's replication to finish (under synchronous and semi-synchronous replication).

It's worth noting here that neither DELETE nor autovacuum typically returns space to the operating system; they only say "the space in those pages can be written over." This is an intentional choice by Postgres. It optimizes for the case where DELETE workloads are mixed with INSERT ones: releasing space to the operating system and then asking for it back is relatively expensive and best avoided. VACUUM FULL does return the space, but it takes an expensive lock for a long time.

Another related tradeoff Postgres makes is that index data is not touched at all when issuing a DELETE; instead, readers going through the index have to resolve "is this tuple dead?" There's also a best-effort optimization where an index scan that finds a dead row can mark the index entry as dead itself.

Overall, DELETE is really "work added," not "work done." If you want more details on Postgres MVCC, see Keeping a Postgres queue healthy.

If you're running a DELETE over a large amount of data, you can imagine how it adds work to every read query and to autovacuum. Be aware that foreign keys with ON DELETE CASCADE can turn a single-row delete into gigabytes of deleted data, resulting in the same set of problems.

Drop DELETE for DROP

In contrast, DROP TABLE and TRUNCATE require a heavyweight AccessExclusiveLock on the table, but their cost is largely independent of data size. At the physical layer they remove files from the operating system directly, and they sweep the Postgres buffer cache to remove pages related to the table.

That sweep can be less trivial on databases with large shared buffers, but it is only a metadata sweep.
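To get a sense of how many buffers such a sweep covers on a particular server, the buffer count can be read from `pg_settings`. This is a rough sketch: it only reports the configured cache size and says nothing about how long a given DROP TABLE will take.

```sql
-- For shared_buffers, pg_settings.setting is a count of buffers,
-- each one block in size (8kB in a default build).
SELECT setting::bigint AS buffer_count,
       pg_size_pretty(
         setting::bigint * current_setting('block_size')::bigint
       ) AS cache_size
FROM pg_settings
WHERE name = 'shared_buffers';
```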
Postgres keeps a small fixed-size header (a BufferDesc, padded to 64 bytes) for every 8KB buffer, and dropping a table scans those headers, not the pages themselves. At 64 bytes per 8KB page, that's 1/128th of the cache size: with 128GB of shared buffers, you are sweeping only ~1GB of memory, sequentially, which is very fast on modern hardware.

DROP TABLE and TRUNCATE scale much better than DELETE. They produce zero dead tuples, zero vacuum debt, and zero work for readers. They immediately return space to the operating system.

A performant one-off delete

One common case where folks need to delete large amounts of data is "my table is full of junk due to a bug." We encountered this recently in an internal observability tool. A bug caused the tool to write millions of rows that we wanted to delete from the database. The bad rows had an old updated_at timestamp; anything with a recent one was meant to be kept. There were only a few hundred thousand rows to keep; most of the data was junk.

For this case, especially because "lock the database for minutes" was not an issue at all, we performed some surgery, leaning on Postgres' transactional DDL:

BEGIN

Explicit LOCK TABLE ... IN ACCESS EXCLUSIVE MODE on the table in question; this prevents other transactions from reading or...
