We shrank our TimescaleDB chunks from 30 days to 7

Why we shrank our TimescaleDB chunks from 30 days to 7 | by WMG Lab | Jun, 2026 | WMG Innovation Lab


WMG Innovation Lab

Stories from the front lines of music innovation. Warner Music Group’s engineering, product, design, infrastructure and data teams shipping at scale.

Why we shrank our TimescaleDB chunks from 30 days to 7

By Yask Srivastava

WMG Lab

4 min read· 2 days ago


Every day, Sodatone (WMG’s A&R intelligence platform) pulls engagement signals from streaming and social platforms and turns them into time series that our scouts and label teams use to spot emerging artists. Most of that data lives in TimescaleDB hypertables, one per platform-and-metric pair, so when one of them starts misbehaving, it tends to be a leading indicator for the rest.

If you haven’t lived inside TimescaleDB, here’s the short version. A hypertable looks like a single Postgres table, but under the hood it’s a collection of smaller tables, called chunks, each holding the rows for one time range. For example, a hypertable with a 30-day chunk interval and a year’s worth of data is really about 12 tables stitched together. If we query the last month of data to display to users in Sodatone, the query touches only the most recent chunk or two; all the other chunks are skipped without being read, which drastically improves query times.

Why chunk size matters

Chunk size affects five things, and the effects compound:

Working set in memory. The active (uncompressed) chunk is what your hot writes and recent-data reads hit. If it doesn’t fit comfortably in shared buffers and the page cache, every recent query starts paying for I/O.

Chunk pruning. The query planner skips chunks whose time range doesn’t overlap your WHERE predicate. That’s the main reason hypertables are fast for time-range scans, and smaller chunks make the pruning more selective on recent-data queries.

Compression batch size. TimescaleDB’s compression policy compresses chunks once they pass a configured age. Bigger chunks take longer to compress and decompress than smaller ones.

Backfill cost. Re-ingesting data into a compressed chunk means decompressing it, applying the change, and recompressing it. The chunk is the unit of that work.

Retention granularity. If you ever apply add_retention_policy, the chunk is also the unit of eviction.

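
To make the pruning and backfill points concrete, here is a minimal Python sketch that works out which chunks a time range overlaps. It is only arithmetic on chunk boundaries, not a model of the TimescaleDB planner. The fixed `ORIGIN`, the helper names, and the day offsets are all assumptions chosen for illustration; real chunk boundaries depend on how the hypertable was created.

```python
from datetime import datetime, timedelta, timezone

# Assumption: chunks are aligned to this hypothetical origin.
ORIGIN = datetime(2025, 1, 1, tzinfo=timezone.utc)


def day(n):
    """Timestamp n days after the hypothetical origin."""
    return ORIGIN + timedelta(days=n)


def chunks_overlapping(start, end, interval_days):
    """Return the [chunk_start, chunk_end) ranges that overlap [start, end)."""
    width = timedelta(days=interval_days)
    first = (start - ORIGIN) // width
    # Subtract one microsecond so an exclusive end that sits exactly on a
    # boundary does not pull in the following chunk.
    last = (end - ORIGIN - timedelta(microseconds=1)) // width
    return [(ORIGIN + i * width, ORIGIN + (i + 1) * width)
            for i in range(first, last + 1)]


def days_touched(start, end, interval_days):
    """Total days of chunk range that must be read (or decompressed)."""
    return len(chunks_overlapping(start, end, interval_days)) * interval_days


if __name__ == "__main__":
    # A hypothetical 3-day backfill, for days 43 through 45.
    for interval in (30, 7):
        print(interval, "day chunks:",
              days_touched(day(43), day(46), interval), "days decompressed")
```

In this made-up case the 3-day backfill falls inside a single chunk either way, but that chunk spans 30 days under the old interval and 7 under the new one. A range that straddles a boundary touches two chunks, so the saving is smaller in that case.
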
TimescaleDB’s own guidance is that the active chunk should fit in roughly 25% of your available memory. That’s a moving target. As ingest rates grow, the same time interval represents more bytes, and a 30-day chunk that was fine a year ago can be a problem today.

The thing worth knowing about set_chunk_time_interval is that it only affects future chunks. Existing chunks keep their original interval and keep being queried just fine. There’s no rewrite, exclusive lock, or backfill. The hypertable transitions naturally as the next chunk boundary arrives. That makes this one of the safer knobs in TimescaleDB to turn. If you don’t like the result, you reverse it the same way.

What we noticed

Late last year, we noticed that one of our heavier hypertables (millions of rows a week, multi-TB on disk before compression) wasn’t aging well. Compression was lagging behind ingest, recent-data reads got progressively heavier through the fall, and every time an upstream feed re-published a few days of history (which happens more often than we’d like), we ended up decompressing whole months of data to absorb the change. The chunk interval, which we’d set to 30 days back when the table was small, had stopped doing us any favors.

What we changed

In September the compression job on that table started failing: the chunk had grown too big for a single run to finish. That’s why this was the first table we touched. We dropped the chunk interval from 30 days to 7 days and watched the job in the Timescale Cloud monitoring dashboard until it ran clean again.
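
As a rough sketch of how the 25% guidance and the interval change fit together, the Python below picks the largest candidate interval whose estimated chunk fits the memory budget, then builds the `set_chunk_time_interval` statement for it. The ingest rate, memory size, candidate intervals, and the table name `engagement_metrics` are all hypothetical, and the estimate ignores index size, so treat it as a starting point rather than a sizing tool.

```python
import re

GIB = 1024 ** 3


def largest_interval_that_fits(bytes_per_day, memory_bytes,
                               candidates=(30, 14, 7, 1),
                               budget_fraction=0.25):
    """Largest candidate interval (in days) whose chunk fits the budget.

    Assumption: chunk size is simply ingest bytes per day times days.
    Returns None if no candidate fits.
    """
    budget = memory_bytes * budget_fraction
    for days in sorted(candidates, reverse=True):
        if bytes_per_day * days <= budget:
            return days
    return None


def set_interval_sql(hypertable, interval_days):
    """Build the statement that changes the interval for future chunks."""
    # Only allow plain (optionally schema-qualified) identifiers, since the
    # name is interpolated into the SQL text.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?",
                        hypertable):
        raise ValueError(f"unexpected hypertable name: {hypertable!r}")
    if interval_days <= 0:
        raise ValueError("interval_days must be positive")
    return (f"SELECT set_chunk_time_interval('{hypertable}', "
            f"INTERVAL '{interval_days} days');")


if __name__ == "__main__":
    # Hypothetical numbers: 2 GiB/day of uncompressed ingest, 64 GiB of RAM.
    days = largest_interval_that_fits(2 * GIB, 64 * GIB)
    print(set_interval_sql("engagement_metrics", days))
```

Because the setting only applies to chunks created afterwards, running the generated statement with the old interval is enough to go back.
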


Two months later we saw the same failure on a different table, one of our music-chart-data feeds. The same fix worked. At that point we’d seen it twice in two months, so in early December we updated the rest of our hot platform-engagement tables in one PR. All went to 7-day chunks. Each migration looked like this:

class ShrinkChunks

What we got out of it

Compression caught up faster. Smaller chunks finish a compression policy run more quickly, so the gap between “live” and “compressed” data shrank for every table we touched.

Recent weekly chunks for the music-chart-data feed table. Once TimescaleDB’s compression policy kicks in, each 7-day segment typically shrinks to about 10% of its original footprint.

Backfills got cheaper. When an upstream feed re-publishes a week of history, we now decompress one 7-day chunk instead of dragging an entire month through a...
