Ordered Key Sharding in DynamoDB

Ordered key sharding in DynamoDB - death and gravity


June 2026 ∙ eight minute read


So, you want to keep a sorted index in DynamoDB, but for whatever reason – usually throughput-related – it won't fit on a single partition. What do you do?

Today, we look at the available solutions, do the math, and find out which is best.

Tip

This worked example is part of my DynamoDB crash course series.

Contents

Requirements

A sparse index is almost enough

But scan results are not ordered

But a single partition key causes throttling

But random suffixes are random

But hash suffixes are not ordered

But there are a lot of first characters

But some first bytes need multiple shards

But tries and prefix ranges are complicated

But the prefix distribution can change

Requirements

Say you're using single table design with a table of artists, albums, and songs.[1]

You keep an artist's items in a single collection (aka the same partition key), and use the sort keys artist, album#{Album}, and song#{Album}#{Song}, depending on their type:

```yaml
# table Music (partition key: Artist, sort key: sk)
Solar Fields: !btree
  'album#Leaving Home': { Album: Leaving Home, ... }
  'artist': { ... }
  'song#Leaving Home#Air Song': { ... }
  'song#Leaving Home#Monogram': { ... }
```
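To make the sort key scheme concrete, here's a minimal Python sketch; `sort_key` is a hypothetical helper, not part of any DynamoDB API. It builds the three kinds of sort keys and sorts them the way DynamoDB orders a collection: by the UTF-8 bytes of the string, which gives the same order as Python's code-point comparison of `str`.

```python
from typing import Optional


def sort_key(item_type: str, album: Optional[str] = None, song: Optional[str] = None) -> str:
    """Build the sort key for an item in an artist's collection (hypothetical helper)."""
    if item_type == "artist":
        return "artist"
    if item_type == "album":
        return f"album#{album}"
    if item_type == "song":
        return f"song#{album}#{song}"
    raise ValueError(f"unknown item type: {item_type!r}")


# Items of one artist (same partition key), inserted in arbitrary order.
keys = [
    sort_key("song", "Leaving Home", "Monogram"),
    sort_key("artist"),
    sort_key("song", "Leaving Home", "Air Song"),
    sort_key("album", "Leaving Home"),
]

# A collection is kept ordered by sort key.
for key in sorted(keys):
    print(key)

# A begins_with-style prefix selects the songs of one album.
songs = [k for k in sorted(keys) if k.startswith("song#Leaving Home#")]
```

The type prefixes are what keep each kind of item contiguous within the collection, so a prefix condition can fetch all albums or all songs of an album in one range.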

To list albums without doing a full table scan, you need a global secondary index.

Let's come up with some reasonable requirements; the GSI should support:

- items up to 500 bytes (we project additional attributes besides the keys)

- 10,000 queries/second, max 100 items/query, to:
  - list all albums, sorted alphabetically
  - list albums by title

- 10,000 writes/second (to avoid write throttling during imports)

A sparse index is almost enough

One way to do it is to use a dedicated sparse index, taking advantage of the fact that items with missing index keys don't appear in the index.

If only albums have an Album attribute, we just create a new GSI:

```yaml
# GSI Albums (partition key: Album, sort key: Artist)
Leaving Home: !btree
  International Pony: { sk: 'album#Leaving Home', ... }
  Solar Fields: { sk: 'album#Leaving Home', ... }
Heavy Migration: !btree
  Dday One: { sk: 'album#Heavy Migration', ... }
```

If songs have an Album too, we add a dedicated AlbumsPK attribute instead.

In many ways, this is the ideal solution. To list all albums, we scan the index. To list albums by title, we query a single index partition key. We have lots of unique partition keys with items spread fairly evenly across them, which should prevent throttling.
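Here's a rough in-memory model of the sparse index, assuming items are plain dicts; `build_sparse_index` is hypothetical and only mimics what DynamoDB does when it maintains the GSI. Items without an `Album` attribute never make it into the index, and each index partition keeps its items sorted by `Artist`.

```python
from collections import defaultdict

# Base table items (sample data); only albums have an Album attribute.
items = [
    {"Artist": "Solar Fields", "sk": "artist"},
    {"Artist": "Solar Fields", "sk": "album#Leaving Home", "Album": "Leaving Home"},
    {"Artist": "International Pony", "sk": "album#Leaving Home", "Album": "Leaving Home"},
    {"Artist": "Dday One", "sk": "album#Heavy Migration", "Album": "Heavy Migration"},
]


def build_sparse_index(items):
    """Model the Albums GSI: partition key Album, sort key Artist.

    Items without an Album attribute are left out entirely,
    which is what makes the index sparse.
    """
    index = defaultdict(list)
    for item in items:
        if "Album" in item:
            index[item["Album"]].append(item)
    for partition in index.values():
        partition.sort(key=lambda item: item["Artist"])
    return index


albums_index = build_sparse_index(items)

# "Query" one partition key: all albums with this title, sorted by Artist.
by_title = [item["Artist"] for item in albums_index["Leaving Home"]]
```

Note that a real Scan would return the partitions in no particular order; the dict here just happens to preserve insertion order.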

But scan results are not ordered

...except scan results are not ordered, so we're missing the sorted alphabetically part.

What is ordered are sort keys, so we can use a single index collection instead:

```yaml
# GSI GSI1 (partition key: gsi1pk, sort key: gsi1sk)
'albums': !btree
  Heavy Migration: { Artist: Dday One, sk: 'album#Heavy Migration', ... }
  Leaving Home: { Artist: Solar Fields, sk: 'album#Leaving Home', ... }
  Leaving Home: { Artist: International Pony, sk: 'album#Leaving Home', ... }
```

This also seems ideal. To list all albums, we query the entire 'albums' partition. To list albums by title, we add a key condition on the sort key. The results are sorted as required, and there's no limit on the number of items in a collection.

But a single partition key causes throttling

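However, each partition is limited to 3,000 read capacity units and 1,000 write capacity units per second; since GSI reads are always eventually consistent, that works out to about 24 MB/s for reads and 1 MB/s for writes.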

Let's see how they compare to our requirements:

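reads: 500 bytes/item * 10k queries/s * 100 items/query = 500 MB/s (~21x)

writes: 500 bytes/item * 10k items/s = 5 MB/s (5x); and since every write consumes at least one 1 KB write unit, that's really 10,000 write capacity units, 10x the 1,000 available per partition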
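The same back-of-the-envelope math as a script, assuming the documented per-partition limits of 3,000 RCU and 1,000 WCU, and decimal kilobytes for round numbers (DynamoDB's 4 KB and 1 KB units are really 4,096 and 1,024 bytes, which doesn't change the conclusion here). Like the text, it compares raw bytes, not per-item write units.

```python
import math

ITEM_BYTES = 500
QUERIES_PER_S = 10_000
ITEMS_PER_QUERY = 100
WRITES_PER_S = 10_000

# Per-partition limits: 3,000 RCU and 1,000 WCU. GSI reads are eventually
# consistent, so one RCU covers 8 KB/s (two 4 KB reads); one WCU is 1 KB/s.
READ_LIMIT = 3_000 * 8 * 1_000   # ~24 MB/s
WRITE_LIMIT = 1_000 * 1_000      # ~1 MB/s

reads = ITEM_BYTES * QUERIES_PER_S * ITEMS_PER_QUERY  # bytes/s
writes = ITEM_BYTES * WRITES_PER_S                    # bytes/s

read_ratio = reads / READ_LIMIT
write_ratio = writes / WRITE_LIMIT
shards = math.ceil(max(read_ratio, write_ratio))

print(f"reads:  {reads / 1e6:.0f} MB/s, {read_ratio:.1f}x one partition")
print(f"writes: {writes / 1e6:.0f} MB/s, {write_ratio:.1f}x one partition")
print(f"shards needed: {shards}")
```

Reads dominate, so they set the shard count.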

Uh-oh, it turns out we need 21 times the read throughput a single partition can deliver.

One way to spread the load is sharding, using multiple synthetic partition keys of the form album#{shard_id}. A common option for the shard id is a random number from a known range, e.g. album#{randrange(21)}:

```yaml
# GSI GSI1 (partition key: gsi1pk, sort key: gsi1sk)
'album#1': !btree
  Leaving Home: { Artist: Solar Fields, ... }
'album#12': !btree
  Heavy Migration: { Artist: Dday One, ... }
'album#20': !btree
  Leaving Home: { Artist: International Pony, ... }
```

To list all albums, query each shard in turn:

```python
for shard in range(21):
    for item in dynamodb.query(f"album#{shard}"):
        yield item
```
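To see what random shard ids mean for both access patterns, here's a toy in-memory version; `ShardedIndex`, `list_all_albums`, and `albums_by_title` are made up for illustration and stand in for real Query calls against GSI1.

```python
import random
from collections import defaultdict

SHARDS = 21


class ShardedIndex:
    """Hypothetical in-memory stand-in for GSI1: partition key -> items
    kept sorted by sort key, the way a DynamoDB collection is."""

    def __init__(self):
        self.partitions = defaultdict(list)

    def put(self, pk, sk, item):
        self.partitions[pk].append((sk, item))
        # Stable sort on the sort key only; the item dicts aren't comparable.
        self.partitions[pk].sort(key=lambda pair: pair[0])

    def query(self, pk):
        return [item for _, item in self.partitions.get(pk, [])]


def list_all_albums(index):
    # Query each shard in turn: every shard is sorted, the overall result is not.
    for shard in range(SHARDS):
        yield from index.query(f"album#{shard}")


def albums_by_title(index, title):
    # Any shard may hold a given title, so all of them must be queried;
    # the equality check stands in for a key condition on the sort key.
    for shard in range(SHARDS):
        for item in index.query(f"album#{shard}"):
            if item["Album"] == title:
                yield item


rng = random.Random(42)  # seeded only so the example is repeatable
index = ShardedIndex()
for title, artist in [
    ("Leaving Home", "Solar Fields"),
    ("Heavy Migration", "Dday One"),
    ("Leaving Home", "International Pony"),
]:
    index.put(f"album#{rng.randrange(SHARDS)}", title, {"Album": title, "Artist": artist})
```

Listing everything costs 21 queries either way, but with random shard ids even a single title lookup has to fan out to all 21 shards.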

But random suffixes are random

There's a problem, though – with random shard ids we can't easily list albums by title, since albums with the same title may end up on any shard.

A better option is to calculate the shard id from the album title using a hash function:

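```python
from hashlib import sha256

def stable_hash(s):
    # The built-in hash() is randomized per process for str, so use a digest
    # that every writer and reader computes the same way.
    return int.from_bytes(sha256(s.encode()).digest(), "big")

def album_shard_id(album_title):
    return stable_hash(album_title) % 21
```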

```yaml
# GSI GSI1 (partition key: gsi1pk, sort key: gsi1sk)
'album#6': !btree
  Leaving Home: { Artist: Solar Fields, ... }
  Leaving Home: { Artist: International Pony, ... }
'album#8': !btree
  Heavy Migration: { Artist: Dday One, ... }
```
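A toy version of the hash-sharded index, with a plain dict standing in for GSI1 and hypothetical `put_album` and `albums_by_title` helpers. Because the shard id is derived from the title, every album with a given title lands on the same shard, and finding them takes a single query.

```python
from collections import defaultdict
from hashlib import sha256

SHARDS = 21


def stable_hash(s: str) -> int:
    # Stable across processes, unlike the built-in hash() for str.
    return int.from_bytes(sha256(s.encode()).digest(), "big")


def album_shard_id(album_title: str) -> int:
    return stable_hash(album_title) % SHARDS


# In-memory stand-in for GSI1: partition key -> sorted (title, artist) pairs.
index = defaultdict(list)


def put_album(title, artist):
    pk = f"album#{album_shard_id(title)}"
    index[pk].append((title, artist))
    index[pk].sort()


def albums_by_title(title):
    # Only one shard can contain this title, so one "query" suffices.
    pk = f"album#{album_shard_id(title)}"
    return [artist for t, artist in index.get(pk, []) if t == title]


for title, artist in [
    ("Leaving Home", "Solar Fields"),
    ("Heavy Migration", "Dday One"),
    ("Leaving Home", "International Pony"),
]:
    put_album(title, artist)
```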

To list albums by title:

dynamodb.query(f"album#{album_shard_id(album_title)}",...
