Elixir for a Bluesky DataPlane: the choice we didn't expect


Elixir for a Bluesky DataPlane: the choice we didn't expect | bitcrowd blog

Most of Bluesky's source code is open, so you can run your own social network with it. What's missing? A performant DataPlane implementation. Closing this gap would be an important step towards digital independence, so we decided to contribute our share and build a performant DataPlane for Bluesky. When we started the project, we expected it to be a Go, Rust or even Node project. Instead, we landed on Elixir. Here is why and how we made that decision.

Four Languages, One DataPlane: How We Picked

We set out to evaluate Go, Rust, Node and Elixir for a from-scratch implementation of the Bluesky AppView DataPlane, fully expecting one of the usual suspects to win. The outcome surprised us. This is how we reasoned from the workload - and how we ended up somewhere we didn't anticipate when we started.

TL;DR - The DataPlane's workload splits cleanly in two: a hot path (timeline reads, served from memory, concurrency-bound) and a cold path (records, threads and profiles, I/O-bound). We expected Go, Rust or Node to win; Elixir fit best. Its one real weakness - raw per-core compute - is localized to the follower-graph set operations, which we offload to a small Rust NIF (Native Implemented Function). What's left is high-concurrency serving and a burst-absorbing fan-out queue, which the BEAM lets us build in-process instead of bolting on Redis or Kafka. The rest of this post is how we reasoned our way there.

The component nobody talks about

Bluesky's infrastructure is, refreshingly, mostly open source. There's one notable exception: the DataPlane, a part of the AppView. It exists publicly only as a Node-and-Postgres reference implementation, while the production Bluesky network runs on a dedicated, ScyllaDB-backed, closed-source DataPlane (as documented across Jaz's blog and the Pragmatic Engineer's deep-dive on Bluesky's architecture).

That gap is exactly where things get interesting. The reference implementation tells you what the DataPlane does; it doesn't tell you how to make it survive contact with real traffic. If you want to run your own, you have to answer the scaling question yourself - and the first step is understanding the workload well enough to stop treating it as one thing.

The DataPlane is a central component of the AppView in Bluesky's architecture. Read the previous post to learn more.

ScyllaDB is Bluesky's operational choice, not part of the interface. The DataPlane's contract is a gRPC service that answers high-volume, low-complexity queries and returns skeletons - lists of IDs, counts, booleans - which a higher layer later hydrates into full views. What sits behind that contract is entirely up to you: the language, and the datastore. So before picking either, we spent our time on the only thing that actually constrains the choice: the shape of the load.
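To make the skeleton/hydration split concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the names `TimelineSkeleton`, `get_timeline_skeleton` and `hydrate`, the cursor scheme, and the in-memory dictionaries standing in for a datastore. The real contract is a gRPC service, and its actual message types are not reproduced here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class TimelineSkeleton:
    """A skeleton carries only IDs and a cursor, never post content."""
    post_ids: List[str]
    cursor: Optional[str]


# Hypothetical stand-ins for whatever datastore sits behind the contract.
TIMELINE_INDEX: Dict[str, List[str]] = {"alice": ["p3", "p2", "p1"]}  # newest first
POSTS: Dict[str, str] = {"p1": "first", "p2": "second", "p3": "third"}


def get_timeline_skeleton(
    user: str, limit: int, cursor: Optional[str] = None
) -> TimelineSkeleton:
    """DataPlane-style query: cheap, high-volume, returns IDs only."""
    ids = TIMELINE_INDEX.get(user, [])
    # Resume right after the post named by the cursor, if there is one.
    start = ids.index(cursor) + 1 if cursor in ids else 0
    page = ids[start:start + limit]
    has_more = start + limit < len(ids)
    return TimelineSkeleton(page, page[-1] if page and has_more else None)


def hydrate(skeleton: TimelineSkeleton) -> List[dict]:
    """Higher-layer step: turn IDs into full views, skipping unknown IDs."""
    return [
        {"id": pid, "text": POSTS[pid]}
        for pid in skeleton.post_ids
        if pid in POSTS
    ]
```

The point of the split is that the first function can be optimized purely for throughput, because it never touches post bodies; the second can be cached or batched independently.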

A tale of two workloads

Here is the central observation. The network's historical data is enormous - terabytes of it. But when a user opens the app and looks at their timeline, they almost never scroll back to the dawn of time. They read a few dozen posts, get distracted and wander off to engage with a post, inspect a profile or follow a thread.

A note on specifics: it's been observed that Bluesky's timeline doesn't serve much beyond the last day or two of content, and that deeper cursor positions tend to fill with very recent posts rather than true history. Treat the exact window as illustrative unless you've measured it on your own deployment - the architectural point holds regardless of the precise number.

This creates a tension. Most “historic” data - meaning anything more than a few days old - is never looked at again in the timeline. When old data matters, it's almost always in a different context: someone inspecting a profile, or following up on a past thread. Yet to compile a timeline, the reference implementation has to perform joins against potentially terabyte-heavy tables. That neither performs nor scales well - and timeline requests are the lion's share of everything the DataPlane is asked to do.

Data volume and read frequency are inversely related: the vast bulk of data is old and almost never read, while the tiny sliver of recent posts drives nearly all timeline traffic.

The conclusion writes itself: timeline generation deserves a fundamentally different treatment from the use cases that retrieve individual records or threads. Conflating them is what makes the naive implementation hurt. Once you separate them, you find they don't just differ in degree - they have opposite resource profiles, and they want different things from the runtime underneath.

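|               | Hot path - timelines     | Cold path - records, threads, profiles |
|---------------|--------------------------|----------------------------------------|
| Data age      | Recent (last day or two) | Historic (anything older)              |
| Data volume   | A tiny sliver            | Terabytes                              |
| Request share | The dominant workload    | Comparatively rare                     |
| Bound by      | Memory + compute         | I/O                                    |
| Lives in...   |                          |                                        |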
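One way to picture the two paths side by side is the Python sketch below. It is a toy under stated assumptions, not our implementation: `SplitStore`, its method names and the two-day `HOT_WINDOW_SECONDS` are hypothetical, a plain dict stands in for the I/O-bound cold store, and posts are assumed to arrive in timestamp order per author. It also merges followed authors' posts at read time for brevity, so it does not model a fan-out queue.

```python
import heapq
from collections import deque
from itertools import islice

# Illustrative window only; measure the real one on your own deployment.
HOT_WINDOW_SECONDS = 2 * 24 * 3600


class SplitStore:
    """Toy store that serves timelines from memory and records from a cold store."""

    def __init__(self, cold_store):
        self.cold = cold_store  # stand-in for a disk-backed store: post_id -> record
        self.hot = {}           # author -> deque of (timestamp, post_id), newest first

    def ingest(self, author, post_id, ts, record):
        # Every post is persisted; only its ID and timestamp enter the hot window.
        self.cold[post_id] = record
        self.hot.setdefault(author, deque()).appendleft((ts, post_id))

    def evict(self, now):
        # Drop entries older than the window from the hot path only.
        cutoff = now - HOT_WINDOW_SECONDS
        for posts in self.hot.values():
            while posts and posts[-1][0] < cutoff:
                posts.pop()

    def timeline(self, follows, limit):
        # Hot path: merge per-author recent posts, newest first, memory only.
        streams = (self.hot.get(author, ()) for author in follows)
        merged = heapq.merge(*streams, key=lambda entry: entry[0], reverse=True)
        return [post_id for _, post_id in islice(merged, limit)]

    def get_record(self, post_id):
        # Cold path: individual lookup, still works after hot-path eviction.
        return self.cold.get(post_id)
```

Note that eviction never touches the cold store: an old post disappears from timelines but remains reachable through a profile or thread lookup, which mirrors the split in the table above.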
