SlateDB: An Object-Native LSM for Online Systems


“Diskless” systems that delegate durability to object storage are the future of database systems:

The economics are phenomenal. Storage is priced at a fraction of the cost of block storage or NVMe, and inter-AZ data transfer is free.

Object storage handles replication and provides 99.999999999% (eleven nines) durability, solving one of the most notoriously challenging problems in distributed systems.

Once data is written to object storage, it can be read by an arbitrary number of readers without any additional ETL pipelines.

Hundreds of competent engineers are dedicated to keeping it online and highly available, despite its low unit prices.

These properties have made object storage the standard for offline workloads, while recent successes (e.g. Turbopuffer, WarpStream, Quickwit) demonstrate its potential for online systems.

To accelerate the adoption of object storage for online systems, we’ve spent the last few years building SlateDB: an open-source, object-native LSM implementation with an embedded key-value interface.

What is SlateDB?

SlateDB is an embedded key-value store, implemented as an LSM tree, that depends on object storage alone for durability. It supports:

transactions and snapshot isolation

native single-writer, multi-reader deployments

rescaling via unions and projections

checkpoints & forks

pluggable, distributed compaction

Despite never being officially “announced” until today, SlateDB is already used in production by Dropbox, ZeroFS, HelixDB, Opendata and others.

If you prefer getting your hands dirty to reading a blog, SlateDB is available today with bindings in Rust, Go, Java, Node and Python.

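```rust
use slatedb::Db;

// open a SlateDB instance backed by an object storage bucket
// (`object_store` is an object store client, e.g. for S3, created elsewhere)
let slate = Db::open("/dir", object_store).await?;

// use SlateDB as a key value store
slate.put(b"key", b"value").await?;
// get returns the stored value, if any
slate.get(b"key").await?;
```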

Object Storage Laws of Physics

Despite the numerous benefits, object storage has three characteristics that have hindered its adoption for online workloads:

Request latencies (50-100ms) are an order of magnitude higher than typical online systems expect

Every GET and PUT request is individually metered, at roughly $0.40 per million reads and $5 per million writes

Objects are immutable and can only be overwritten in their entirety

A naive system that uses S3 directly as a key-value store for a modest 10K ops/sec, split evenly between reads and writes, would cost roughly $70K/month in request fees alone and still perform poorly.
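The $70K figure follows directly from the request prices above. A minimal sketch of the arithmetic, assuming a 30-day month, one GET per read and one PUT per write, and ignoring storage and transfer charges:

```rust
/// Approximate request prices quoted above, in USD per request.
const GET_PRICE: f64 = 0.40 / 1_000_000.0;
const PUT_PRICE: f64 = 5.00 / 1_000_000.0;
/// Seconds in a 30-day month (an assumption; calendar months vary).
const SECONDS_PER_MONTH: f64 = 30.0 * 24.0 * 3600.0;

/// Monthly request cost when every read is one GET and every write is one PUT.
fn naive_monthly_cost(reads_per_sec: f64, writes_per_sec: f64) -> f64 {
    let gets = reads_per_sec * SECONDS_PER_MONTH;
    let puts = writes_per_sec * SECONDS_PER_MONTH;
    gets * GET_PRICE + puts * PUT_PRICE
}
```

For 5K reads/sec and 5K writes/sec this comes to about $69,984 per month, of which roughly $64,800 is PUTs. Writes dominate, which is why batching them is the first lever object-native systems reach for.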

To solve this, object-native systems batch writes and cache reads.

For writes, the available trade-off is between latency, cost, and durability. If you require writes to be durable before they are acknowledged, you can either issue more frequent PUT requests to drive down latency or save costs by batching a window of writes into less frequent requests. If you can risk losing data, you can acknowledge writes eagerly and still batch many of them into a single PUT.
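The write-side trade-off can be put into numbers with a simple model of a writer that buffers writes and flushes them as one PUT on a fixed interval. This is a hypothetical sketch (`FlushPolicy` is an illustrative name, not a SlateDB type), assuming the PUT price quoted above, a 30-day month, and at least one write arriving in every window:

```rust
/// PUT price quoted above (~$5 per million requests).
const PUT_PRICE: f64 = 5.00 / 1_000_000.0;
/// Seconds in a 30-day month (an assumption; calendar months vary).
const SECONDS_PER_MONTH: f64 = 30.0 * 24.0 * 3600.0;

/// Hypothetical batching policy: buffer writes and flush them as a single PUT
/// every `flush_interval_ms` milliseconds.
struct FlushPolicy {
    flush_interval_ms: f64,
}

impl FlushPolicy {
    /// PUTs issued per month, assuming at least one write lands in every window.
    fn puts_per_month(&self) -> f64 {
        SECONDS_PER_MONTH * 1000.0 / self.flush_interval_ms
    }

    /// Monthly cost of those PUTs; note it does not depend on the write rate.
    fn monthly_put_cost(&self) -> f64 {
        self.puts_per_month() * PUT_PRICE
    }

    /// Worst-case time a write waits in the buffer before it is acknowledged.
    /// Durable acks wait for the flush (PUT latency itself is not included);
    /// eager acks return immediately but risk losing up to one window of writes.
    fn worst_case_ack_delay_ms(&self, wait_for_durability: bool) -> f64 {
        if wait_for_durability {
            self.flush_interval_ms
        } else {
            0.0
        }
    }
}
```

At a 100ms interval this model gives about $130/month in PUTs, versus roughly $64,800 for one PUT per write at 5K writes/sec. The price is up to 100ms of extra latency for durable acks; shrinking the interval to 10ms cuts that delay tenfold and multiplies the PUT bill by ten.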

Figure: two “Pick Two” panels. Writes: latency ●, durability ●, cost ○. Reads: latency ○, consistency ●, cost ●.

The pick-two tradeoffs that govern writes and reads on object storage.

For reads across multiple replicas, the trade-off is between latency, cost, and consistency, and it boils down to how you manage your cache. If you need low-latency, consistent readers, you must pay to actively replicate writes between machines and invalidate caches (either via frequent GET requests or via network calls). If you can accept eventual consistency, you can serve data from a possibly stale cache and refresh it from object storage on a poll interval, batching GETs to reduce costs.
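A minimal sketch of the eventually consistent option, assuming a toy in-memory `FakeObjectStore` that counts GETs and a hypothetical `PollingCache`; neither is a SlateDB or object-store API. The cache answers from memory until an entry is older than the poll interval, so readers may see stale data but pay for at most one GET per key per interval. It refreshes per key, which is a simplification; batching refreshes would cut GET counts further.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Toy stand-in for a bucket; every `get_object` call represents one metered GET.
struct FakeObjectStore {
    objects: HashMap<String, Vec<u8>>,
    gets: u64,
}

impl FakeObjectStore {
    fn get_object(&mut self, key: &str) -> Option<Vec<u8>> {
        self.gets += 1;
        self.objects.get(key).cloned()
    }
}

/// Hypothetical eventually consistent reader cache: a cached entry is served
/// (possibly stale) until it is older than `poll_interval`, then re-fetched.
struct PollingCache {
    poll_interval: Duration,
    entries: HashMap<String, (Option<Vec<u8>>, Instant)>,
}

impl PollingCache {
    fn get(&mut self, store: &mut FakeObjectStore, key: &str, now: Instant) -> Option<Vec<u8>> {
        if let Some((value, fetched_at)) = self.entries.get(key) {
            if now.duration_since(*fetched_at) < self.poll_interval {
                // Fresh enough: answer from the cache without issuing a GET.
                return value.clone();
            }
        }
        // Missing or past the poll interval: pay for one GET and refresh.
        let value = store.get_object(key);
        self.entries.insert(key.to_string(), (value.clone(), now));
        value
    }
}
```

Lengthening `poll_interval` lowers GET costs and widens the staleness window; the consistent alternative replaces polling with active invalidation.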

Any online system that builds on object storage is subject to these “laws of object physics.”

Why LSMs for Object Storage

The two laws of object physics map nicely onto mechanisms for maintaining two data structures: logs and sorted arrays.

The trade-off between latency, durability, and cost maps easily onto a log, where the lever is how often you flush the tail of the log to object storage. The problem with logs is that queries need to scan the entire...
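To make the log side concrete, here is a minimal sketch assuming a `BTreeMap` standing in for the bucket and hypothetical names (`ObjectLog`, `flush`, `log/<seq>` object keys) that are illustrative only, not SlateDB's design. Each flush is one PUT of a new immutable object, and because nothing is indexed, a point lookup may have to read every flushed segment.

```rust
use std::collections::BTreeMap;

/// Hypothetical log on object storage: appends land in an in-memory tail, and
/// each `flush` writes the whole tail as one new immutable object (one PUT).
struct ObjectLog {
    tail: Vec<(String, String)>,
    /// Flushed segments keyed by object name; a BTreeMap stands in for the bucket.
    segments: BTreeMap<String, Vec<(String, String)>>,
    next_segment: u64,
}

impl ObjectLog {
    fn new() -> Self {
        ObjectLog { tail: Vec::new(), segments: BTreeMap::new(), next_segment: 0 }
    }

    fn append(&mut self, key: &str, value: &str) {
        self.tail.push((key.to_string(), value.to_string()));
    }

    /// The flush frequency is the latency/durability/cost lever: flushing more
    /// often makes writes durable sooner but issues more PUTs.
    fn flush(&mut self) {
        if self.tail.is_empty() {
            return; // nothing to write, so no PUT
        }
        // Zero-padded sequence numbers keep object names in append order.
        let name = format!("log/{:020}", self.next_segment);
        self.next_segment += 1;
        self.segments.insert(name, std::mem::take(&mut self.tail));
    }

    /// Point lookup over flushed data only (what a separate reader would see).
    /// With no index, segments are read newest-first until the key turns up;
    /// returns the value and how many segments (GETs) the lookup touched.
    fn get(&self, key: &str) -> (Option<String>, usize) {
        let mut segments_read = 0;
        for records in self.segments.values().rev() {
            segments_read += 1;
            if let Some((_, v)) = records.iter().rev().find(|(k, _)| k == key) {
                return (Some(v.clone()), segments_read);
            }
        }
        (None, segments_read)
    }
}
```

A lookup for a key that was never written touches every segment, so read cost grows with the length of the log.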
