Aurora DSQL and the Circle of Life

In this article we’re going to take a deeper look at the Circle of Life in Aurora DSQL. Understanding the flow of data will really help you wrap your head around the DSQL architecture, and how best to build applications against DSQL.

My intention in sharing this article is to help you understand the flow. There are many other things to understand: availability, scalability, durability, security, and so on. I won’t be discussing those in detail, because each of those topics is deep and complex, and deserves its own focus.

The flow of data

Aurora DSQL is based on PostgreSQL. The Query Processor (QP) component is a running postgres process, although we’ve significantly modified it to work with our architecture.

If you connect to DSQL with a Postgres client (such as psql), you’re connected to one of these postgres processes. You’re connected to a QP, and you can start to interact with it as you would with any other Postgres server.

If you run a local Postgres operation, such as SELECT 1, then that query is processed entirely locally by the QP. But what happens when you query a table:

select * from test;
 id | value
----+-------
  1 |    10
(1 row)

Usually, you’d expect Postgres to read from storage locally, which might mean reading from the buffer cache, or doing disk I/O. When running on Aurora Postgres (APG), cache misses would result in a load from remote storage.

Like APG, reads in DSQL also go to remote storage. In our above query, which is a scan of the entire test table, the QP is going to turn around and scan storage, and storage is going to return all the rows in the table.

But how did storage get the rows in the first place?

insert into test values (1, 10); -- autocommit

Vanilla Postgres would process that transaction locally, inserting into the Write-Ahead Log (WAL), updating the buffer cache, and using fsync() to persist the changes to disk. In APG, the buffer cache is also updated, but the durability of the transaction is ensured by fsync() to the remote storage in multiple Availability Zones (AZ).

Commits in DSQL have the same basic ingredients, but they’re expressed quite differently. In DSQL, data is durably persisted when it’s written to the journal. Storage follows the journal, and keeps itself up to date.

When I first started working on DSQL (many years ago!), I didn’t really get this flow. I’d been told “writes go to the journal, reads go to storage”. I nodded, but I didn’t deeply, truly, understand that simple explanation. I’d spent too much time with traditional architectures, and my mind kept falling back on the familiar.

What helped me get it was the picture at the top of this post. Imagine somebody drawing this on the whiteboard. They draw the three boxes: QP, journal and storage. Then, they draw the Circle of Life:

There’s something about this presentation, vs. the one at the top of the article, that helped it go click for me. Removing the service interactions certainly helps. Notice how in the first picture there’s an arrow from the QP to storage, while in the Circle, the arrow is the other way round?

“Writes go to the journal, reads go to storage” never quite did it for me. “Writes never go to storage, reads never go to the journal” also didn’t quite get the message across. The Circle did it for me, and I hope it does for you.
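If it helps to see the Circle as code, here is a toy sketch in Python. It is not how DSQL is implemented: the class and method names (Journal, Storage, QueryProcessor, follow, and so on) are made up for illustration, and everything runs in one process. The only thing it tries to show is the direction of the arrows: the QP appends to the journal, storage follows the journal, and the QP reads from storage.

```python
# Toy model of the Circle: QP -> journal -> storage -> QP.
# All class and method names are hypothetical, not DSQL APIs.

class Journal:
    """Append-only, ordered log. In this toy, appended means committed."""

    def __init__(self):
        self.entries = []

    def append(self, table, row):
        self.entries.append((table, row))


class Storage:
    """Holds table rows. Learns about writes only by following the journal."""

    def __init__(self, journal):
        self.journal = journal
        self.applied = 0      # number of journal entries applied so far
        self.tables = {}

    def follow(self):
        # Apply, in order, every journal entry we have not seen yet.
        for table, row in self.journal.entries[self.applied:]:
            self.tables.setdefault(table, []).append(row)
        self.applied = len(self.journal.entries)

    def scan(self, table):
        return list(self.tables.get(table, []))


class QueryProcessor:
    """Writes go to the journal, reads go to storage."""

    def __init__(self, journal, storage):
        self.journal = journal
        self.storage = storage

    def insert(self, table, row):
        self.journal.append(table, row)    # never writes to storage

    def select_all(self, table):
        return self.storage.scan(table)    # never reads the journal


journal = Journal()
storage = Storage(journal)
qp = QueryProcessor(journal, storage)

qp.insert("test", (1, 10))               # insert into test values (1, 10)
before = qp.select_all("test")           # storage has not followed yet
storage.follow()                         # storage catches up with the journal
rows = qp.select_all("test")             # select * from test
print(before, rows)
```

Notice that the sketch calls follow() by hand, and that a read issued before that call sees nothing. How storage knows it has caught up is exactly the question the next section takes on.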

The flow of time

Now you may be thinking: How do we know that storage is up to date? We’ve just inserted our (1, 10) tuple and got a successful commit. Then, we run our table scan. What if storage isn’t up to date? What if there’s some kind of delay on the network, preventing storage from learning about the new row?

The change is trying to reach storage, but it’s stuck in traffic:

The answer to this question is quite beautiful, and it’s one of the things I’m most excited about with DSQL. Because the answer is absolutely not “eventual consistency”.

You see, it’s not just data that’s flowing around the Circle, it’s time too. Every transaction has a start time Tstart. This time comes from EC2 Time Sync, which provides us with microsecond-accurate time. When the QP queries storage, it doesn’t just ask “give me all the rows in the test table”. Instead, it adds “.. as of Tstart”. When the QP writes data to the journal, it computes the commit time and then says “store this data at Tcommit”.

The journal provides an ordered stream of both time and data, which means storage can know precisely when it has all the data to answer the query.
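Here is a small Python sketch of that idea, again a toy and not the real protocol. The names (TimedStorage, apply, scan_as_of, watermark) are invented for this example, plain integers stand in for microsecond timestamps, and I assume that the journal delivers entries in strictly increasing time order and can carry an entry that advances time without any data. Under those assumptions, storage can answer a read "as of Tstart" once the time it has seen from the journal reaches Tstart.

```python
# Toy model of timestamped reads. Names are hypothetical, not DSQL APIs.

class TimedStorage:
    def __init__(self):
        self.rows = []        # list of (commit_time, row)
        self.watermark = 0    # latest journal time we have seen

    def apply(self, time, row=None):
        # Assumption: the journal is ordered, so time only moves forward.
        # An entry with row=None carries time but no data.
        assert time > self.watermark
        if row is not None:
            self.rows.append((time, row))
        self.watermark = time

    def can_answer(self, t_start):
        # Once we have seen time t_start, we have seen every earlier commit.
        return self.watermark >= t_start

    def scan_as_of(self, t_start):
        if not self.can_answer(t_start):
            # This toy raises; a real system would presumably wait instead.
            raise RuntimeError("not caught up to t_start yet")
        return [row for t, row in self.rows if t <= t_start]


storage = TimedStorage()
t_commit = 100    # the insert of (1, 10) committed at time 100
t_start = 105     # a later transaction starts at time 105

# The change is stuck in traffic: storage has seen nothing yet.
stale = storage.can_answer(t_start)

storage.apply(t_commit, (1, 10))              # the row arrives
still_waiting = storage.can_answer(t_start)   # only caught up to 100

storage.apply(110)                            # time moves past t_start
rows = storage.scan_as_of(t_start)
print(stale, still_waiting, rows)
```

The point of the sketch is that storage never guesses. It either knows it has everything up to Tstart, or it knows it does not.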

As somebody who’s spent an awful amount of time debugging and trying to work around bugs caused by eventual consistency, I really cannot overstate how delighted I am with this design property. In DSQL, you never have to...
