Aurora DSQL and the Circle of Life

croottree1 pts0 comments

Aurora DSQL and the Circle of Life

Aurora DSQL and the Circle of Life

In this article we&rsquo;re going to take a deeper look at the Circle of Life in<br>Aurora DSQL. Understanding the flow of data will really help you wrap your head<br>around the DSQL architecture, and how best to build applications against DSQL.

My intention in sharing this article is help you understand the flow. There<br>are many other things to understand: availability, scalability, durability,<br>security, and so on. I won&rsquo;t be discussing those in detail, because each of<br>those topics is deep and complex, and deserves its own focus.

The flow of data

Aurora DSQL is based on PostgreSQL. The Query Processor (QP) component is a<br>running postgres process, although we&rsquo;ve significantly modified it to work with<br>our architecture.

If you connect to DSQL with a Postgres client (such as psql), you&rsquo;re connected<br>to one of these postgres processes. You&rsquo;re connected to a QP, and you can start<br>to interact with it as you would with any other Postgres server.

If you run a local Postgres operation, such as SELECT 1, then that query is<br>processed entirely locally by the QP. But what happens when you query a table:

select * from test;<br>id | value<br>----+-------<br>1 | 10<br>(1 row)

Usually, you&rsquo;d expect Postgres to read from storage locally, which might mean<br>reading from the buffer cache, or doing disk I/O. When running on Aurora<br>Postgres (APG), cache misses would result in a load from remote storage.

Like APG, reads in DSQL also go to remote storage. In our above query, which<br>is a scan of the entire test table, the QP is going to turn around and scan<br>storage, and storage is going to return all the rows in the table.

But how did storage get the rows in the first place?

insert into test values (1, 10); -- autocommit

Vanilla Postgres would process that transaction locally, inserting into the<br>Write-Ahead Log (WAL), updating the buffer cache, and using fsync() to persist<br>the changes to disk. In APG, the buffer cache is also updated, but the<br>durability of the transaction is ensured by fsync() to the remote storage in<br>multiple Availability Zones (AZ).

Commits in DSQL have the same basic ingredients, but they&rsquo;re expressed quite<br>differently. In DSQL, data is durably persisted when it&rsquo;s written to the<br>journal1. Storage follows the journal, and keeps itself up to date.

When I first started working on DSQL (many years ago!), I didn&rsquo;t really get<br>this flow. I&rsquo;d been told &ldquo;writes go to the journal, reads go to storage&rdquo;. I<br>nodded, but I didn&rsquo;t deeply, truly, understand that simple explanation. I&rsquo;d<br>spent too much time with traditional architectures, and my mind kept falling<br>back on the familiar.

What helped me get it was the picture at the top of this post. Imagine somebody<br>drawing this on the whiteboard. They draw the three boxes: QP, journal and<br>storage. Then, they draw the Circle of Life:

There&rsquo;s something about this presentation, vs. the one at the top of the<br>article, that helped it go click for me. Removing the service interactions<br>certainly helps. Notice how in the first picture there&rsquo;s an arrow from the QP<br>to storage, while in the Circle, the arrow is the other way round?

&ldquo;Writes go to the journal, reads go to storage&rdquo;, never quite did it for me.<br>&ldquo;Writes never go to storage, reads never go to the journal&rdquo; also didn&rsquo;t quite<br>the message across. The Circle did for me, and I hope it does for you.

The flow of time

Now you may be thinking: How do we know that storage is up to date? We&rsquo;ve just<br>inserted our (1, 100) tuple and got a successful commit. Then, we run our<br>table scan. What if storage isn&rsquo;t up to date? What if there&rsquo;s some kind of<br>delay on the network, preventing storage from learning about the new row?

The change is trying to reach storage, but it&rsquo;s stuck in traffic:

The answer to this question is quite beautiful, and it&rsquo;s one of the things I&rsquo;m<br>most excited about with DSQL. Because the answer is absolutely not &ldquo;eventual<br>consistency&rdquo;.

You see, it&rsquo;s not just data that&rsquo;s flowing around the Circle, it&rsquo;s time too.<br>Every transaction has a start time Tstart. This time comes from EC2 Time<br>Sync, which provides us with microsecond-accurate time. When the QP queries<br>storage, it doesn&rsquo;t just ask &ldquo;give me all the rows in the test table&rdquo;.<br>Instead, it adds &ldquo;.. as of Tstart&rdquo;. When the QP writes data to the journal,<br>it computes the commit time and then says &ldquo;store this data at Tcommit&rdquo;.

The journal provides an ordered stream of both time and data, which means<br>storage can know precisely when it has all the data to answer the query.

As somebody who&rsquo;s spent an awful amount of time debugging and trying work around<br>bugs caused by eventual consistency, I really cannot overstate how delighted I<br>am with this design property. In DSQL, you never have to...

rsquo storage dsql time circle data

Related Articles