Use your database to power state machines | Lawrence Jones
Most developers are familiar with state machines and know their value. A typical state machine library helps you model states, prevent invalid transitions, and produce diagrams that help even non-technical people understand how the code behaves.
This article isn’t about making the case for state machines. It’s about how you take the concept of a state machine and have it work alongside your database models, leveraging your relational database (say Postgres or MySQL) to help you build concurrency-safe and efficient software.
I first encountered this pattern when I joined GoCardless in 2015. Processing bank payments is a multi-day affair and extremely stateful, so it’s no surprise that the team eventually built a library called Statesman that provided a state machine powered by underlying transition tables.
Most of GoCardless’ critical processes use Statesman, and by the time I left we had transition tables with well over 10B rows. It became such an essential tool that I’d argue it was a powerful competitive advantage. Statesman is even reflected externally, such as in the GoCardless public API with endpoints providing rich audit trails.
So if you use Ruby, grab Statesman and get going. If you want these benefits outside Ruby, you can implement a small library in your language of choice in just a few hours, provided you understand the nuances of transition tables, locking, and edge cases.
That’s what I did the other day at incident.io. Here’s a guide so you can do it too.
Transition tables
In most applications, you want to capture every transition that a resource makes through a state machine and store it for later analysis. In some situations, you may even consult the history of transitions to decide which state to move to next.
That’s why the first component of any database-backed state machine is a table to contain transitions.
-- Assuming the payments table is already here.
create table payments (/* ... */);

-- generate_ulid() is not built into Postgres; it is assumed to be defined elsewhere.
create table payment_transitions (
  id text primary key default generate_ulid() not null,
  payment_id text not null references payments(id),
  to_state text not null,
  most_recent boolean not null,
  sort_key integer not null,
  created_at timestamptz not null default now(),
  updated_at timestamptz not null default now()
);
Each transition row states:
The parent resource it belongs to (payment_id)
The state this transition is moving into (to_state)
Whether this transition is the latest (most_recent)
An ordinal to allow for logical ordering of transitions (sort_key)
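To make the role of most_recent and sort_key concrete, here is a minimal in-memory sketch in Go. The Transition struct and the CurrentState and History helpers are hypothetical names invented for illustration, and a slice stands in for the table; in a real system these would be queries against payment_transitions.

```go
package main

import "sort"

// Transition mirrors the columns of payment_transitions that matter here.
// It is an illustrative stand-in, not the model used by any real library.
type Transition struct {
	PaymentID  string
	ToState    string
	MostRecent bool
	SortKey    int
}

// CurrentState returns the to_state of the row flagged most_recent for the
// given payment, and false if the payment has no transitions yet.
func CurrentState(rows []Transition, paymentID string) (string, bool) {
	for _, row := range rows {
		if row.PaymentID == paymentID && row.MostRecent {
			return row.ToState, true
		}
	}
	return "", false
}

// History returns the states a payment has moved through, ordered by sort_key.
func History(rows []Transition, paymentID string) []string {
	var own []Transition
	for _, row := range rows {
		if row.PaymentID == paymentID {
			own = append(own, row)
		}
	}
	sort.Slice(own, func(i, j int) bool { return own[i].SortKey < own[j].SortKey })

	states := make([]string, 0, len(own))
	for _, row := range own {
		states = append(states, row.ToState)
	}
	return states
}
```

Answering "what state is this payment in?" only needs the single most_recent row, while reconstructing the audit trail relies on sort_key rather than on timestamps, which can collide.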
Now we add two unique indexes that protect our state machine’s integrity.
create unique index idx_payment_transitions_by_parent_most_recent
  on payment_transitions
  using btree(payment_id, most_recent)
  where most_recent;

create unique index idx_payment_transitions_by_parent_sort_key
  on payment_transitions
  using btree(payment_id, sort_key);
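The first ensures we can only ever have a single transition that is most_recent for any payment, a clear requirement if we ever want to sensibly ask “what state is this payment in?”. Because the index is partial (where most_recent), rows with most_recent set to false are not constrained, so a payment can have any number of older transitions. The second ensures a payment never has two transitions with the same sort_key: less important, but useful to ensure all transitions can be strictly ordered.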
Expressing the state machine in code
Now that we have a table to store transitions, we need a library that can express the state machine in our codebase.
In the Go library we’re using at incident.io, we start by expressing the transition table as a domain model that our ORM can work with:
type PaymentTransition struct {
	// ID is the unique ID of this transition
	ID string `json:"id" gorm:"type:text;primaryKey;default:generate_ulid()"`
	// PaymentID is a reference to the parent resource
	PaymentID string `json:"payment_id"`
	// ToState is where this transition was to
	ToState PaymentState `json:"to_state"`
	// MostRecent is true when this transition is the most recent
	MostRecent bool `json:"most_recent"`
	// SortKey provides ordinality over transitions
	SortKey int `json:"sort_key"`
	// CreatedAt is set upon transition creation
	CreatedAt time.Time `json:"created_at"`
	// UpdatedAt is set whenever this transition is modified
	UpdatedAt time.Time `json:"updated_at"`
}

func (PaymentTransition) Parent() Payment {
	return Payment{}
}

func (a PaymentTransition) State() PaymentState {
	return a.ToState
}

// ParentColumn tells our machine library what column refers
// to the parent resource.
func (PaymentTransition) ParentColumn() (structField, column string) {
	return "PaymentID", "payment_id"
}
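As a rough illustration of why a library would want ParentColumn, the sketch below builds the lookup for a parent's current state from it. The ParentColumner interface and the CurrentStateQuery function are hypothetical and are not taken from the incident.io library; the PaymentTransition here is trimmed to the one method the example needs.

```go
package main

import "fmt"

// ParentColumner is a hypothetical interface capturing the one method
// this sketch needs from a transition model.
type ParentColumner interface {
	ParentColumn() (structField, column string)
}

// PaymentTransition is a trimmed copy of the model, kept only so the
// example is self-contained.
type PaymentTransition struct {
	PaymentID string
}

func (PaymentTransition) ParentColumn() (structField, column string) {
	return "PaymentID", "payment_id"
}

// CurrentStateQuery builds a Postgres query that finds the current state
// of a parent resource. The table name is assumed to be a trusted constant
// from application code, never user input; the parent ID is bound as $1.
func CurrentStateQuery(table string, t ParentColumner) string {
	_, column := t.ParentColumn()
	return fmt.Sprintf(
		"select to_state from %s where %s = $1 and most_recent",
		table, column,
	)
}
```

Because the generated query filters on the parent column and most_recent, it matches the partial unique index on (payment_id, most_recent) and can return at most one row.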
You’ll notice PaymentState is a new type. Go has no dedicated enum construct, so we define it as a string type with a set of constants:
type PaymentState string
const (
	PaymentStatePendingSubmission PaymentState = "pending_submission"
	PaymentStateSubmitted         PaymentState = "submitted"
	PaymentStatePaid              PaymentState = "paid"
	PaymentStateCancelled...