Ruminating about mutable value semantics
Ruminating about mutable value semantics
Published 2024-06-03
(This is part of a series on the design of zest. See the list of posts here.)
I have two goals for zest that are in tension:
Be a reasonably efficient imperative language (eg go, julia).
Treat values as data (eg erlang, clojure, see the shape of data).
In particular, for goal 2 I want to guarantee that deserialize(serialize(x)) == x .
It's tricky to satisfy both goals, because as soon as you allow mutable references you can create circular data-structures and have to deal with questions of identity vs value. Javascript, for example, doesn't satisfy goal 2 even though it has pervasive serialization in the form of json:
> let x = []<br>undefined<br>> JSON.parse(JSON.stringify(x)) == x<br>false
> let y = new Date()<br>undefined<br>> typeof(JSON.parse(JSON.stringify(y))) == typeof(y)<br>false
The only approach I know of to square this circle is mutable value semantics. The core idea is to allow mutation but prevent mutable references from being observed or aliased. The result looks like this:
var xs = list[string]['a','b','c']<br>let ys = xs // ys is an independent value, not an alias of x!<br>append(xs@, ys)<br>print(length(xs)) // prints 6<br>print(length(ys)) // prints 3
Enforcing these aliasing rules is the crux and there are many options for how to do it. Let's look at three in particular: implicit copies, dynamic alias tracking, and static alias tracking.
implicit copies
The simplest option is to always require a copy when moving into or out of a mutable value:
var xs = list[string]['a','b','c']<br>let ys = xs // deep copy xs<br>append(xs@, ys)<br>print(length(xs)) // deep copy xs?!<br>print(length(ys)) // deep copy ys?!
This actually works pretty well when we're using a mutable value mutably, but we really don't want that last copy for length(xs)!
dynamic alias tracking
The solution used in the toy language swiftlet in the original paper is to track aliasing dynamically by reference-counting large values likes lists. Where we would have copied xs before, now we just need to increment it's refcount. But now every time we mutate a value we have to check the refcount and maybe make a defensive copy.
var xs = list[string]['a','b','c']<br>let ys = xs // increment the refcount of xs<br>append(xs@, ys) // shallow copy xs and increment the refcount of the strings<br>print(length(xs)) // increment the refcount of xs<br>print(length(ys)) // increment the refcount of ys
Refcounting makes writing code very easy but also:
Makes it easy to accidentally copy a large value when you meant to mutate it in place.
Requires sophisticated optimizations to remove the performance overhead of modifying refcounts all the time.
static alias tracking
The authors of the swiftlet paper are currently working on a systems language called hylo. Swiftlet's implicit, unpredictable allocation is unappealing in a systems language so hylo instead tracks aliasing statically. All values have move semantics and the language is extended with immutable references as well as mutable references. Using the same syntax as the previous examples, this would look like:
var xs = list[string]['a','b','c']<br>let ys = deep-copy(xs) // removing this copy produces a compile error<br>append(xs@, ys)<br>print(length(xs&)) // pass immutable reference to xs<br>print(length(ys&)) // compile error - ys was consumed by append - write append(xs@, deep-copy(ys)) instead
This allows hylo to have mutable value semantics while completely avoiding implicit copies and refcounting overhead, but at the cost of an additional burden on the programmer to annotate behaviour, and also increasing the number of programs which won't type-check.
I'm expecting expressiveness to follow a similar trajectory to rust. Early versions of rust were often frustratingly inexpressive but over time language extensions, libraries and idioms evolved to fill most of the gaps without sacrificing the core reasoning guarantees.
One of the gap-fillers hylo is experimenting with is remote parts, which introduce a limited form of references to the type system. This doesn't break the reasoning guarantees that hylo cares about - you still can't observe aliasing and deallocation remains deterministic. But it is very hard to square with my goal of transparent serialization - if I deserialize a value containing a remote part, what is the lifetime of that part and who is responsible for freeing it?
So programs probably become a bit more annoying to write, and at least one of the amerliorations is a feature that I will find difficult to copy.
alternatives?
I spent the last month filling notebooks, trying to figure out the contours of the design space. Some possible contours:
If references are ever observable then transparent serialization becomes hard. They have to remain totally 2nd class.
Moves metastasize. As soon as I add move semantics to anything I also need borrows and I end up in the rust/hylo valley.
Borrows don't metastatize....