A Theory of Arrays (ToA) Union Find

g0xA52A2A1 pts0 comments

A Theory of Arrays (ToA) Union Find | Hey There Buddo!

One thing I’ve been discussing with Rudi, Michel and Max is union find annotations. Union find enhancements that still feel like union finds.

I think I have another one that supports a notion of destructive store / substitution which doesn’t obviously fall into previous categorizations. Such an edge annotation feels highly noninvertible, but it is ultimately doable because the edge annotation is nondestructively swapping out data with the root annotation during rerooting.

Annotating Union Finds

There are some places in the union find it is possible to put little extra bits of data.

One can annotate the root with data. When you merge two classes, you need some method to merge this data. This binary operation probably should be commutative and perhaps idempotent, which puts it in the ball park of a semigroup / semilattice. In egg, this mechanism is used to store program analyses. Sometimes people use them to store counts in the sets, which are not a semilattice. I kind of liked the “union find dict” way of phrasing it https://www.philipzucker.com/union-find-dict/

Edges can also be annotated. If the annotations form a group, that helps because you need to sometimes invert the annotations / have a null annotation. https://www.philipzucker.com/union-find-groupoid/

I’m neither convinced nor unconvinced requiring edge annotations be groups is the sweet spot.

I have some interesting examples I think where it is not a group, but some light asymmetry can be supported via the annotation forcing the parent / requiring a makeset https://www.philipzucker.com/thin_monus_uf/

As a reminder, here is what a basic offset union find looks like.

from dataclasses import dataclass, field<br>type FatId = tuple[int, int] # offset and id<br>@dataclass<br>class OffsetUF:<br>parents : list[FatId] = field(default_factory=list)<br>def makeset(self) -> FatId:<br>i = len(self.parents)<br>self.parents.append((0,i))<br>return (0,i)<br>def find(self, x : FatId) -> FatId:<br>off, xid = x<br>while True:<br>(offy, yid) = self.parents[xid]<br>if yid == xid:<br>assert offy == 0<br>return (off, xid)<br>off += offy<br>xid = yid<br>def union(self, x : FatId, y : FatId) -> FatId | None:<br>(offx, xid) = self.find(x)<br>(offy, yid) = self.find(y)<br>if xid != yid:<br>z = (offy - offx, yid)<br>self.parents[xid] = z<br>return z<br>else:<br>return None<br>def add(off, x : FatId) -> FatId:<br>return (off + x[0], x[1])<br>uf = OffsetUF()<br>x = uf.makeset()<br>y = uf.makeset()<br>uf.union(add(3, x), add(4, y))<br>uf

OffsetUF(parents=[(1, 1), (0, 1)])

The edge and root annotations can interact. If you have an offset and an even/odd/unknown analysis, of course x = y (+1) and even(x) implies odd(y). If you want to pull the annotation pack to a child from the root, or reroot, you need to know the group action on the annotation. Sometimes an edge annotation might transfer info into the lattice. Rudi is the one who taught me this. See the extended real offset union find below the fold for an example.

Looking back at Ed Kmett’s Guanxi talks https://youtu.be/ISNYPKiE0YU?si=Naz0It2sQwljIY_M https://github.com/ekmett/guanxi/blob/master/src/Relative/Internal.hs , he does talk about transporting his lattice like analysis via the group union find annotations. He also in his monoidal parsing talks refers to using group annotations as a way to lazily shift an entire structure of offsets. I think this is less in the tradition of the group union find and more in the tradition of semi-persistent data structures.

Semi-persistence

Semi-persistent data structures differ from persistent data structures in that you mutate under the hood and maybe the older copies are degraded in performance or capability in some way, but not gone.

In order to backtrack any data structure or state, you need to somehow keep a record of how to undo the moves you’ve done. This record may sometimes be implicitly on your language’s stack, but it is still there. What this amounts to is a chain of update structs linking up to the current mutated version. Instead of sort of being splayed out over all your code, it is possible to capture this up into a data structure that controls the most current mutated version and offers keys to the previous versions.

https://inria.hal.science/hal-04045849v1/document Semi-persistent Data Structures Sylvain Conchon, Jean-Christophe Filliâtre, https://usr.lmf.cnrs.fr/~jcf/publis/puf-wml07.pdf A Persistent Union-Find Data Structure

https://xavierleroy.org/CdF/2022-2023/ Leroy knocks it out of the park again

https://github.com/yaspar-org/semi-persistent Interesting talk at EGRAPHS 2026 about semi persistent tecniques in this egraph and other things

Interestingly, the rerooting technique seems like it goes back to at least a technique for shallow binding in lisp 1.5 https://web.archive.org/web/20191008134617/http://home.pipeline.com/~hbaker1/ShallowArrays.html https://web.archive.org/web/20200212080133/http://home.pipeline.com/~hbaker1/

As always, there is both an arena style and pointer style...

union find data https fatid self

Related Articles