Lifting E-Graphs | Hey There Buddo!
I submitted a talk to the EGRAPHS workshop and it was accepted! https://pldi26.sigplan.org/details/egraphs-2026-papers/13/Lifting-E-Graphs-A-Function-Isn-t-a-Constant
Between the time when I submitted my abstract and now, while the meat of “there is something here” has not changed, my understanding and best explanation of the thing has been refined.
The whole design is powered by intuition coming from a semantic model manipulating R^n -> R functions in a systematic way. I’m now calling what I previously called a Thinning e-graph https://www.philipzucker.com/thin_egraph/ a Lifting E-Graph, as the word lift is more evocative of what the main operation of interest does on R^n -> R functions.
What’s Wrong with Explicit Names?
There are 3 issues with explicit names
Generative processes can run off the rails (eqsat). We need fresh names sometimes. As an example P --> forall x, P | where x fresh is basically a valid rewrite for propositions. Sometimes rewrites like this can be useful. We might derive the same thing many many times redundantly with different names if we just gensym them. We can play skolem games and free variable analysis games to try and derive names we know are fresh and repeatable, but this is (subjectively) inelegant.
Missed sharing. f(g(h(x))) shares no storage with f(g(h(y))). The amount of missed storage opportunity gets worse the deeper and bigger the term. These two things aren’t equal, so I’m not exactly complaining about missed equality. I’m complaining about a missed relationship that can be used for reasoning and compaction.
Too much sharing. This is the surprising one. I think non careful use of explicit names conflates actually disequal objects. The notation sin(x) actually can refer to distinct mathematical objects that one should at least be aware of the possibility to disambiguate.
The more surprising issue I think is #3. Let’s expand upon that.
Original Sin: X
sin(x) from an ordinary perspective is fine. We’ve lived with the notation for hundreds of years. It works. We know what it means when we see it.
From another perspective it is horribly vague and possibly collapsing distinct entities. We are somehow referring vaguely to the function sin, but are we specifically referring to x |-> sin(x) : R -> R
or to x,y |-> sin(x) : R^2 -> R?
According to a simply typed perspective of the world, these are different things, and in fact can’t even be compared for equality due to type mismatch. The graphs are completely different images. Yet, the notation that suppresses the context x,y |-> or leaves it implicit conflates these two things. It is at least worrying that perhaps in some subtle way this conflation is in essence asserting R^2 = R^1 and from thence chaos ensues.
From this observation come a slogan for today’s design philosophy:
Context is not where a term is, it is part of what a term is
This is not the case for every conception of the word “context”, but it is what I want to do today.
We will choose to not leave out the information of x,y |-> ever. It is a part of the thing we are discussing. For today, we shall consider it basically incoherent to talk about a “bare” sin(x)
Naive Nameless
There is a naive way to achieve this design philosophy using an ordinary egraph / ordinary first order terms.
We can make a different copy of every function symbol for every dimension/context we might be working in and we can refer to variables by (dimension,index) pairs $x_{di}$ rather than by names. For example $x_{10}$ (the zeroth variables in context of size 1) is what previously I would have called x |-> x, $x_{21}$ (the first variable in context of size 2) is what previously I would have called x,y |-> y.
Likewise, we could also disambiguate all the sin into different versions $\sin_d$ depending on the type of it’s argument. If $x_{21}$ has type $R^2 \rightarrow R$ then if sin is going to accept it, it needs to take in arguments of that type. We have $sin_0 : (R^0 \rightarrow R) \rightarrow (R^0 \rightarrow R)$, $sin_1 : (R \rightarrow R) \rightarrow (R \rightarrow R)$, $sin_2 : (R^2 \rightarrow R) \rightarrow (R^2 \rightarrow R)$ and so on.
Really all of these come from the pointwise application of the regular sin function, and this is a parametric polymorphic construction, so this disambiguation is not really that necessary (the index n is derivable from the dimension of sin’s arguments). Still, if we wanted to stay conceptually in a simply typed framework, this is what we’ve go to do.
Ok, great. This carefulness does solve issue #3 of too much sharing. At the same time, we’ve made point #2 of too little sharing both better and worse.
Because we are being nameless by referring to variables by integers, x |-> f(g(h(x))) becomes syntactically the same as y |-> f(g(h(y))) since both become f1(g1(h1(x10))). So sharing has become better in that sense.
On the other hand, now f1(g1(h1(x10))) : R -> R and f2(g2(h2(x20))) : R^2 -> R...