The Grammar of Data: Define Once, Run Anywhere with Cross-Engine Expressions



By Simon Späti (guest) | June 30, 2026


[Image: Cross-Engine Expressions: The Grammar of Data. batting (noun), .filter(yr==2015) (verb), .aggregate(hits) (verb), .into_backend(pg) (modifier). Express. Verify. Run.]

Grammars, whether for languages or any other field, are a beautiful thing. They compress a complex system into a handful of rules. In written language, for example, we know when to capitalize a letter and how to start a sentence. Grammars also help us remember: we do not need to recall every individual case, only apply the rules in a structured way.

For text editing, we have Vim motions, which let us navigate a text document with thousands of possible shortcuts. Because there is a grammar, we do not need to remember them all; we learn the structure of the grammar and combine its parts. But what if you work in data? What if we could have the same thing there: a grammar for data engineering, or a language that defines it?

Such a grammar would let us express our needs declaratively and decisively, in a way that leads to reproducible outcomes and works with the many components and execution engines already out there. That is what this article discusses: how existing tooling such as Ibis provides some of these capabilities, and how xorq extends them by adding full lineage and transparency for humans, along with executable memory for useful tabular data, all manifested in a single Git repository.

Expressions for Data Engineering Workloads

Having a grammar for data engineering means we can express workloads declaratively, and then deterministically reproduce and apply that exact definition.

It’s similar to the concept of a Declarative Data Stack I introduced a while back, but it gives the stack not only configuration but also a language with built-in manifestation and execution engines.

In the above image, we see three steps:

1. How to express (write) our transformations and business logic. It’s the context of every ML or DE pipeline.
2. We can build the expression into a manifest that has a unique hash, runs input validations, tracks lineage, creates a deterministic cache, and produces a human-readable expr.yaml you can diff and review in a PR.
3. Lastly, we can execute it in any execution engine with the same manifest.
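To make the "unique hash" in the build step concrete, here is a minimal sketch in plain Python. It is not xorq's manifest format or API: the dict layout, the `build_manifest` helper, and the choice of SHA-256 over canonical JSON are all assumptions made for illustration. The point is only that a canonical serialization of the expression gives the same hash for the same logic, and a different hash when the logic changes.

```python
import hashlib
import json

# Hypothetical, simplified expression tree (not xorq's real on-disk format)
expr = {
    "op": "aggregate",
    "metrics": {"total_hits": "sum(hits)"},
    "input": {
        "op": "filter",
        "predicate": "yr == 2015",
        "input": {"op": "source", "name": "batting"},
    },
}


def build_manifest(expr: dict) -> dict:
    """Serialize the expression canonically and derive a content hash from it."""
    # sort_keys makes the serialization independent of dict insertion order
    canonical = json.dumps(expr, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return {"hash": digest, "expr": expr}


manifest = build_manifest(expr)
print(manifest["hash"][:12])  # short prefix, handy as a cache key or in a PR diff
```

Because the hash depends only on the definition, two people who write the same expression get the same identifier, which is what makes a deterministic cache possible in the first place.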

This is hugely powerful. It separates three concerns: defining the logic, verifying it in the manifest step, and executing it. The result is a composable data stack, as Wes McKinney called it, with the option of multiple compute engines.
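As a rough illustration of "define once, run anywhere", the sketch below evaluates one small expression on two engines: plain Python lists and SQLite from the standard library. Everything here is hypothetical (the `HitsInYear` class, the sample rows, both `run_*` functions); real tools like Ibis or xorq translate a far richer expression tree, but the separation of definition and execution has the same shape.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class HitsInYear:
    """Hypothetical expression: total hits in the batting table for one year."""
    year: int


# Made-up sample data, for illustration only
ROWS = [
    {"player": "a", "yr": 2015, "hits": 10},
    {"player": "b", "yr": 2015, "hits": 7},
    {"player": "a", "yr": 2014, "hits": 3},
]


def run_python(expr: HitsInYear, rows: list[dict]) -> int:
    """Engine 1: evaluate the expression over in-memory rows."""
    return sum(r["hits"] for r in rows if r["yr"] == expr.year)


def run_sqlite(expr: HitsInYear, rows: list[dict]) -> int:
    """Engine 2: translate the same expression to SQL and let SQLite run it."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE batting (player TEXT, yr INTEGER, hits INTEGER)")
    con.executemany("INSERT INTO batting VALUES (:player, :yr, :hits)", rows)
    (total,) = con.execute(
        "SELECT COALESCE(SUM(hits), 0) FROM batting WHERE yr = ?", (expr.year,)
    ).fetchone()
    con.close()
    return total


expr = HitsInYear(year=2015)
# The definition is written once; both engines agree on the answer
assert run_python(expr, ROWS) == run_sqlite(expr, ROWS) == 17
```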

How the DE Language Works: Different Expression Types

Every grammar starts with nouns, and here the noun is the source, a node that holds data but carries no transformation yet. It might be an in-memory table, a registered connection to a warehouse, or just a lazy pointer to a file on disk that hasn’t been read. Sources are simply referenced, the way a noun refers to a thing before any verb acts on it.

The verbs in our language are transforms such as filter, select, mutate, aggregate, join, order, and limit. Each one takes a source (or another transformed expression) and returns a new, immutable expression. You do not mutate anything that came before; you only describe what should happen next.

Looking at a definition such as .filter(...).aggregate(...).mutate(...), we can see this as a sentence. The moment a verb is applied, the expression stops being a plain noun and becomes a statement, a description of “data plus what should happen to it.” But the sentence isn’t spoken yet. It stays inert, fully composed but unexecuted, until something finally asks it to run. That’s the deferred part of the grammar: writing the sentence and saying it out loud are two different acts.
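The noun, verb, and deferral ideas can be sketched with a toy class. This is not the Ibis or xorq implementation; `Expr`, its methods, and the sample rows are hypothetical names chosen for illustration. Each verb returns a new frozen object, and nothing touches data until `execute` is called.

```python
from dataclasses import dataclass
from typing import Callable

Row = dict
Step = Callable[[list[Row]], list[Row]]


@dataclass(frozen=True)
class Expr:
    """Hypothetical deferred expression: a source name plus pending steps."""
    source: str
    steps: tuple[Step, ...] = ()

    def filter(self, predicate: Callable[[Row], bool]) -> "Expr":
        def step(rows: list[Row]) -> list[Row]:
            return [r for r in rows if predicate(r)]
        # Return a new expression; the original is left untouched
        return Expr(self.source, self.steps + (step,))

    def aggregate(self, name: str, column: str) -> "Expr":
        def step(rows: list[Row]) -> list[Row]:
            return [{name: sum(r[column] for r in rows)}]
        return Expr(self.source, self.steps + (step,))

    def execute(self, tables: dict[str, list[Row]]) -> list[Row]:
        # Only here is any data read or transformed
        rows = tables[self.source]
        for step in self.steps:
            rows = step(rows)
        return rows


batting = Expr("batting")  # noun: a reference, no data read yet
stmt = batting.filter(lambda r: r["yr"] == 2015).aggregate("total_hits", "hits")

# Made-up sample data, supplied only at execution time
tables = {"batting": [
    {"yr": 2015, "hits": 10},
    {"yr": 2015, "hits": 7},
    {"yr": 2014, "hits": 3},
]}
print(stmt.execute(tables))  # [{'total_hits': 17}]
```

Note that `batting` still has no steps after `stmt` is built, so the same noun can start any number of other sentences.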

There’s a third part of speech worth naming: the template. Instead of writing a sentence about a specific noun, you can write one about a noun’s shape, a schema with no rows behind it. A template says “given something with a column of this type, here is what I’ll do to it,” and only later gets bound to an actual source, at which point the placeholder resolves and it becomes an ordinary statement again.
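A template can be pictured as a schema plus a transformation that waits for a source. The sketch below is a toy in plain Python under that assumption; `Template`, `bind`, and the example schema are hypothetical and do not mirror any library's API. Binding checks the source against the schema and returns a statement that is still deferred.

```python
from dataclasses import dataclass
from typing import Callable, Mapping

Row = dict


@dataclass(frozen=True)
class Template:
    """Hypothetical template: a schema with no rows, plus what to do with matching data."""
    schema: Mapping[str, type]
    transform: Callable[[list[Row]], list[Row]]

    def bind(self, rows: list[Row]) -> Callable[[], list[Row]]:
        """Validate a concrete source against the schema and return a deferred statement."""
        for row in rows:
            for column, expected in self.schema.items():
                if not isinstance(row.get(column), expected):
                    raise TypeError(f"column {column!r} must be {expected.__name__}")
        # Still deferred: nothing is transformed until the statement is called
        return lambda: self.transform(rows)


double_hits = Template(
    schema={"hits": int},
    transform=lambda rows: [{**r, "hits": r["hits"] * 2} for r in rows],
)

stmt = double_hits.bind([{"hits": 3}, {"hits": 5}])
print(stmt())  # [{'hits': 6}, {'hits': 10}]
```

Binding a source with the wrong shape, for example `[{"hits": "three"}]`, fails at bind time in this sketch, before any transformation runs.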

And we have modifiers that ride alongside a statement without changing what it computes. They’re small tags of metadata that say “this expression also represents a fitted model” or “this is a saved reference to something else.” It’s like a footnote with additional metadata that doesn’t change the surface meaning, but adds context for later use.
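One way to picture a modifier is as metadata attached to a deferred computation without touching its result. The `Tagged` class and the tag names below are hypothetical, a plain-Python sketch rather than how xorq represents tags.

```python
from dataclasses import dataclass, field
from typing import Callable, Mapping


@dataclass(frozen=True)
class Tagged:
    """Hypothetical modifier: a deferred computation plus descriptive metadata."""
    compute: Callable[[], object]
    tags: Mapping[str, str] = field(default_factory=dict)

    def tag(self, **metadata: str) -> "Tagged":
        # Returns a new object with extra metadata; the computation is unchanged
        return Tagged(self.compute, {**self.tags, **metadata})

    def execute(self) -> object:
        return self.compute()


plain = Tagged(lambda: sum([10, 7]))
annotated = plain.tag(kind="fitted_model", note="illustrative tag only")

# The footnote adds context but does not change what is computed
assert plain.execute() == annotated.execute() == 17
print(annotated.tags["kind"])  # fitted_model
```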

The analogy holds because the grammar composes the same way regardless of which engine eventually executes it. There are more parts, but with just these four (noun, verb, template, and modifier) you can read and write arbitrarily complex data pipelines, the same way learning a handful of verb-and-object combinations in a text editor lets you compose arbitrarily complex edits.

Avoids building...
