Database as a Graph for Relational Deep Learning

Your Database Schema Is a Graph rimas silkaitis

Jun 4, 2026

My default when building a feature query is to write it in one shot. I want to do all the joins, all the aggregates, and arrive at the output I need in a single pass. That’s the intention. Then something is off. The row counts are wrong, or an aggregate is getting inflated by a join I didn’t account for. So I start unwinding it. In Postgres that means CTEs: WITH base AS (...), joined AS (...), aggregated AS (...). One checkpoint at a time, stepping through the relational algebra until I find where things went sideways. It works. It’s also exactly like scattering console.log statements through a codebase: manual, sequential, and it tells you where the break is but nothing about how to avoid being here next time. And you will be back, because when the schema changes or the distribution shifts, you’re iterating on the whole thing again.
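To make the inflated-aggregate failure concrete, here is a minimal sketch. It assumes a made-up three-table schema (customers, orders, reviews) and uses Python's built-in sqlite3 in place of Postgres so it runs anywhere; the CTE syntax shown is the same in both.

```python
import sqlite3

# Hypothetical schema and data, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY);
    CREATE TABLE orders  (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    CREATE TABLE reviews (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1);
    INSERT INTO orders  VALUES (1, 1, 10.0), (2, 1, 20.0);
    INSERT INTO reviews VALUES (1, 1), (2, 1), (3, 1);
""")

# One-shot version: joining two child tables to the same parent multiplies
# rows (2 orders x 3 reviews = 6 rows), so every order is summed three times.
one_shot = conn.execute("""
    SELECT SUM(o.amount)
    FROM customers c
    JOIN orders o  ON o.customer_id = c.id
    JOIN reviews r ON r.customer_id = c.id
    GROUP BY c.id
""").fetchone()[0]

# Checkpointed version: aggregate each child table in its own CTE first,
# then join the already-aggregated results (one row per customer each).
checkpointed = conn.execute("""
    WITH order_totals AS (
        SELECT customer_id, SUM(amount) AS total
        FROM orders GROUP BY customer_id
    ),
    review_counts AS (
        SELECT customer_id, COUNT(*) AS n_reviews
        FROM reviews GROUP BY customer_id
    )
    SELECT ot.total, rc.n_reviews
    FROM customers c
    JOIN order_totals ot  ON ot.customer_id = c.id
    JOIN review_counts rc ON rc.customer_id = c.id
""").fetchone()

print(one_shot)      # inflated total
print(checkpointed)  # (correct total, review count)
```

The true order total for this customer is 30.0; the one-shot query reports 90.0 because each order row is repeated once per review.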

That loop is the villain in Fey et al.’s 2023 paper, “Relational Deep Learning: Graph Representation Learning on Relational Databases”. The paper’s argument is that we have been solving the wrong problem. Instead of getting better at writing aggregation queries, we should stop writing them and let a graph neural network (GNN) do the aggregating.

How I think about feature engineering on relational data

Feature engineering on a relational schema is mostly a translation problem. You have a fact table recording events over time and a dimension table holding context about the entities involved. My mental model is still rooted in classic data warehousing (star schemas, anyone?). Say you want to predict something about one of those entities at some future point. To get there, you have to flatten the event history into a single row per entity: counts, averages, maximums, windowed aggregates. Each column in that flat table represents a decision you made about what might be predictive.
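A minimal sketch of that flattening step, in plain Python rather than SQL. The reviews data, the flatten helper, and the 90-day window are all assumptions made up for illustration; the point is only that each output column is a hand-picked aggregate.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical fact table: (customer_id, review_date, rating).
reviews = [
    (1, date(2026, 1, 5), 4),
    (1, date(2026, 3, 20), 2),
    (1, date(2026, 4, 28), 5),
    (2, date(2026, 2, 11), 3),
]

def flatten(events, cutoff, window_days=90):
    """Return one feature row per customer from events strictly before cutoff."""
    window_start = cutoff - timedelta(days=window_days)
    by_customer = defaultdict(list)
    for customer_id, day, rating in events:
        if day < cutoff:  # never look past the prediction time
            by_customer[customer_id].append((day, rating))

    rows = {}
    for customer_id, history in by_customer.items():
        ratings = [rating for _, rating in history]
        rows[customer_id] = {
            "n_reviews": len(ratings),
            "avg_rating": sum(ratings) / len(ratings),
            "max_rating": max(ratings),
            # Windowed aggregate: the window length is itself a hypothesis.
            "n_reviews_recent": sum(1 for day, _ in history if day >= window_start),
        }
    return rows

features = flatten(reviews, cutoff=date(2026, 5, 1))
for customer_id, row in sorted(features.items()):
    print(customer_id, row)
```

Changing window_days, or swapping the average for a recency-weighted one, produces a different training table from the same events.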

The decisions compound. Do you count all reviews or just recent ones? Do you weight by recency? What time window is relevant for the outcome you’re predicting? Every one of those choices is a hypothesis encoded as SQL. And when you get the model results back and something doesn’t work, you’re not sure whether the model is wrong or whether one of those hypotheses was wrong.

The place where this bites me hardest is in the null cases. When I flatten a schema that has nullable FK relationships, I have to make a deliberate choice: zero? NULL? A flag? Whatever I pick, I’m potentially hiding signal. A customer with no purchases in the prediction window is different from a customer with three purchases. A user with a null on an optional demographic field is telling me something. But the flat table can only see what I explicitly decided to encode. What gets quietly dropped matters more than what goes in.
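The zero / NULL / flag choice can be sketched in a few lines. The purchases data and the encode helper below are hypothetical; they only show what each encoding keeps and what it discards.

```python
# Hypothetical purchase amounts per customer inside the prediction window.
# "bob" has an empty history; "carol" has no record at all.
purchases = {"alice": [20.0, 35.0, 5.0], "bob": []}

def encode(customer, strategy):
    """Encode average purchase amount under one of three null-handling choices."""
    amounts = purchases.get(customer)
    has_history = bool(amounts)
    avg = sum(amounts) / len(amounts) if has_history else None
    if strategy == "zero":
        # Indistinguishable from a customer whose real average is 0.0.
        return {"avg_purchase": avg if has_history else 0.0}
    if strategy == "null":
        # Keeps the absence visible, but many models need it imputed later.
        return {"avg_purchase": avg}
    if strategy == "flag":
        # Zero plus an explicit indicator column.
        return {"avg_purchase": avg if has_history else 0.0,
                "has_purchases": has_history}
    raise ValueError(f"unknown strategy: {strategy}")

for name in ("alice", "bob", "carol"):
    print(name, [encode(name, s) for s in ("zero", "null", "flag")])
```

Note that all three encodings collapse bob (an empty history) and carol (no record) into the same row, which is one small instance of signal the flat table never sees.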

What the paper does

The bet: relational databases should be treated as graphs, and graph neural networks should learn directly from that graph structure. That replaces the manual join-and-aggregate step that normally produces the inputs for predicting something about the entities in your system.

The paper’s framing of what’s broken:

“The core problem is that no machine learning method is capable of learning on multiple tables interconnected by primary-foreign key relations. Current methods can only learn from a single table, so the data must first be manually joined and aggregated into a single training table, the process known as feature engineering. Feature engineering is slow, error prone and leads to suboptimal models.”

Taken literally, that overstates the gap. Cvitkovic demonstrated in earlier work that GNNs could work on relational tables, and Featuretools has been automating the aggregation layer since around 2017. The paper’s more defensible point is that no prior approach combined three things: a rigorous benchmark for reproducible comparison, a correct treatment of temporal data, and an end-to-end trainable pipeline.

The two-level graph abstraction

The paper defines two layers of graph structure. The first, the schema graph, is your ER diagram. One node per table, one edge per FK relationship. You already have this mental model. Nothing new here except a name.
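As a rough illustration, the schema graph for a small made-up schema can be derived from nothing more than a foreign-key listing. The table and column names here are assumptions, not taken from the paper.

```python
# Hypothetical schema: each table maps to {fk_column: referenced_table}.
foreign_keys = {
    "customers": {},
    "products": {},
    "reviews": {"customer_id": "customers", "product_id": "products"},
}

# One node per table.
schema_nodes = sorted(foreign_keys)

# One edge per FK relationship: (referencing table, FK column, referenced table).
schema_edges = sorted(
    (table, fk_column, target)
    for table, fks in foreign_keys.items()
    for fk_column, target in fks.items()
)

print(schema_nodes)
print(schema_edges)
```

Three tables and two foreign keys give a graph with three nodes and two edges, regardless of how many rows the tables hold.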

The second is the Relational Entity Graph (REG). This is where it gets interesting. The REG instantiates your ER diagram at row level: every row in every table becomes a node, and every FK reference between two specific rows becomes an edge. Your customer row is a node. The review that customer wrote is a separate node. The FK from that review to the customer is one edge. The FK from that review to the product is a second edge. The REG has as many edges as there are FK references in your actual data, not in your schema.

Your ER diagram is the blueprint. The REG is the building. The GNN runs on the building.
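The row-level construction can be sketched directly. This is a simplified illustration under stated assumptions, not the paper's implementation: the tables, the build_entity_graph helper, and the choice to leave FK columns out of the node features are all hypothetical.

```python
# Hypothetical rows keyed by primary key.
tables = {
    "customers": {1: {"name": "Ada"}, 2: {"name": "Lin"}},
    "products": {10: {"title": "Kettle"}},
    "reviews": {
        100: {"customer_id": 1, "product_id": 10, "rating": 5},
        101: {"customer_id": 2, "product_id": 10, "rating": 3},
        102: {"customer_id": 1, "product_id": None, "rating": 4},  # nullable FK
    },
}
# Hypothetical FK listing: table -> {fk_column: referenced_table}.
foreign_keys = {"reviews": {"customer_id": "customers", "product_id": "products"}}

def build_entity_graph(tables, foreign_keys):
    """Turn rows into typed nodes and FK references into edges."""
    # Every row becomes a node identified by (table, primary_key).
    # Assumption of this sketch: FK columns become edges, not node features.
    nodes = {
        (table, pk): {
            col: val for col, val in row.items()
            if col not in foreign_keys.get(table, {})
        }
        for table, rows in tables.items()
        for pk, row in rows.items()
    }
    # Every non-null FK reference between two specific rows becomes an edge.
    edges = []
    for table, fks in foreign_keys.items():
        for pk, row in tables[table].items():
            for fk_column, target in fks.items():
                if row[fk_column] is not None:
                    edges.append(((table, pk), (target, row[fk_column])))
    return nodes, edges

nodes, edges = build_entity_graph(tables, foreign_keys)
print(len(nodes), "nodes;", len(edges), "edges")
for edge in edges:
    print(edge)
```

The edge count tracks the data, not the schema: two FK columns across three review rows would give six edges, but the null product_id on review 102 contributes none, so the missing reference shows up as missing structure rather than as an imputed value.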

The nodes carry the column values from their source rows as features. Node types are tracked separately (a customer node and a product node get different weight matrices). Edge...
