Key Joins -- Interactive Demo
1 Abstract
A change proposal for key joins: a syntax for equijoins that follow referential constraints. A key join is written with an arrow indicating the direction of the foreign key relationship. The Database Management System (DBMS) verifies at compile time that a matching referential constraint exists and that the join is semantically valid. Key joins work through views, CTEs, and other derived tables, provided that the underlying referential relationship is preserved through the derivation.
2 References
[Foundation]ISO/IEC 9075-2, SQL/Foundation (IWD)<br>[Schemata]ISO/IEC 9075-11, SQL/Schemata (IWD)
3 Discussion
SQL equijoins are expressed as value comparisons: A.x = B.y. This says that two columns share a comparable value, but nothing more. It does not say whether one column references the other, which direction that reference goes, whether the referenced side is unique, or whether the referencing side's values are contained within the referenced side's values.
In practice, many equijoins in application code follow referential constraints. The query author intends to navigate from one table to another along a defined referential path, but the query itself does not express this. The intent lives in naming conventions, comments, or the query author's mental model. Nothing in the query would break if the schema changed in a way that invalidated the assumed relationship.
Key joins make this intent explicit. The query names the columns on both sides and indicates the direction with an arrow. The DBMS verifies at compile time that the join faithfully follows the declared relationship. If the schema changes in a way that invalidates the join, the query fails with a clear error rather than silently producing wrong results.
The example error messages shown below use wording from the PostgreSQL prototype implementation and are illustrative, not normative.
4 Notation
A key join is an equijoin between two tables, the referencing table and the referenced table , where the join predicate compares referencing columns in the referencing table with referenced columns in the referenced table. The two column lists have the same arity and pairwise comparable types.
Like an equijoin, a key join has two input operands, a left and a right; one is the referencing table and the other the referenced table. When key joins are chained in a query, the right operand of each is the newly introduced table for that key join.
Throughout this paper, "FK" abbreviates "referencing" and "PK" abbreviates "referenced" (e.g. "FK side", "PK columns"). The "PK" abbreviation does not imply a primary key; the referenced side may be backed by either a PRIMARY KEY or a UNIQUE constraint.
5 Intention
The intention of the key join syntax is to make a referential equijoin explicit, verifiable, and locally readable. A key join enriches a referencing table with a referenced table by following a declared referential constraint. The referencing rows are preserved: each appears in the result exactly once. Each all-non-null referencing key is enriched with its unique referenced row.
The intent is asymmetric. The referencing table is the table whose rows are guaranteed to survive the join; the referenced table is consulted for a unique matching row but is not itself preserved by the key-join guarantee. The syntax lets the reader identify the referencing and referenced sides, and the compared columns, directly from the query.
The DBMS verifies the required facts at compile time from declarations and query structure. The proof rules are sound rather than complete: a mathematically valid key join may still be rejected when the specified rules cannot derive the necessary facts.
6 Definition
The conditions under which an equijoin is a key join are stated in terms of two multisets:
The referenced multiset is the projection of the referenced table onto the referenced columns, restricted to rows where every referenced column is non-null.
The referencing multiset is the projection of the referencing table onto the referencing columns.
Such an equijoin is a key join if and only if all of the following conditions hold:
Every element of the referenced multiset has a multiplicity of exactly 1 (one).
For every element of the referencing multiset where no value is null, the referenced multiset contains that element.
Either
no element of the referencing multiset contains a null value, or
the referencing table is preserved via an outer join type.
6.1 Why the Definition Captures the Intention
The Intention requires that every row of the referencing table appears in the result exactly once, and that every all-non-null referencing key is enriched with exactly one referenced row.
Condition 1 prevents duplication: If every referenced multiset element has multiplicity one, no referencing row finds more than one match in the equijoin, so no referencing row is duplicated. The same uniqueness supplies the singularity of the...