Structured Primary Keys

DeutschEnglish [ DE ]

News › 2026-06-24 › Structured Primary Keys Sometimes I hit a wall when optimizing a client’s query. Over the years, I realized that most cases fall into two categories: (1) unreasonable expectations based on the cloud bill, not on the infrastructure’s power; (2) the primary key design. This article explains how primary keys can put tables into walled gardens so that the database schema falls apart into disconnected pieces, effectively disabling SQL’s set-based powers. I also suggest an alternative approach to primary key design and discuss its pros and cons. I use the term Structured Primary Keys to put emphasis on how it differs from natural and surrogate keys. Contents: Customers Order Lines Isn’t that Denormalization? Space-Efficiency Per-Parent Sequences Cyclic Consistency Bottom Line Customers I’ll will use a small schema for a simple web shop in this article. Let’s start with the core of the customers table: CREATE TABLE customers ( name VARCHAR NOT NULL, email VARCHAR NOT NULL, /* more attributes */

/* TODO: PRIMARY KEY */ )I kept it short as this article is not about the business attributes.0 The two shown columns are just there to be able to ask the essential question: What is a good primary key for this table? At first sight, it is the natural vs. surrogate key discussion. But as this is only a side aspect of this article, I’ll keep it short: The table, as shown, has no column that can be considered for inclusion in a primary key.1 The reason is that both columns are governed by externally defined semantics. That means we do not know the rules they follow. In particular, we do not know if, or when they are unique. Well, we know that names are not unique, so we can rule the name column out. Although less obvious, the same is true for email addresses and more generally everything of which the rules of uniqueness are externally defined. Note that I don’t use the “primary key values should be immutable” argument. Neither do I say that it is a bad argument. When a primary key value changes, it can be difficult to apply this change to the database as it might affect many rows in many tables.2 Further, the old primary key value might have left traces outside the database (APIs, log files, print outs, etc.) so that changing the primary key value renders those traces meaningless. While both arguments are correct, they are both irrelevant from the perceptive of the relational theory. I prefer an even stronger argument that is strictly required for a relational model to work. This argument is that the immutability of the uniqueness rules is required. Note the emphasis: immutable uniqueness rules are strictly required, not immutable values! Let’s take email addresses as example. The problem is that the uniqueness semantics of email addresses depends on the target mail server. In particular, the part before the “@” should be case-sensitive, but often isn’t, subaddressing (“+” addressing) may or may not be supported, dots might be ignored. Let that sink in. While a.b@gmail.com and ab@gmail.com denote the same mailbox, they might refer to different mailboxes at other domains. Externally defined uniqueness semantics are a problem. We might not fully understand them, and they might change in the future. Such misunderstandings and changes would break the uniqueness of our primary keys. Therefore, we must not use such values in our primary keys. So far, the story is in line with “the always use a surrogate key” mantra, but the story goes on. The remaining article is thus based on the following definition of the customers table: CREATE TABLE customers ( name VARCHAR NOT NULL, email VARCHAR NOT NULL, /* more attributes */

id BIGINT NOT NULL GENERATED ALWAYS AS IDENTITY, PRIMARY KEY (id) )Orders The second table in the shop is for orders: CREATE TABLE orders ( customer_id BIGINT NOT NULL, FOREIGN KEY (customer_id) REFERENCES customers (id),

placed TIMESTAMP(6) NOT NULL, /* more attributes */

/* TODO: PRIMARY KEY */ )Again, I just show the relevant parts. It starts with the reference to the customer, including the foreign key definition. The placed column follows to store time of order placement. Obviously, this table would have many more column, but they are irrelevant for the important question: What is a good primary key for this table? The situation is slightly different as before. One might argue that the combination of (customer_id, placed) is worth considering. And I actually agree. While the uniqueness semantics of time is also externally defined—by our universe, in confusing ways—, we might consider timestamps to be well enough understood and immutable in context of a web shop. Timestamps of past events even have the nice-to-have property of immutable values, backed by the universe itself. If the timestamp has a sufficiently high resolution, we could also consider it unique. The consequence of a primary key that combines the customer_id and placed columns is...

Structured Primary Keys

Related Articles

US Government directive to suspend access to Fable 5 and Mythos 5

Is AI ruining our skills? Early results are in – and they're not good

The Anatomy of an AI-Native Org

Apertus – Open Foundation Model for Sovereign AI

Britain Became as Poor as Mississippi