Welcome to ORDER BY Jungle

Welcome to ORDER BY jungle | boringSQL

LIVE

boringSQL Live Session #2 17:30 CEST · 15:30 UTC · 11:30 AM ET · Vote for your topic

Reserve your spot →

Table of Contents

Names and expressions are not the same

Same answer, two different sorts

Above the identifier, or around it

Aliases that aren't the names you think

GROUP BY checks the opposite scope first

Window ORDER BY does not even pretend

UNION ORDER BY is name-only

The seam, in the source

A useful workaround

The boring takeaway

SQL is fun and not at all boring. The latest article by Markus Winand on Order by Has Come a Long Way sent me on quite a journey.

First, set up a table called nums with one integer column and four rows:

CREATE TABLE nums (a int); INSERT INTO nums VALUES (0), (1), (2), (3); Try to guess what these two queries return.

SELECT -a AS a FROM nums ORDER BY a; SELECT -a AS a FROM nums ORDER BY -a; Most of us would guess the same rows in a different order. The actual answer is that they produce exactly the same rows in exactly the same order . By the same logic you might expect

SELECT a AS c FROM nums ORDER BY -c; to do exactly the same. Except it does not. It errors with column "c" does not exist despite the alias being right there in the statement. Welcome to ORDER BY jungle.

Names and expressions are not the same

If you ask most developers how ORDER BY works, they will say "you put a column name there and it sorts the rows". In 99% of queries that is exactly what happens. People sort by created_at or id and move on.

Strictly speaking, three, if you count ORDER BY 1. Positional references are their own can of worms and out of scope for this post. But `ORDER BY` accepts two different kinds of things: SELECT created_at, user_id FROM events ORDER BY created_at; SELECT created_at, user_id FROM events ORDER BY date(created_at); Both feel natural. And the thing nobody tells you is that they go down completely different code paths in the parser. Different scope rules, different lookups, different error messages. The first looks at your SELECT list. The second looks at your FROM clause. They never look at the same place.

Same answer, two different sorts

Look at the first query again.

SELECT -a AS a FROM nums ORDER BY a; You wrote ORDER BY a. A bare identifier, no decoration. Postgres goes down the name path. It scans the SELECT list for something called a, finds the aliased column -a AS a, and sorts by its output values. The negated values are -3, -2, -1, 0, ascending is -3, -2, -1, 0. That is what comes out.

Now the twin.

SELECT -a AS a FROM nums ORDER BY -a; You wrote ORDER BY -a. This is no longer an identifier. It's an expression: unary minus around a column reference. The parser does not even try the same logic.

Instead it switches to the expression path , where the only a it knows is the column in nums, and sorts the input values negated. And by arithmetic luck, the two queries land on the same row order. Same output, completely different logic. If you don't believe it is just luck, drop the negation from the SELECT list and keep it in ORDER BY:

SELECT a AS c FROM nums ORDER BY -a; c (4 rows) ORDER BY -a is an expression, so it sorts by -input_a ascending, which is input_a descending. The alias c was never consulted. The result has nothing to do with whatever c happens to be.

And ORDER BY -c is now obvious. -c is an expression, so the parser looks for column c in FROM, doesn't find it, and errors. The alias exists, but in a scope this code path cannot see.

Above the identifier, or around it

Once the rule is clear (bare identifier hits the SELECT list, anything else hits the table) the rest of the surprises fall out.

SELECT 'hello' AS x FROM nums ORDER BY x::text; -- ERROR: column "x" does not exist It is probably not surprising that casts count as expressions and push the lookup to the table.

The surprise might come with

SELECT a AS c FROM nums ORDER BY c DESC NULLS FIRST; Which will work as expected. Both DESC and NULLS FIRST are part of the sort clause itself, not of the sort expression. They sit above the identifier in the parse tree, so they never touch it. The parser still sees a bare c, takes the fast path, finds the alias, sorts by it, and then applies "descending, nulls first" on top of the resolved key.

The same cannot be said about collation .

SELECT 'A'::text AS x FROM nums ORDER BY x COLLATE "C"; -- ERROR: column "x" does not exist This is a really bad one. COLLATE might look the same as a sort modifier, but it is not. It wraps the expression in the parse tree.

Parentheses are a special case.

SELECT -a AS a FROM nums ORDER BY (a); -- works, sorts by alias Postgres collapses redundant parens before the bare-identifier check, so (a) is still bare a. The seam is asymmetric in the way that maximises confusion: COLLATE is "still a name to a human, an expression to the parser", and (a) is "an expression to a human, still a name to the parser". You get both flavours of wrong intuition mixed...

Welcome to ORDER BY Jungle

Related Articles

Elevated error rates on requests to multiple models

Donald Trump and sons to be 'forever' exempt from tax audits

PopuLoRA: Co-Evolving LLM Populations for Reasoning Self- Play

Old Reddit Is Down

The ultimate female fantasy – A feminist critique of Beauty and the Beast