Welcome to ORDER BY jungle | boringSQL
LIVE
boringSQL Live Session #2<br>17:30 CEST · 15:30 UTC · 11:30 AM ET<br>· Vote for your topic
Reserve your spot →
Table of Contents
Names and expressions are not the same
Same answer, two different sorts
Above the identifier, or around it
Aliases that aren't the names you think
GROUP BY checks the opposite scope first
Window ORDER BY does not even pretend
UNION ORDER BY is name-only
The seam, in the source
A useful workaround
The boring takeaway
SQL is fun and not at all boring. The latest article by Markus Winand on Order by Has Come a Long Way sent me on quite a journey.
First, set up a table called nums with one integer column and four rows:
CREATE TABLE nums (a int);<br>INSERT INTO nums VALUES (0), (1), (2), (3);<br>Try to guess what these two queries return.
SELECT -a AS a FROM nums ORDER BY a;<br>SELECT -a AS a FROM nums ORDER BY -a;<br>Most of us would guess the same rows in a different order. The actual answer is that they produce exactly the same rows in exactly the same order . By the same logic you might expect
SELECT a AS c FROM nums ORDER BY -c;<br>to do exactly the same. Except it does not. It errors with column "c" does not exist despite the alias being right there in the statement. Welcome to ORDER BY jungle.
Names and expressions are not the same
If you ask most developers how ORDER BY works, they will say "you put a column name there and it sorts the rows". In 99% of queries that is exactly what happens. People sort by created_at or id and move on.
Strictly speaking, three, if you count ORDER BY 1. Positional references are their own can of worms and out of scope for this post.<br>But `ORDER BY` accepts two different kinds of things:<br>SELECT created_at, user_id FROM events ORDER BY created_at;<br>SELECT created_at, user_id FROM events ORDER BY date(created_at);<br>Both feel natural. And the thing nobody tells you is that they go down completely different code paths in the parser. Different scope rules, different lookups, different error messages. The first looks at your SELECT list. The second looks at your FROM clause. They never look at the same place.
Same answer, two different sorts
Look at the first query again.
SELECT -a AS a FROM nums ORDER BY a;<br>You wrote ORDER BY a. A bare identifier, no decoration. Postgres goes down the name path. It scans the SELECT list for something called a, finds the aliased column -a AS a, and sorts by its output values. The negated values are -3, -2, -1, 0, ascending is -3, -2, -1, 0. That is what comes out.
Now the twin.
SELECT -a AS a FROM nums ORDER BY -a;<br>You wrote ORDER BY -a. This is no longer an identifier. It's an expression: unary minus around a column reference. The parser does not even try the same logic.
Instead it switches to the expression path , where the only a it knows is the column in nums, and sorts the input values negated. And by arithmetic luck, the two queries land on the same row order. Same output, completely different logic. If you don't believe it is just luck, drop the negation from the SELECT list and keep it in ORDER BY:
SELECT a AS c FROM nums ORDER BY -a; c<br>(4 rows)<br>ORDER BY -a is an expression, so it sorts by -input_a ascending, which is input_a descending. The alias c was never consulted. The result has nothing to do with whatever c happens to be.
And ORDER BY -c is now obvious. -c is an expression, so the parser looks for column c in FROM, doesn't find it, and errors. The alias exists, but in a scope this code path cannot see.
Above the identifier, or around it
Once the rule is clear (bare identifier hits the SELECT list, anything else hits the table) the rest of the surprises fall out.
SELECT 'hello' AS x FROM nums ORDER BY x::text;<br>-- ERROR: column "x" does not exist<br>It is probably not surprising that casts count as expressions and push the lookup to the table.
The surprise might come with
SELECT a AS c FROM nums ORDER BY c DESC NULLS FIRST;<br>Which will work as expected. Both DESC and NULLS FIRST are part of the sort clause itself, not of the sort expression. They sit above the identifier in the parse tree, so they never touch it. The parser still sees a bare c, takes the fast path, finds the alias, sorts by it, and then applies "descending, nulls first" on top of the resolved key.
The same cannot be said about collation .
SELECT 'A'::text AS x FROM nums ORDER BY x COLLATE "C";<br>-- ERROR: column "x" does not exist<br>This is a really bad one. COLLATE might look the same as a sort modifier, but it is not. It wraps the expression in the parse tree.
Parentheses are a special case.
SELECT -a AS a FROM nums ORDER BY (a);<br>-- works, sorts by alias<br>Postgres collapses redundant parens before the bare-identifier check, so (a) is still bare a. The seam is asymmetric in the way that maximises confusion: COLLATE is "still a name to a human, an expression to the parser", and (a) is "an expression to a human, still a name to the parser". You get both flavours of wrong intuition mixed...