No query strings here either from ~timo
A couple of weeks ago, Chris Morgan published
I've banned query strings.<br>I read it, liked it and then did roughly the same<br>thing on my own site - with two deliberate differences.
Chris's opening sums up the motivation better than I could:
I don't like people adding tracking stuff to URLs. Still less<br>do I like people adding tracking stuff to my URLs.
[...] UTM parameters are for me to use, not you.<br>Leave my URLs alone.
From chrismorgan.info/no-query-strings
The premise is the same here. A ?utm_source=... or<br>?ref=... tacked onto one of my URLs by some intermediary<br>is, at best, noise I never asked for and, at worst, a tracker the<br>referrer is using to nudge my visitor's behaviour into a funnel.<br>I'd rather refuse to serve those requests than pretend they're a<br>legitimate way to reach a page on my site.<br>I'm also a fan of having one true canonical<br>URL for each of my pages.
Where I differ from Chris
1. cache-busters like ?v= are allowed
Chris went for a true blanket ban - including breaking old<br>cache-busting URLs like ?t=... and ?h=...<br>that his site used to serve.<br>That's likely the right call when none of those<br>URLs are still in circulation. In any case, such URLs are probably only used<br>for static assets, which people rarely bookmark.
My situation is slightly different: I actively use<br>?v= as a cache buster on assets I serve<br>today. The very HTML you're reading links to<br>main.css?v=1. I use it so that I can set a very high<br>Cache-Control: max-age=... on static assets.<br>If I matched Chris's strictness I'd<br>have to either give up on query-string cache busting (and switch<br>to fingerprinted filenames or Cache-Control juggling),<br>or break my own page load on every bump.
So my rule is a narrow allowlist:<br>everything is blocked, except ?v= .<br>The matcher is intentionally strict - the whole query string must<br>be exactly v= followed by digits, nothing else, no<br>extra parameters smuggled in alongside.
(no_query_strings) {<br>@bad_query `{http.request.orig_uri}.contains("?") && !{http.request.uri.query}.matches("^v=[0-9]+$")`<br>error @bad_query 403<br>}
The first clause uses orig_uri so a bare trailing<br>? still trips the ban - Caddy's {query}<br>placeholder can't distinguish "absent" from "empty", and a lone<br>? deserves the same treatment as a parameter list.<br>The second clause uses the canonical {uri.query}<br>because Caddy doesn't expose .query as a sub-key on<br>orig_uri - the rewrite never touches the query so the<br>two are equivalent here.
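The two clauses of the matcher can be mirrored in a few lines of Python to check which request URIs would be refused. This is only a sketch of the logic, not how Caddy evaluates it: the helper name `is_blocked` is hypothetical, and `str.partition` stands in for Caddy's own query parsing.

```python
import re

# Mirrors the allowlist: the whole query must be "v=" followed by digits.
ALLOWED_QUERY = re.compile(r"v=[0-9]+")


def is_blocked(orig_uri: str) -> bool:
    """Return True if the request URI would get a 403 under the rule above.

    Hypothetical helper for illustration; Caddy evaluates the real matcher.
    """
    # First clause: any "?" in the original URI, even a bare trailing one.
    if "?" not in orig_uri:
        return False
    # Second clause: everything after the first "?" must match in full.
    _, _, query = orig_uri.partition("?")
    return ALLOWED_QUERY.fullmatch(query) is None


for uri in ["/~timo/", "/main.css?v=1", "/~timo/?",
            "/~timo/?utm_source=x", "/main.css?v=1&ref=y"]:
    print(uri, "->", "403" if is_blocked(uri) else "served")
```

Only the first two URIs are served; the bare `?`, the tracking parameter, and the smuggled extra parameter are all refused.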
2. 403 Forbidden, not 414 URI Too Long
Chris picked<br>414 URI Too Long,<br>and is upfront about it:
You could argue that I'm abusing 414 URI Too Long. I respond<br>that it's funnier this way.
From Chris's ban page
It's indeed nice, but I wanted to pick a status code that I can<br>defend on RFC grounds rather than vibes. Here's how I read<br>RFC 9110<br>and<br>RFC 7725<br>for this case:
400 Bad Request
The<br>server cannot or will not process the request due to something<br>that is perceived to be a client error (e.g., malformed request<br>syntax). The request isn't malformed; ?utm_source=x<br>is perfectly well-formed. Too generic.
403 Forbidden
The<br>server understood the request but refuses to authorize it.<br>That is exactly what's happening: I understood the request, the<br>URL would otherwise resolve, and I am refusing on policy grounds.<br>The spec also explicitly notes that a server<br>can<br>describe that reason in the response content, which is what<br>the body of the 403 page does.
404 Not Found
Misleading. The resource exists; I just won't serve it via<br>this URL. Also has unpleasant SEO and caching side effects.
414 URI Too Long
A refusal to service the request because the request-target<br>is<br>longer than the server is willing to interpret. The<br>objection is about length, not policy or content.<br>Although, I agree that one could argue that everything after<br>the canonical URL is too long.
451 Unavailable For Legal Reasons
Not legal reasons. Just personal taste.
403 is the cleanest semantic match. I'm not 100% certain, but<br>I read the "authorize" in RFC 9110 as covering more than HTTP authentication<br>or authorization: it carries the more general sense of "permit".
(Okay, Chris is right that 414 is funnier, though. I'll concede that<br>one.)
What it looks like
When a request comes in with any query string other than ?v= followed by digits,<br>Caddy short-circuits with a 403 and serves a small explainer page.<br>You can try it yourself:<br>furrer.life/~timo/?utm_source=this-post.<br>The page tells you what happened, why, and offers the same URL<br>without the query string.
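Deriving "the same URL without the query string" is a one-liner with the standard library. The sketch below is not how the explainer page itself builds the link (the post doesn't say); the helper name `without_query` and the `https://` scheme on the example URL are assumptions for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def without_query(url: str) -> str:
    """Return the URL with its query string (and any bare "?") removed.

    Hypothetical helper; scheme, host, path and fragment are kept as-is.
    """
    parts = urlsplit(url)
    # An empty query component makes urlunsplit omit the "?" entirely.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", parts.fragment))


# Assumes the site is served over https.
print(without_query("https://furrer.life/~timo/?utm_source=this-post"))
# prints "https://furrer.life/~timo/"
```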
If you want to follow Chris down this path, his post links the<br>relevant Caddyfile snippet on his site. Alternatively, use mine<br>above, which is a small variant of his with<br>the extra allowlist clause.