Webmaster! It's time to serve slop to AI crawlers


iczelia · 05.27.2026 · 479 words · 3 min read · opinion cs obf

Hosting a website in 2026 is a huge pain in the ass. Not only do you have to update your whole system every other day because someone found more RCEs in nginx, the AI labs are also looking to kill your website through a million tiny cuts. This week alone, Meta's and Anthropic's bots have sent relentless waves of completely pointless requests that hammer static endpoints over and over. Most bots also don't respect robots.txt.

My hobby of messing with crawlers started in 2021, when a deluge of WordPress vulnerability scanners was hitting my non-existent /wp-admin/ non-stop. The solution I opted for was to block the offending paths in robots.txt. Of course, such scanners are typically run by people who are not the type to follow clear and simple instructions on how the internet is supposed to work so that everyone can coexist, so they kept hammering that endpoint even more. Eventually, I decided that a cute idea would be to serve gzip bombs to the unwanted visitors. Try it out for yourself: fetching https://iczelia.net/.env with Python 3's requests will take a long time and decompress to a monstrous file full of zeroes.

AI has revolutionised the world, and that includes new annoyances and high-ROI trolling techniques. The vulnerability scanners have mostly calmed down, while AI labs pointlessly hammer every endpoint they can to fuel their slop machines with more questionably sourced data. Of course, serving them gzip bombs or a 403 would be a possibility, but there is an even funnier option. Since these bots are looking to scrape data, why not serve them some data? But not just any data: let's serve them some slop. Reportedly, in some cases this can hurt the training of the models significantly, and it would be hilarious to see the crawlers fall into a Library of Babel-type pit that they just can't get out of because every page links to some other random nonsense for the fun of it.

LLM inference is quite simple.
If your model is small enough, it can be done in a couple hundred lines of code. I chose arnir0's Tiny LLM (10M parameters) as my slop factory. Then, I wired up a handler that dispatches on the User-Agent (UA) and serves the crawlers stuff like this:

% curl -A "DotBot/1.2 (+https://opensiteexplorer.org/dotbot)" http://iczelia.net/blog/thisdoesntexist
<html lang="en"><head><meta charset="utf-8"><meta name="viewport" content="width=device-width, initial-scale=1"><title>Thisdoesntexist</title><meta name="description" content="Thisdoesntexist"><style>
body{max-width:62ch;margin:2.5em auto;padding:0 1em;font:16px/1.55 Georgia,serif;color:#222}
h1{font-size:1.6em;margin:0 0 .6em}
p{margin:1em 0}
nav{margin-top:2.5em;padding-top:1em;border-top:1px solid #ccc;font-size:.95em}
nav h2{font-size:1em;margin:0 0 .4em;color:#666;font-weight:normal}
nav ul{margin:0;padding-left:1.2em}
</style></head><body><h1>Thisdoesntexist</h1>
<p>the L. V. Vandervips We can only make reached with one that can come up with the were there with the whole and so, that is the case, this might be true fight.</p>
<p>The reason I am looking for different. This is a different sort of one. For example, you choose only the tall-safety for all the other thanks to the stubbed is not the right b.</p>
<p>You don't get me baby that the rest of the other is a long much like a in a very large number of a drywilts down the the shit to be which is not a the place; then it makes the gold and the the most perfect. It's time. We're still the best.There is a little. This is a in the world all the time, and being. I'm a guy, what the people.</p>
<nav><h2>Related</h2><ul>
<li><a href="/blog/whole-that">Whole and that</a></li>
<li><a href="/journal/other-true">Other and true</a></li>
<li><a href="/notes/sort-this">Sort and this</a></li>
<li><a href="/tag/only-vandervips">Only and vandervips</a></li>
<li><a href="/archive/there-looking">There and looking</a></li>
</ul></nav></body></html>

Soon after deployment, I saw the following in my logs:

May 27 16:09:41 iczelia[1200661]: [req] 10.88.0.1 "GET /journal/made-beach" 200 1ms
May 27 16:09:42 iczelia[1200661]: [req] 10.88.0.1 "GET /archive/consuming-your" 200 1590ms (handler=1590ms cache=1ms)
May 27 16:09:42 iczelia[1200661]: [req] 10.88.0.1 "GET /guestbook/" 200 10ms
May 27 16:09:43 iczelia[1200661]: [req] 10.88.0.1 "GET /guestbook/" 200 8ms
May 27 16:09:43 iczelia[1200661]: [req] 10.88.0.1 "GET /blog/look-when" 200 1105ms (handler=1104ms cache=1ms)
May 27 16:09:44 iczelia[1200661]: [req] 10.88.0.1 "GET /journal/world-will" 200 1099ms (handler=1098ms cache=0ms)
May 27 16:09:47 iczelia[1200661]: [req] 10.88.0.1 "GET /journal/will-resume" 200 1119ms (handler=1118ms cache=0ms)
May 27 16:09:48 iczelia[1200661]: [req] 10.88.0.1 "GET /archive/could-appreciate" 200 1128ms (handler=1128ms cache=1ms)
May 27 16:09:50 iczelia[1200661]: [req] 10.88.0.1 "GET /tag/nugget-time" 200 1101ms (handler=1100ms cache=1ms)
May 27 16:09:51 iczelia[1200661]: [req] 10.88.0.1 "GET...
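For anyone who wants to try the same trick, here is a rough sketch of the two moving parts described above: dispatching on the UA, and giving every generated page links to more generated pages. It is not the code running on this site. The UA substrings (other than DotBot, taken from the curl example), the function names, and the `generate_text` callable standing in for the small model are all assumptions made for illustration.

```python
import hashlib
import html
import random
import re

# Illustrative UA substrings only; "DotBot" comes from the curl example,
# the rest are assumptions and not the site's actual list.
CRAWLER_UA = re.compile(r"dotbot|gptbot|claudebot|meta-externalagent", re.IGNORECASE)

# Path prefixes seen in the sample output's "Related" links.
SECTIONS = ("blog", "journal", "notes", "tag", "archive")


def is_crawler(user_agent):
    """Return True when the User-Agent contains a listed crawler substring."""
    return bool(CRAWLER_UA.search(user_agent or ""))


def related_links(path, words):
    """Build one (href, label) pair per section from the page's own words.

    The random generator is seeded from the path, so a given URL always
    advertises the same links no matter how often it is fetched.
    """
    seed = int.from_bytes(hashlib.sha256(path.encode("utf-8")).digest()[:8], "big")
    rng = random.Random(seed)
    links = []
    for section in SECTIONS:
        first, second = rng.sample(words, 2)
        links.append((f"/{section}/{first}-{second}", f"{first.capitalize()} and {second}"))
    return links


def handle(path, user_agent, generate_text):
    """Return an HTML fragment for crawlers, or None to fall through to the real site.

    generate_text is a hypothetical callable (path -> str) standing in for
    whatever small model produces the filler text.
    """
    if not is_crawler(user_agent):
        return None
    text = generate_text(path)
    words = sorted(set(re.findall(r"[a-z]{4,}", text.lower())))
    if len(words) < 2:
        words = ["lorem", "ipsum"]  # fallback so sampling two words cannot fail
    items = "".join(
        f'<li><a href="{href}">{html.escape(label)}</a></li>'
        for href, label in related_links(path, words)
    )
    return f"<p>{html.escape(text)}</p><nav><h2>Related</h2><ul>{items}</ul></nav>"
```

Calling `handle("/blog/thisdoesntexist", "DotBot/1.2", my_model)` returns a fragment with five links into the maze, while a browser UA gets `None` and is served the real page. Seeding from the path is a design choice in this sketch: it keeps the maze stable without storing anything, so repeat visits look like a real site rather than a random generator.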

