Blocking an ASN (or similar) from my sites
Matthew Somerville, 24th May 2026
I run a number of websites, some of which could even be said to be popular. I want humans to visit these websites, enjoy these websites, make their change of trains at New Street in good time, investigate miscarriages of justice, or find out the play they watched in London when hitch-hiking round Europe in the ’80s.
But I find it harder to do this when my server is swamped with artificial traffic from bots, AI clankers, and whatever other nonsense there is nowadays (Weird Gloop has a good summary; I don’t have wikis but for other reasons am basically in the same boat). This is especially egregious when the bots are incredibly poorly written and constantly fetch basically identical pages, which may cause issues not just on my own server but with upstream sources. Clearly the people behind all this simply do not think or care (and as an aside, this is why I cannot separate any positives of this technology from the negatives that come alongside, or from how it has been, and is being, introduced).
Some you can block by individual IP address, some by user agent, some by location, and sometimes (the purpose of this post) you just feel like every hit you get from an entire company is artificial traffic ultimately derived from selfish individuals; reporting the abuse won’t do anything, so you just want to block any traffic from that company. (This doesn’t help with residential proxies, of course, but every little helps.)
So, how do I block any Amazon, or Tencent, or DigitalOcean, or ..., IP address from accessing my site?
Getting a list
Cloud providers
Google and Amazon publish JSON of their current cloud IP ranges. To get a list of ranges from Google you can use:
curl -s -O https://www.gstatic.com/ipranges/cloud.json
jq -r '.prefixes | .[] | (.ipv4Prefix // .ipv6Prefix)' cloud.json
Or for Amazon:
curl -s -O https://ip-ranges.amazonaws.com/ip-ranges.json
jq -r '.prefixes | .[].ip_prefix' ip-ranges.json
jq -r '.ipv6_prefixes | .[].ipv6_prefix' ip-ranges.json
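The Google command above prints IPv4 and IPv6 prefixes in one stream, while iptables only takes IPv4 (IPv6 rules go through ip6tables). A minimal sketch of splitting a mixed list by address family follows; the function name and default file names are made up for illustration and are not from the post.

```shell
# Split a mixed list of CIDR ranges (stdin) into an IPv4 file and an IPv6 file.
# The function name and default file names are placeholders for this sketch.
split_by_family() {
  v4_out=${1:-ranges-v4.txt}
  v6_out=${2:-ranges-v6.txt}
  # Start from empty files so stale entries from an earlier run do not linger.
  : > "$v4_out"
  : > "$v6_out"
  # An IPv6 prefix always contains a colon; an IPv4 prefix never does.
  awk -v v4="$v4_out" -v v6="$v6_out" '
    NF == 0        { next }                 # skip blank lines
    index($1, ":") { print $1 > v6; next }
                   { print $1 > v4 }
  '
}
```

For example, `jq -r '.prefixes | .[] | (.ipv4Prefix // .ipv6Prefix)' cloud.json | split_by_family google-v4.txt google-v6.txt` would leave one file per family.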
ASN lists
Other providers don’t publish such lists, but they do have to tell the internet which IP addresses they are responsible for and that they provide routing for them. This is done using Autonomous System Numbers, which are used in BGP (Border Gateway Protocol) routing.
A routing registry, such as RADb, lets you look up all the routes given an ASN. So once you have discovered that, I dunno, AS136907 is Huawei, you can ask RADb for all the ranges:
whois -h whois.radb.net -- "-i origin AS136907" | grep 'route:' | cut -d' ' -f 11
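The `cut -d' ' -f 11` above depends on the exact number of spaces RADb pads each `route:` line with. A slightly more forgiving sketch splits on any whitespace with awk and loops over several ASNs; the function names are hypothetical, and the IPv6 variant assumes the registry returns `route6:` objects for the same query.

```shell
# Read whois output on stdin and print the value of every "route:" line,
# de-duplicated. Pass 6 as the first argument to read "route6:" lines instead.
parse_routes() {
  awk -v key="route${1:-}:" '$1 == key { print $2 }' | sort -u
}

# Print the IPv4 routes registered in RADb for each ASN given as an argument.
asn_routes() {
  for asn in "$@"; do
    whois -h whois.radb.net -- "-i origin $asn" | parse_routes
  done | sort -u
}
```

Usage would then be something like `asn_routes AS136907 > huawei-v4.txt`.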
Checking it twice
Now you have some lists, you can then add these to your firewall however you wish. In my case, I use iptables, and stick most in a total block list, and some in an incoming drop list (so I can still make outgoing connections).
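How the lists get into iptables is left open above. As one possibility, here is a minimal sketch that turns a list of IPv4 ranges into an iptables-restore fragment for an incoming drop list. The chain name BLOCKLIST and the function name are assumptions for illustration, not the actual setup described here.

```shell
# Turn a list of IPv4 CIDR ranges (stdin) into an iptables-restore fragment
# that drops traffic coming from each range.
# The default chain name BLOCKLIST is hypothetical.
make_drop_rules() {
  chain=${1:-BLOCKLIST}
  printf '%s\n' '*filter'
  printf '%s\n' ":$chain - [0:0]"          # declare the user-defined chain
  while IFS= read -r range; do
    [ -n "$range" ] || continue            # skip blank lines
    printf '%s\n' "-A $chain -s $range -j DROP"
  done
  printf '%s\n' 'COMMIT'
}
```

The output could be loaded with `make_drop_rules < ranges-v4.txt | iptables-restore --noflush`, with a one-off `iptables -I INPUT -j BLOCKLIST` to send incoming packets through the chain. Outgoing connections are untouched because nothing is added to OUTPUT.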
Having done the above, though, reloading my firewall was now pretty slow; fine, but annoying if I wanted to quickly block something else. A couple of ASNs had an awful lot of IP ranges in them, and I wondered if I could consolidate these at all.
Some searching found me two consolidators, one in Python and one in Rust. Both cut down my list of ranges to block substantially, which was great; the Python one was very slow and heavy on memory, and the Rust one was very quick but I didn’t really want to install Rust etc. on my server.
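To give a feel for why consolidation helps, here is a much simpler technique than either tool uses: dropping any IPv4 prefix that is already covered by a wider prefix in the same list. It is only a sketch with a made-up function name. Unlike a real aggregator it does not merge adjacent ranges into a larger one, and it ignores IPv6 lines.

```shell
# Drop IPv4 prefixes that are fully covered by a wider prefix in the same list.
# Reads CIDRs on stdin, one per line; prints the survivors. IPv6 is ignored.
drop_covered() {
  awk -F'[./]' '
    NF == 5 {
      start = (($1 * 256 + $2) * 256 + $3) * 256 + $4
      size  = 2 ^ (32 - $5)
      start = start - (start % size)       # normalise to the network address
      # Emit: first address, last address, original text.
      printf "%.0f %.0f %s\n", start, start + size - 1, $0
    }' |
  sort -k1,1n -k2,2nr |                    # by start, widest range first
  awk '
    # CIDR blocks are either nested or disjoint, so a range is new only if
    # it ends beyond everything kept so far.
    NR == 1 || $2 + 0 > max_end { print $3; max_end = $2 + 0 }
  '
}
```

Given `10.0.0.0/8`, `10.1.0.0/16` and `10.1.2.0/24`, only `10.0.0.0/8` survives, since the other two sit inside it.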
Yak shaving into cross-compilation
I had heard Rust could cross-compile binaries on one platform to run on another, which seemed ideal; some more searching found me these helpful instructions which worked perfectly for me (and boil down to 1. add x86_64-unknown-linux-gnu target; 2. install provided linker; 3. build).
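Those three steps might look roughly like the sketch below, assuming rustup and cargo are installed locally. The linker binary name is a guess, because it depends on which linker package you install; step 2 is left as a comment for the same reason. The function names are hypothetical.

```shell
TARGET=x86_64-unknown-linux-gnu                  # the target named above
LINKER=${LINKER:-x86_64-unknown-linux-gnu-gcc}   # assumed linker binary name

# Cargo reads the linker for a target from CARGO_TARGET_<TRIPLE>_LINKER,
# with the triple upper-cased and dashes turned into underscores.
linker_env_name() {
  printf 'CARGO_TARGET_%s_LINKER\n' "$(printf '%s' "$1" | tr 'a-z-' 'A-Z_')"
}

cross_build() {
  rustup target add "$TARGET" || return 1        # 1. add the target
  # 2. install a linker for $TARGET (platform-specific, not shown here)
  export "$(linker_env_name "$TARGET")=$LINKER"  # tell cargo which linker to call
  cargo build --release --target "$TARGET"       # 3. build
}
```

Run from the tool's source directory, `cross_build` should leave the binary under `target/x86_64-unknown-linux-gnu/release/`.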
This gave me a cid-aggregator binary that I could pipe my IP ranges to, before adding to my firewall. 30,000 ranges reduced to 730-odd in the end.
Finding out who’s naughty or nice
Having such wide blocks isn’t without its own issues, even if it has cut quite a bit of bot traffic. The two main problems I have had since are:
Let’s Encrypt, which provides the SSL certificates for my domains, uses random IP addresses at multiple providers (which it won’t reveal) to perform HTTP validation of domains. I have at least once blocked my own renewals due to this; I’ll try and switch to DNS validation at some point, but in the meantime I can temporarily drop the blocks while the renewals take place;
Bluesky similarly uses Amazon servers when checking custom domain handles, so those occasionally break; the fix is similar, or to switch to DNS verification at some point in future.
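Temporarily dropping the blocks around a renewal can be wrapped up so that they always go back on afterwards. The following is a sketch only: the two helper functions and the BLOCKLIST chain name are hypothetical stand-ins for however your own rules are added and removed.

```shell
# Hypothetical helpers: replace the bodies with whatever removes and re-adds
# your own block rules. The BLOCKLIST chain name is an assumption.
lift_blocks()    { iptables -D INPUT -j BLOCKLIST; }
restore_blocks() { iptables -I INPUT -j BLOCKLIST; }

# Run a command with the blocks lifted, then put them back whether or not
# the command succeeded. Returns the exit status of the wrapped command.
with_blocks_lifted() {
  lift_blocks || return 1
  "$@"
  rc=$?
  restore_blocks
  return "$rc"
}
```

With certbot as the ACME client (an assumption; any renewal command would do), this would be `with_blocks_lifted certbot renew`.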
Reading
If you’ve made it this far, I have just finished reading The God Engines by John Scalzi, am currently reading The Tiger and the Wolf by Adrian Tchaikovsky, and have recently bought Soviet Scientific Institutes by Eric Lusito, and a book of time travel romance stories called...