Meta's Crawler Ate My 2TB Bandwidth

rodneyosodo


Jun 22, 2026

I have a home server that I use to host some of my applications. For the past two months I've been getting Fair Usage Policy (FUP) warnings from Safaricom, and my speed has been throttled to 8 Mbps on a 60 Mbps subscription. At first I assumed Safaricom was measuring wrong and shrugged it off.

Second month, same thing. Now I'm pissed.

I hooked OpenCode up to my server to investigate. The FUP throttling hit on Friday evening, but power outages kept cycling my machine, so I couldn't get clean metrics. Once it stayed up, the data was staggering.

1.82 TB in 7 days. 93 GB down, 1.73 TB up.
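Those totals are easier to reason about as sustained rates. A quick check in Python, assuming decimal units (1 TB = 10^12 bytes, 1 GB = 10^9 bytes) as ISPs usually report them:

```python
# Average throughput implied by the 7-day totals.
# Assumption: decimal units (1 TB = 10**12 bytes, 1 GB = 10**9 bytes).
SECONDS_PER_DAY = 86_400

def average_mbps(total_bytes: float, days: float) -> float:
    """Convert a byte total over `days` into an average rate in megabits per second."""
    bits = total_bytes * 8
    return bits / (days * SECONDS_PER_DAY) / 1e6

upload_bytes = 1.73e12   # 1.73 TB up
download_bytes = 93e9    # 93 GB down

print(f"total:    {(upload_bytes + download_bytes) / 1e12:.2f} TB")
print(f"upload:   {average_mbps(upload_bytes, 7):.1f} Mbit/s average")
print(f"download: {average_mbps(download_bytes, 7):.1f} Mbit/s average")
```

That works out to roughly 23 Mbit/s of upload, sustained around the clock for a full week.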

My home lab runs Proxmox → a Debian VM → Docker with ~30 services. In Proxmox I could see the outgoing traffic bar was maxed. Something was scraping the hell out of my services.

I dug through the Docker logs and found where the traffic was going: Gitea, my self-hosted Git instance. A Meta/Facebook crawler (AS32934, 2a03:2880::/32) was continuously scraping every repo, commit by commit, project by project.
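Attributing traffic to a network comes down to checking each client address against the prefix. A minimal sketch using Python's standard `ipaddress` module; the log format here is hypothetical (space-separated client IP, status, response bytes, path), since real Gitea or reverse-proxy logs look different, and the sample addresses and paths are made up. Only 2a03:2880::/32 is listed; AS32934 announces other prefixes too, including IPv4 ones.

```python
import ipaddress
from collections import Counter

# Meta's IPv6 range seen in the logs. AS32934 announces more prefixes
# than this one, so extend the list if you see other Meta addresses.
META_PREFIXES = [ipaddress.ip_network("2a03:2880::/32")]

def is_meta(addr: str) -> bool:
    """Return True if `addr` falls inside one of the listed Meta prefixes.

    An IPv4 address is never 'in' an IPv6 network, so it simply returns False.
    """
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in META_PREFIXES)

def bytes_by_source(lines):
    """Sum response bytes into 'meta' and 'other' buckets.

    HYPOTHETICAL log format, one request per line:
        client_ip status bytes path
    Adapt the split() to whatever your proxy or Gitea actually writes.
    """
    totals = Counter()
    for line in lines:
        ip, _status, size, _path = line.split(maxsplit=3)
        totals["meta" if is_meta(ip) else "other"] += int(size)
    return totals

# Made-up sample lines: two Meta IPv6 clients, one documentation IPv4 address.
sample = [
    "2a03:2880:f800:1::1 200 524288 /someuser/somerepo/commit/abc123",
    "2a03:2880:f806:2::7 200 1048576 /someuser/somerepo/archive/main.zip",
    "198.51.100.23 200 4096 /someuser",
]
print(bytes_by_source(sample))
```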

Before blocking: a steady 1,200 KB/s upload, with the crawler pulling entire repo histories 24/7.

After blocking AS32934 with a Cloudflare WAF rule: 2-3 KB/s, a 99.75% drop.
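The percentage depends on which end of the 2-3 KB/s range you take. A quick check:

```python
def percent_drop(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, as a percentage."""
    return (before - after) / before * 100

# Upload rates from the measurements above, in KB/s.
for after in (2, 3):
    print(f"1200 -> {after} KB/s: {percent_drop(1200, after):.2f}% drop")
```

At 3 KB/s the drop is exactly 99.75%; at 2 KB/s it is about 99.83%.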

All my repos are public, and I want them public. But pulling 1.73 TB of Git data in a week is not browsing; it's downloading entire organizations.

Bot Fight Mode didn't stop it. AI Security didn't stop it either. Meta's crawler is a verified bot, so Cloudflare's bot protections let it through. Only a brute-force WAF custom rule matching AS Num = 32934 on git.rodneyosodo.com killed it.
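The difference between blocking by user agent and blocking by ASN is easiest to see by modeling the rules as predicates. This is a simplified, hypothetical Python model, not Cloudflare's rule engine: each request carries a host, path, user agent and source ASN (Cloudflare derives the ASN from the client IP), and the first matching block rule wins. The user-agent rule mirrors `http.request.uri.path ne "/robots.txt" and http.user_agent contains "meta-externalagent"`; the ASN rule mirrors `http.host eq "git.rodneyosodo.com" and ip.src.asnum eq 32934`.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Request:
    host: str
    path: str
    user_agent: str
    asn: int  # source ASN; on Cloudflare this comes from an IP-to-ASN lookup

def ua_rule(req: Request) -> bool:
    # http.request.uri.path ne "/robots.txt"
    #   and http.user_agent contains "meta-externalagent"
    # (contains is a case-sensitive substring match, like Python's `in`)
    return req.path != "/robots.txt" and "meta-externalagent" in req.user_agent

def asn_rule(req: Request) -> bool:
    # http.host eq "git.rodneyosodo.com" and ip.src.asnum eq 32934
    return req.host == "git.rodneyosodo.com" and req.asn == 32934

# Evaluated in order; the first rule that matches blocks the request.
RULES = [
    ("AI Crawl Control - Block AI bots by User Agent", ua_rule),
    ("Block Meta crawler", asn_rule),
]

def first_match(req: Request) -> Optional[str]:
    """Return the description of the first blocking rule that matches, or None."""
    for description, rule in RULES:
        if rule(req):
            return description
    return None

# Made-up sample requests; 64500 is a documentation-only ASN.
robots = Request("git.rodneyosodo.com", "/robots.txt", "meta-externalagent/1.1", 32934)
other_ua = Request("git.rodneyosodo.com", "/some/repo", "Mozilla/5.0", 32934)
visitor = Request("git.rodneyosodo.com", "/some/repo", "Mozilla/5.0", 64500)

for req in (robots, other_ua, visitor):
    print(req.path, req.user_agent, req.asn, "->", first_match(req))
```

Under this model, the ASN rule catches what the user-agent rule lets through: any request from AS32934 that doesn't identify itself as `meta-externalagent`, plus `/robots.txt` itself. The trade-off is that nothing from Meta's network, robots.txt fetches included, reaches git.rodneyosodo.com anymore.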

So now I'm running VictoriaMetrics with 90-day retention, Grafana with bandwidth dashboards and FUP alerts, and node_exporter watching every byte on eth0. I'll know the moment something eats my bandwidth again.
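An FUP alert boils down to tracking how much a byte counter grew over a billing window. node_exporter exposes interface totals as `node_network_transmit_bytes_total` with a `device` label, so a Grafana query along the lines of `increase(node_network_transmit_bytes_total{device="eth0"}[30d])` gives roughly a month of upload. The Python sketch below shows the same idea on raw counter samples. A reboot, like the ones the power outages caused, resets the kernel's interface counters, which is why resets need handling. The cap is a made-up placeholder, since Safaricom's actual FUP threshold isn't stated here.

```python
from typing import Sequence

def counter_increase(samples: Sequence[float]) -> float:
    """Total growth of a monotonic byte counter across ordered samples.

    A drop between consecutive samples is treated as a counter reset
    (e.g. a reboot), so the post-reset value counts as growth from zero.
    This mirrors how PromQL/MetricsQL increase() handles resets, minus
    their extrapolation at the window edges.
    """
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        total += cur - prev if cur >= prev else cur
    return total

# HYPOTHETICAL cap for illustration only; not Safaricom's real threshold.
FUP_CAP_BYTES = 1e12  # 1 TB

def over_fup(samples: Sequence[float], cap: float = FUP_CAP_BYTES) -> bool:
    """True once the counter has grown by at least `cap` bytes in the window."""
    return counter_increase(samples) >= cap

# Made-up transmit-counter samples; the drop to 2e11 is a reboot.
samples = [0, 4e11, 9e11, 2e11, 6e11]
print(counter_increase(samples), over_fup(samples))
```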

What I did

I enabled Bot Fight Mode.

I added two custom WAF rules to block Meta's crawler: the first matches its user agent, the second matches Meta's ASN on the Gitea host.

```json
{
  "action": "block",
  "description": "AI Crawl Control - Block AI bots by User Agent",
  "enabled": true,
  "expression": "(http.request.uri.path ne \"/robots.txt\" and (http.user_agent contains \"meta-externalagent\"))",
  "id": "9e1344f5e3a5429a8645957301cb9f48",
  "last_updated": "2026-06-20T12:07:06.61738Z",
  "ref": "[CF AI Audit]",
  "version": "1",
  "position": {
    "index": 1
  }
}
```

```json
{
  "action": "block",
  "description": "Block Meta crawler",
  "enabled": true,
  "expression": "(http.host eq \"git.rodneyosodo.com\" and ip.src.asnum eq 32934)",
  "id": "75f0f1802f634739a2f12e360720c75c",
  "last_updated": "2026-06-20T12:10:19.923341Z",
  "ref": "75f0f1802f634739a2f12e360720c75c",
  "version": "1",
  "position": {
    "after": "9e1344f5e3a5429a8645957301cb9f48"
  }
}
```

These are the results so far.
