One IP, Six Crawler Identities, One Second (Detection via Nginx Logs)



Operational Security

One IP, Six Crawler Identities, One Second: Detection Built Against Real Production Logs

How three production patches in 24 hours closed two leaks that synthetic testing missed

Published: May 17, 2026
Reading Time: 16 min

Rotational bot-identity spoofing is an attacker pattern where one source IP issues requests under multiple named-bot user agents within a short time window. On 16 May 2026 the axilog.io access log captured the pattern cleanly: source IP 5.255.104.83 issued thirteen requests in one second under six distinct named-bot identities (ClaudeBot, GPTBot, PerplexityBot, YandexBot, Baiduspider, and bingbot). The probes targeted /api/env, /actuator/env, /api/config, /config.json, /secrets.json, and /appsettings.json, plus the trailing-slash canonical forms of the first three.

No published crawler operates under multiple identities from a single egress IP. The combination is unambiguous. This article describes the pattern, the four-part discriminator used to detect it, and the three-patch sequence: the v1.10.0 detector shipped, revealed two false-positive failure modes against the production log within the same day, and had both closed in v1.10.1 and v1.10.2.

The empirical claims are traceable: the nginx access log excerpt is reproducible from any host running the attack, the analyser is open source under AGPL-3.0-or-later, and every fix described below corresponds to a tagged release with its own test suite.

By William Murray, Founder of SpeyTech (deterministic computing for safety-critical systems). Inverness, Scottish Highlands.

Definition: Rotational bot-identity spoofing is an attacker pattern where a single source IP issues requests under multiple distinct named-bot user agents within a short time window.
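As a rough illustration of the definition, the check can be sketched in a few lines of Python. This is not the analyser's implementation. The names `NAMED_BOTS`, `PROBE_PATHS`, `WINDOW_SECONDS`, `MIN_IDENTITIES`, and `find_rotation` are hypothetical, the 60-second window and two-identity threshold are assumptions chosen for illustration, and the path set simply reuses the probe paths named above.

```python
from collections import defaultdict

# Named-bot tokens from the incident described above; a real list would be longer.
NAMED_BOTS = ("ClaudeBot", "GPTBot", "PerplexityBot",
              "YandexBot", "Baiduspider", "bingbot")

# Hypothetical stand-in for a probe-path set: the paths named in the incident.
PROBE_PATHS = {"/api/env", "/actuator/env", "/api/config",
               "/config.json", "/secrets.json", "/appsettings.json"}

WINDOW_SECONDS = 60   # assumption: what counts as a "short time window"
MIN_IDENTITIES = 2    # assumption: more than one identity is already suspicious


def bot_identity(user_agent):
    """Return the named bot a user agent claims to be, or None."""
    lowered = user_agent.lower()
    for bot in NAMED_BOTS:
        if bot.lower() in lowered:
            return bot
    return None


def is_probe_path(path):
    """Treat a path and its trailing-slash form as the same probe."""
    return path.rstrip("/") in PROBE_PATHS


def find_rotation(events):
    """events: iterable of (ip, epoch_seconds, path, user_agent).

    Returns {ip: sorted identities} for IPs that used at least
    MIN_IDENTITIES named-bot identities on probe paths within one window.
    """
    hits_by_ip = defaultdict(list)
    for ip, ts, path, user_agent in events:
        bot = bot_identity(user_agent)
        if bot is not None and is_probe_path(path):
            hits_by_ip[ip].append((ts, bot))

    flagged = {}
    for ip, hits in hits_by_ip.items():
        hits.sort()
        start = 0
        for end in range(len(hits)):
            # Shrink the window until it spans less than WINDOW_SECONDS.
            while hits[end][0] - hits[start][0] >= WINDOW_SECONDS:
                start += 1
            identities = {bot for _, bot in hits[start:end + 1]}
            if len(identities) >= MIN_IDENTITIES:
                flagged.setdefault(ip, set()).update(identities)
    return {ip: sorted(bots) for ip, bots in flagged.items()}
```

Requests on paths outside the probe set are ignored before identities are counted, so an ordinary crawler fetch from the same IP does not by itself contribute to a rotation finding.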

What the production log captured

The pattern was caught in raw form on the axilog.io seo log on 16 May 2026. The lines below are real production capture, abbreviated only in the user-agent column where the recognisable bot identifier is sufficient. Timestamps, IP, paths, and status codes are intact:

5.255.104.83 [16/May/2026:15:54:14] "GET /" 200 "ClaudeBot/1.0..."
5.255.104.83 [16/May/2026:15:54:14] "GET /robots.txt" 200 "YandexBot/3.0..."
5.255.104.83 [16/May/2026:15:54:14] "GET /config.json" 404 "YandexBot/3.0..."
5.255.104.83 [16/May/2026:15:54:14] "GET /actuator/env" 301 "PerplexityBot/1.0..."
5.255.104.83 [16/May/2026:15:54:14] "GET /api/env" 301 "ClaudeBot/1.0..."
5.255.104.83 [16/May/2026:15:54:14] "GET /appsettings.json" 404 "YandexBot/3.0..."
5.255.104.83 [16/May/2026:15:54:14] "GET /api/config" 301 "Baiduspider/2.0..."
5.255.104.83 [16/May/2026:15:54:14] "GET /sitemap.xml" 301 "GPTBot/1.3..."
5.255.104.83 [16/May/2026:15:54:14] "GET /secrets.json" 404 "bingbot/2.0..."
5.255.104.83 [16/May/2026:15:54:14] "GET /api/env/" 404 "ClaudeBot/1.0..."
5.255.104.83 [16/May/2026:15:54:14] "GET /actuator/env/" 404 "PerplexityBot/1.0..."
5.255.104.83 [16/May/2026:15:54:14] "GET /api/config/" 404 "Baiduspider/2.0..."
5.255.104.83 [16/May/2026:15:54:14] "GET /sitemap-index.xml" 200 "GPTBot/1.3..."

Thirteen requests, one source IP, one second, six distinct named-bot identities. Four 301 redirects: three from nginx normalising /api/env, /actuator/env, and /api/config onto trailing-slash canonical forms that then 404, plus one content-route redirect from /sitemap.xml to /sitemap-index.xml that the attacker followed to a 200 on row 13. Six 404 probe failures. Three 2xx exploratory fetches.
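The tallies above can be checked mechanically. The sketch below assumes the abbreviated line format shown in the capture, which is not nginx's default combined format. `LINE_RE`, `parse`, and the variable names are illustrative and are not taken from the analyser.

```python
import re
from collections import Counter

# Matches the abbreviated lines above: IP [timestamp] "METHOD path" status "UA".
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)" (?P<status>\d{3}) "(?P<ua>[^"]*)"$'
)

CAPTURE = """\
5.255.104.83 [16/May/2026:15:54:14] "GET /" 200 "ClaudeBot/1.0..."
5.255.104.83 [16/May/2026:15:54:14] "GET /robots.txt" 200 "YandexBot/3.0..."
5.255.104.83 [16/May/2026:15:54:14] "GET /config.json" 404 "YandexBot/3.0..."
5.255.104.83 [16/May/2026:15:54:14] "GET /actuator/env" 301 "PerplexityBot/1.0..."
5.255.104.83 [16/May/2026:15:54:14] "GET /api/env" 301 "ClaudeBot/1.0..."
5.255.104.83 [16/May/2026:15:54:14] "GET /appsettings.json" 404 "YandexBot/3.0..."
5.255.104.83 [16/May/2026:15:54:14] "GET /api/config" 301 "Baiduspider/2.0..."
5.255.104.83 [16/May/2026:15:54:14] "GET /sitemap.xml" 301 "GPTBot/1.3..."
5.255.104.83 [16/May/2026:15:54:14] "GET /secrets.json" 404 "bingbot/2.0..."
5.255.104.83 [16/May/2026:15:54:14] "GET /api/env/" 404 "ClaudeBot/1.0..."
5.255.104.83 [16/May/2026:15:54:14] "GET /actuator/env/" 404 "PerplexityBot/1.0..."
5.255.104.83 [16/May/2026:15:54:14] "GET /api/config/" 404 "Baiduspider/2.0..."
5.255.104.83 [16/May/2026:15:54:14] "GET /sitemap-index.xml" 200 "GPTBot/1.3..."
"""


def parse(text):
    """Parse abbreviated log lines into dicts, skipping lines that do not match."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            rows.append(match.groupdict())
    return rows


rows = parse(CAPTURE)

# Requests per status code.
status_counts = Counter(row["status"] for row in rows)

# The bot token is the part of the user agent before the first slash.
identities = {row["ua"].split("/", 1)[0] for row in rows}

# Distinct (IP, second) pairs: a single pair means one source in one second.
sources = {(row["ip"], row["ts"]) for row in rows}

print(len(rows), dict(status_counts), sorted(identities), len(sources))
```

Splitting the user agent on the first slash works only because the capture abbreviates each agent to its bot token; full user-agent strings would need substring matching against a list of known bot names instead.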
The 2xx requests under multiple identities serve a different purpose to the 404 probes: they fingerprint what an unknown server returns to each named bot before the targeted probes begin.

A reader running the analyser against this log will see the rotation section report five identities, not six. The path-class gate excludes the two GPTBot rows, because /sitemap.xml and /sitemap-index.xml are not in ROTATION_TRIGGER_PATHS and are not in the security-probe regex set. This is the gate doing its job: the same source IP issuing legitimate sitemap fetches under a verifiable public crawler stays attributed to GPTBot in the report, while its rotation-probe traffic under five other identities gets reclassified as SuspectedBotIdentityRotation. The discriminator is the combination of conditions, not any single one of them.

The attacker IP 5.255.104.83 is already publicly known through its probe campaigns against thousands of hosts. The paths probed are commodity exploit fingerprints. Nothing about either is sensitive. What matters is the combination visible in the log: one IP, multiple trusted-crawler identities, security-sensitive paths, sub-minute window. That combination is the signal.

Why naive detection approaches do not catch this

Three obvious detection approaches each fail in their own way. An IP blocklist is reactive and lags the threat: the same...

