How bot detection works (and why your automation gets blocked)


Intuned Blog


Engineering | Jun 2, 2026

Omar Bishtawi · 12 min read

Key takeaways

Bot detection is probabilistic scoring, not a single check. Dozens of weak signals combine into one risk score that maps to an action: pass, challenge, block, or serve degraded data.

It works in layers: IP and network, TLS and HTTP/2 fingerprinting, browser fingerprinting, automation-framework tells, behavioral analysis, and CAPTCHAs.

The strongest signals are contradictions, like a Chrome User-Agent with a Node TLS (JA4) fingerprint, or a CDP-driven browser exposed by Runtime.enable.

Fixing one layer isn't enough. A residential proxy IP paired with a robotic fingerprint or a navigator.webdriver flag still gets caught.

Reliable automation means covering the whole stack and staying in sync with every browser and tooling update.
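Here's what one of those contradiction checks might look like, as a minimal Python sketch. The fingerprint table, its placeholder keys, and the `claimed_family` heuristic are all hypothetical; a real detector would key on actual JA4 strings observed from each client and parse User-Agents far more carefully.

```python
# Hypothetical table mapping a TLS fingerprint to the client family known to
# produce it. Keys are placeholder labels, not real JA4 strings.
KNOWN_TLS_CLIENTS: dict[str, str] = {
    "fp-chrome-placeholder": "chrome",
    "fp-firefox-placeholder": "firefox",
    "fp-node-placeholder": "node",
    "fp-python-requests-placeholder": "python-requests",
}


def claimed_family(user_agent: str) -> str:
    """Very rough guess at which browser a User-Agent claims to be."""
    ua = user_agent.lower()
    # Edge and Opera UAs also contain "chrome/", so rule them out first.
    if "edg/" in ua or "opr/" in ua:
        return "other"
    if "firefox/" in ua:
        return "firefox"
    if "chrome/" in ua:
        return "chrome"
    return "other"


def has_contradiction(user_agent: str, tls_fingerprint: str) -> bool:
    """True when the UA claims one client but the TLS handshake indicates another."""
    observed = KNOWN_TLS_CLIENTS.get(tls_fingerprint)
    if observed is None:
        return False  # unknown fingerprint: no evidence either way
    claimed = claimed_family(user_agent)
    return claimed != "other" and claimed != observed
```

With a Chrome User-Agent, `has_contradiction(chrome_ua, "fp-node-placeholder")` returns `True`, while the matching Chrome fingerprint returns `False`. An unknown fingerprint yields no verdict at all: the check only ever adds evidence to a score, it doesn't decide anything by itself.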

You build a scraper. It runs flawlessly on your laptop. You deploy it to a server, and the same code starts getting empty pages, 403s, or CAPTCHAs it never showed you before. Somewhere between your machine and production, a bot detection system started paying attention.

This post walks through how those systems work, layer by layer, from the network connection up to behavioral machine learning. The framing is deliberately from the builder's side: for each layer, what signal the detector reads, and why a legitimate automation often looks robotic enough to get flagged. If you run scrapers, RPA flows, or crawlers against protected sites, this is a map of what you're up against, and where most automations quietly fail.

What is a bot detection system actually trying to do?

A bot detection system separates two categories of traffic, real users and unwanted automation, without blocking so many real users that the business suffers. It does that with probabilistic scoring. Dozens of individually weak signals combine into a single risk score, and that score decides what happens next.

No single signal is decisive. A datacenter IP isn't proof of a bot. An empty plugins array isn't proof either. What a detector does instead is gather signals across every layer below, weight them, and map the result to an action: serve the page normally, run a silent challenge, show a hard CAPTCHA, block the request, or serve degraded data. Hold onto that shape, many signals to one score to one action, because it explains everything that follows.
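One common way to turn "many weak signals" into "one score" is to add per-signal weights in log-odds space and squash the total with a logistic function. The minimal Python sketch below does exactly that; the signal names, `SIGNAL_WEIGHTS`, `BIAS`, and the action thresholds are all hypothetical, and real vendors learn such parameters from labeled traffic (often with far richer models).

```python
import math

# Hypothetical weights: how much each signal shifts the log-odds of "bot".
SIGNAL_WEIGHTS: dict[str, float] = {
    "datacenter_asn": 1.2,
    "empty_plugins": 0.6,
    "webdriver_flag": 3.0,
    "ua_tls_mismatch": 2.5,
    "uniform_request_timing": 1.0,
}
BIAS = -3.0  # hypothetical baseline log-odds: most traffic is assumed human


def risk_score(signals: set[str]) -> float:
    """Combine weak signals into a probability-like score in (0, 1)."""
    logit = BIAS + sum(SIGNAL_WEIGHTS.get(s, 0.0) for s in signals)
    return 1.0 / (1.0 + math.exp(-logit))


def action_for(score: float) -> str:
    """Map a score to an action using hypothetical thresholds."""
    if score < 0.3:
        return "pass"
    if score < 0.6:
        return "silent_challenge"
    if score < 0.85:
        return "captcha"
    return "block"  # some sites serve degraded data here instead
```

With these made-up numbers, a datacenter ASN alone scores about 0.14 and passes, while a datacenter ASN plus a User-Agent/TLS mismatch lands around 0.67 and gets a CAPTCHA.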

Layer 1: where is the request coming from?

The cheapest signal to check is the network one. Before any JavaScript runs, a detector already knows your IP address, and it can look up that address's reputation, its network owner, and how its traffic pattern compares to normal browsing.

IP reputation comes from services like MaxMind and IPQualityScore, plus the detection vendors' own databases of addresses tied to proxies, VPNs, Tor exit nodes, and datacenter ranges. Most real browsing doesn't originate inside a cloud provider, so a request from an AWS or GCP range raises the score. That heuristic isn't airtight: privacy services like iCloud Private Relay egress through cloud infrastructure, so real users do sometimes appear on datacenter-adjacent IPs.

ASN classification goes a step further. An Autonomous System Number identifies the organization that owns an IP block, which lets a detector sort traffic into residential ISP, mobile carrier, or hosting company. A headless browser on a cloud VM carries a hosting ASN, and that single fact is enough for many sites to route it differently. It's also the whole reason residential and mobile proxies exist.
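A detector's first pass over an address might look something like this minimal Python sketch built on the standard-library `ipaddress` module. The prefix table and reputation flags are hypothetical stand-ins for a commercial IP-to-ASN and reputation feed; the prefixes are RFC 5737 documentation ranges used as placeholders, not real provider blocks.

```python
import ipaddress
from typing import Optional

# Hypothetical prefix -> network-type table standing in for an IP-to-ASN feed.
# RFC 5737 documentation ranges are used as placeholders.
PREFIX_TABLE = [
    (ipaddress.ip_network("192.0.2.0/24"), "hosting"),
    (ipaddress.ip_network("198.51.100.0/24"), "residential"),
    (ipaddress.ip_network("203.0.113.0/24"), "mobile"),
]

# Hypothetical per-address reputation flags (proxy, VPN, Tor exit, ...).
FLAGGED_ADDRESSES = {
    ipaddress.ip_address("198.51.100.7"): "known_proxy",
}


def classify_ip(addr: str) -> tuple[str, Optional[str]]:
    """Return (network type, reputation flag or None) for an address."""
    ip = ipaddress.ip_address(addr)
    network_type = "unknown"
    # First match wins; the placeholder prefixes don't overlap. A real table
    # would need longest-prefix matching.
    for network, label in PREFIX_TABLE:
        if ip in network:  # False (not an error) for IPv4/IPv6 mismatches
            network_type = label
            break
    return network_type, FLAGGED_ADDRESSES.get(ip)
```

`classify_ip("192.0.2.10")` returns `("hosting", None)`. As with everything else here, a `"hosting"` label nudges the score rather than blocking outright, since privacy relays can put real users on cloud ranges too.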

Two more network signals matter. Request rate and distribution: humans browse at irregular, organic intervals, while naive automation issues requests on uniform timers or reuses one User-Agent across thousands of addresses. And IP consistency within a session: a session that begins in Berlin and jumps to Singapore five minutes later isn't one person, though VPN switching does generate false positives here.
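Both of those network signals reduce to simple arithmetic, as this minimal Python sketch shows. The coefficient of variation (standard deviation over mean) of the gaps between requests flags metronome-like timing, and a great-circle distance check flags "impossible travel" between two IP geolocations in one session. It assumes you already have per-request timestamps and a latitude/longitude for each IP; the 1,000 km/h ceiling is a hypothetical threshold, roughly airliner speed.

```python
import math
import statistics


def interval_cv(timestamps: list[float]) -> float:
    """Coefficient of variation of the gaps between consecutive requests.

    Near 0 means metronome-like timing; organic browsing is usually burstier.
    """
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if len(gaps) < 2:
        raise ValueError("need at least three timestamps")
    mean_gap = statistics.mean(gaps)
    if mean_gap == 0:
        return 0.0
    return statistics.pstdev(gaps) / mean_gap


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0  # mean Earth radius
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(min(1.0, math.sqrt(a)))


def impossible_travel(
    loc_a: tuple[float, float],
    loc_b: tuple[float, float],
    seconds_apart: float,
    max_kmh: float = 1000.0,  # hypothetical threshold
) -> bool:
    """True if moving between two geolocated IPs would need a speed above max_kmh."""
    hours = max(seconds_apart, 1.0) / 3600.0
    return haversine_km(*loc_a, *loc_b) / hours > max_kmh
```

A scraper firing exactly every 2 seconds gets a CV of 0, and Berlin to Singapore in five minutes implies a speed far above any plausible threshold. Neither result is proof on its own; each is one more weighted input to the score.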

The catch for builders is that the network layer is the easiest to address, which is exactly why detection doesn't stop at it. Swap in a residential proxy and you've cleared the first hurdle, not the race.

Layer 2: TLS and HTTP/2 fingerprinting

Before the first byte of your HTTP request is even read, your client has revealed a lot in how it negotiates the connection. TLS and HTTP/2 fingerprinting turn the mechanics of that negotiation into an identifier, and it's where HTTP-client-based automation usually gives itself away.
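A handshake fingerprint is built from fields the client sends in its TLS ClientHello. This minimal Python sketch computes a JA3 fingerprint, an older and simpler scheme than JA4, chosen here only because its recipe fits in a few lines: the TLS version, cipher suites, extensions, elliptic curves, and point formats, each kept in the order the client sent them, are joined into a string and MD5-hashed, with GREASE values (RFC 8701) dropped. It assumes the ClientHello has already been parsed; the example values are illustrative, not captured from a real browser.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class ClientHello:
    """Fields already parsed out of a TLS ClientHello (parsing not shown)."""
    version: int
    ciphers: list[int]
    extensions: list[int]
    curves: list[int]
    point_formats: list[int]


def is_grease(value: int) -> bool:
    """GREASE values look like 0x0a0a, 0x1a1a, ..., 0xfafa."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def ja3(hello: ClientHello) -> tuple[str, str]:
    """Return (JA3 string, MD5 hex digest) for a parsed ClientHello."""

    def join(values: list[int]) -> str:
        # Order is preserved: it is part of what identifies the TLS library.
        return "-".join(str(v) for v in values if not is_grease(v))

    text = ",".join([
        str(hello.version),
        join(hello.ciphers),
        join(hello.extensions),
        join(hello.curves),
        join(hello.point_formats),
    ])
    return text, hashlib.md5(text.encode()).hexdigest()
```

For example, `ClientHello(771, [0x2A2A, 4865, 4866], [0x0A0A, 0, 23, 65281], [29, 23], [0])` produces the JA3 string `771,4865-4866,0-23-65281,29-23,0`. Swapping the order of the two cipher suites changes the hash, which is why the TLS library a client uses, not just the features it advertises, shows up in the fingerprint.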

When a browser opens a TLS connection it sends a ClientHello: the cipher suites it supports, the TLS extensions it advertises, the order they appear in, its elliptic-curve preferences, and more. Chrome on macOS produces a near-identical ClientHello across millions of installs. A Python requests call or a Node https request produces...
