Your Python Scraper Has a Tell. Curl-Cffi Is How You Hide It


Your Python Scraper Has a Tell. curl-cffi Is How You Hide It. | by Farbod Khorramvatan | May, 2026 | ITNEXT


ITNEXT


Your Python Scraper Has a Tell. curl-cffi Is How You Hide It.

The invisible fingerprint every requests call leaves behind — and here’s how you can fix it.

Farbod Khorramvatan

8 min read · 17 hours ago


You spent three hours crafting the perfect request. Rotating User-Agents. Real browser headers copied straight from DevTools. Residential proxies on top.

And the site still hands you a 403. Or worse: a 200 with a Cloudflare challenge page that your code happily parses as "success."

So you do what most scrapers eventually do. You reach for Selenium. You spin up a headless browser. You watch your memory climb past a gigabyte to fetch a JSON endpoint that should have taken 200 milliseconds.

Here’s the uncomfortable truth I had to swallow: the site wasn’t blocking my headers. It was blocking me before my headers were even read.

The handshake snitches on you!

Before a single byte of your HTTP request goes over the wire, your TLS client sends a message called a ClientHello. It’s basically a polite introduction:

Hi, I support these TLS versions, here are my cipher suites in this specific order, here are the extensions I’d like to negotiate, and here are some elliptic curves I’m cool with.
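One part of that introduction can be inspected from Python itself. The sketch below lists the cipher suites a default `ssl` context enables, in priority order. It is only an approximation of what ends up in the ClientHello: the extensions and curves are not exposed through this API, and `requests` builds its context through urllib3, which may adjust these defaults.

```python
import ssl

# Default client-side context, similar to what most Python HTTPS clients start from.
context = ssl.create_default_context()

# get_ciphers() describes the enabled cipher suites, in order of priority.
names = [cipher["name"] for cipher in context.get_ciphers()]

print(f"OpenSSL build: {ssl.OPENSSL_VERSION}")
print(f"{len(names)} cipher suites enabled; first five:")
for name in names[:5]:
    print("  ", name)
```

The exact list depends on the Python and OpenSSL versions installed, which is part of why the resulting fingerprint identifies the stack rather than the User-Agent string.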

Here’s the catch: every TLS client does this introduction differently.

Chrome sends a particular set of extensions, with particular GREASE values sprinkled in (recent versions also shuffle the extension order on each connection, which is why normalized variants of the hash exist). Firefox sends a different list. Safari, different again. Python’s requests uses OpenSSL through the ssl module, and the list OpenSSL sends has been sitting in anti-bot databases since roughly 2017.

Hash that list with a known algorithm and you get a JA3 fingerprint (or its newer cousin, JA4). The fingerprint for requests is famous. The fingerprint for Chrome 131 is famous too, and different. Cloudflare reads yours, compares it to your User-Agent ("you claim to be Chrome but your handshake screams Python urllib3, lol"), and quietly returns a 403 before your application code does anything at all.

You can rotate proxies until the heat death of the universe. The handshake will still snitch on you.

This is also why curl from your terminal often works when requests fails. Different TLS stack, different fingerprint, less of an obvious tell.

And it’s not just TLS

JA3 is the famous part. But modern anti-bot stacks check at least two more layers, and you should know about them because they’re going to come up:

HTTP/2 fingerprint. Once TLS is up, HTTP/2 has its own little handshake: SETTINGS frames, WINDOW_UPDATE, the pseudo-header order (yes, :method vs :authority vs :scheme vs :path; the order matters). Akamai's bot product was the first to popularize fingerprinting this layer. Different clients, different signatures.

Header order. This one trips people up. HTTP/1.1 doesn’t define a canonical header order, but each browser sends its headers in a consistent order. Chrome’s order is the same across sessions. requests has its own order. If your TLS is perfect but your headers go out in an order Chrome never uses, that alone can flag you.

So three signals: TLS, HTTP/2, header order. All three leak from requests (which doesn’t speak HTTP/2 at all, itself a mismatch for anything claiming to be Chrome).
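To make the "hash that list" step concrete, here is a minimal sketch of the JA3 recipe: five ClientHello fields written as decimal numbers, values joined with dashes, fields joined with commas, GREASE values dropped, then MD5. The two "clients" below are made-up illustrative values, not the real fingerprint of requests, Chrome, or any other client.

```python
import hashlib

# GREASE values (RFC 8701): 0x0A0A, 0x1A1A, ..., 0xFAFA. JA3 ignores them.
GREASE = {(i << 12) | 0x0A00 | (i << 4) | 0x0A for i in range(16)}


def ja3_string(version, ciphers, extensions, curves, point_formats):
    """Build the JA3 input string from decimal ClientHello values."""
    def field(values):
        return "-".join(str(v) for v in values if v not in GREASE)

    return ",".join(
        [str(version), field(ciphers), field(extensions), field(curves), field(point_formats)]
    )


def ja3_hash(version, ciphers, extensions, curves, point_formats):
    """MD5 of the JA3 string, as a 32-character hex digest."""
    text = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(text.encode("ascii")).hexdigest()


# Hypothetical clients: same values, but the extensions arrive in a different order.
client_a = (771, [0x0A0A, 4865, 4866, 4867], [0, 23, 65281, 10, 11], [29, 23, 24], [0])
client_b = (771, [4865, 4866, 4867], [0, 10, 11, 23, 65281], [29, 23, 24], [0])

print(ja3_string(*client_a))
print(ja3_hash(*client_a))
print(ja3_hash(*client_b))  # differs from client_a: order is part of the fingerprint
```

Because order feeds into the hash, two clients that support exactly the same features still hash differently if they list them differently, which is what makes the fingerprint useful for telling TLS stacks apart.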
All three signals (TLS, HTTP/2, header order) get fixed by the curl-cffi library!

Enter curl-cffi: The Library That Finally Made My Scrapers Quiet

curl-cffi is a Python binding for curl-impersonate, a patched fork of curl that mimics, byte for byte, the TLS and HTTP/2 fingerprints of real browsers.

A simple example of using the library:

    from curl_cffi import requests

    response = requests.get("https://example.com", impersonate="chrome")

That’s it. One impersonate keyword argument. Your handshake now looks like a real Chrome browser’s!

That impersonate="chrome" parameter is doing a lot of work. Behind it:

The TLS ClientHello gets rewritten to match Chrome’s extension list, ordering behavior, and GREASE pattern.
The HTTP/2 settings frames, pseudo-header order, and window sizes match Chrome’s.
The default headers go out in Chrome’s order.
The normalized JA3 hash you produce matches a real Chrome’s.

If you’re curious, check it yourself:

    from curl_cffi import requests

    response = requests.get("https://tls.browserleaks.com/json", impersonate="chrome")
    print(response.json())
    # ja3n_hash (JA3 computed over the sorted extension list, so Chrome's
    # shuffled extension order doesn't change it) will match a real Chrome browser

The requests interface from curl_cffi mirrors the requests API on purpose. r.status_code, r.json(), r.headers, r.cookies: they all work the way you expect.

Don’t take my word for it

Run this yourself. It takes 30 seconds.

    import requests
    import curl_cffi

    URL = "https://tls.browserleaks.com/json"

    def fingerprint(label, data):
        print(f"\n{label}")
        print(f" ja3_hash:...
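The listing above is cut off in this copy. As a rough sketch of the kind of side-by-side check it sets up (not the author's code), the following compares the two fingerprint fields named in this article, ja3_hash and ja3n_hash, between plain requests and curl_cffi. The helper names `compare_fingerprints` and `fetch_fingerprints` are hypothetical, and the demo uses placeholder strings so it runs without network access.

```python
def compare_fingerprints(plain, impersonated, keys=("ja3_hash", "ja3n_hash")):
    """Return {key: (plain_value, impersonated_value, same?)} for each key."""
    report = {}
    for key in keys:
        a, b = plain.get(key), impersonated.get(key)
        report[key] = (a, b, a == b)
    return report


def fetch_fingerprints(url="https://tls.browserleaks.com/json"):
    """Fetch the fingerprint JSON with both clients (needs network access)."""
    # Imported lazily so the offline demo below runs without either package.
    import requests
    from curl_cffi import requests as curl_requests

    plain = requests.get(url, timeout=10).json()
    impersonated = curl_requests.get(url, impersonate="chrome", timeout=10).json()
    return plain, impersonated


# Offline demo with placeholder values (NOT real hashes).
plain = {"ja3_hash": "placeholder-python", "ja3n_hash": "placeholder-python-n"}
impersonated = {"ja3_hash": "placeholder-chrome", "ja3n_hash": "placeholder-chrome-n"}
report = compare_fingerprints(plain, impersonated)
for key, (a, b, same) in report.items():
    print(f"{key}: requests={a} curl_cffi={b} same={same}")

# With network access and both packages installed:
# report = compare_fingerprints(*fetch_fingerprints())
```

If the article's claim holds, the live run should show the two clients disagreeing on both fields, with the curl_cffi values lining up with what a real Chrome reports on the same page.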
