Crawlora-org/crawlora-deadweb: Is a domain genuinely dead, or just blocking your bot? A passive, local, MIT Go CLI + library that classifies domain reachability (alive/redirect/blocked/dead). The open methodology behind the Crawlora Dead-Web Index.
crawlora-deadweb
Is a domain genuinely dead — or just blocking your bot? Tell them apart from one passive probe.
crawlora-deadweb is a small, dependency-free CLI (and Go library) that probes a domain and classifies it alive / redirect / blocked / dead, with the reason. It tells a domain that's gone (no DNS, nothing listening) from one that's alive but refusing automated clients (403 / 429 / anti-bot). Most "dead link" checkers conflate the two, and that's exactly the error behind the myth that ~27% of the web is dead. It isn't.
It is a classifier, not an unblocker. It does a DNS lookup, a TCP connect, and one honest GET /, reads the response, and labels it. It never logs in, submits a form, solves a challenge, or tries to defeat anything.
Classification runs locally and in the open, from the public response. For the measured browser-fingerprint arm (re-probing a blocked domain with a real Chrome TLS/JA3 client across the proxied fleet to see which "blocked" sites are actually reachable), add --browser, which calls Crawlora's hosted engine.
This powers, and is the open companion to, the Dead-Web Index: a reachability census of the top 10 million domains that found ~14% genuinely dead, not the usual 27.6% (most "dead" is anti-bot blocking or a served error).
What the labels mean
alive: a usable HTTP response (2xx, or a 4xx/5xx the server answered; a response isn't death).
redirect: ended on an unresolved redirect.
blocked: the host is up but won't serve us (anti-bot / auth / rate-limit), or it accepts a TCP connection but won't complete HTTP (tarpit / strict TLS).
dead: no DNS resolution, a refused/reset connection, or nothing listening. Genuinely gone.
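The label rules above amount to a small decision function. The following Go sketch is one possible reading of them, not the project's classifier: the `observation` type and the `classify` function are hypothetical, treating exactly 401, 403, and 429 as "blocked" is an assumption, and apart from `dns_failed` and `forbidden (403)` (both shown in the sample output) the reason strings are invented for illustration.

```go
package main

import "fmt"

// observation is what a single probe saw. Hypothetical type for illustration.
type observation struct {
	DNSOK              bool // the name resolved
	TCPOK              bool // a TCP connection was accepted
	GotResponse        bool // an HTTP response was read
	UnresolvedRedirect bool // the probe ended on a redirect it could not resolve
	Status             int  // HTTP status code, if any
}

// classify maps an observation to an outcome label and a reason.
func classify(o observation) (outcome, reason string) {
	switch {
	case !o.DNSOK:
		return "dead", "dns_failed"
	case !o.TCPOK:
		return "dead", "connect_failed" // hypothetical reason string
	case !o.GotResponse:
		// The host accepts TCP but never completes HTTP (tarpit / strict TLS).
		return "blocked", "no_http_response" // hypothetical reason string
	case o.UnresolvedRedirect:
		return "redirect", "unresolved_redirect" // hypothetical reason string
	case o.Status == 403:
		return "blocked", "forbidden (403)"
	case o.Status == 401 || o.Status == 429:
		// Assumption: auth and rate-limit responses count as refusals.
		return "blocked", fmt.Sprintf("refused (%d)", o.Status)
	default:
		// Any other answered request counts as alive, including other 4xx/5xx.
		return "alive", fmt.Sprintf("http_%d", o.Status)
	}
}

func main() {
	samples := []observation{
		{}, // nothing resolved
		{DNSOK: true, TCPOK: true, GotResponse: true, Status: 403},
		{DNSOK: true, TCPOK: true, GotResponse: true, Status: 500},
	}
	for _, o := range samples {
		outcome, reason := classify(o)
		fmt.Println(outcome, reason)
	}
}
```

The order of the cases matters: failures are checked from the lowest layer upward, so a domain is only called "blocked" once something has demonstrably answered.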
Install
# from source (Go 1.23+)
go install github.com/Crawlora-org/crawlora-deadweb@latest

# or clone + build
git clone https://github.com/Crawlora-org/crawlora-deadweb
cd crawlora-deadweb && go build -o crawlora-deadweb .
Prebuilt Linux / macOS / Windows binaries are published via GitHub Releases.
Usage
crawlora-deadweb [flags] [domain...]
$ crawlora-deadweb grooveshark.com reuters.com
grooveshark.com
outcome dead
reason dns_failed — genuinely unreachable

reuters.com
outcome blocked
reason forbidden (403) — alive but refusing this client
(run with --browser for the measured browser-fingerprint arm)
--json emits NDJSON (one compact object per line) — pipe straight into jq -c or a data pipeline.
Batch / pipelines. Pass many domains as args, or pipe a list on stdin (one per line; blank lines and #-comments ignored). Domains are probed in parallel (--concurrency, default 8):
cat domains.txt | crawlora-deadweb --json --concurrency 50 > results.ndjson
printf 'grooveshark.com\nexample.com\n' | crawlora-deadweb
Each JSON record matches the open dataset schema:
domain, tld, outcome, reason, first_status, final_status, final_url, scheme, hops, parked.
The browser arm (optional, hosted)
The local probe is a polite HTTP request from your IP, so "blocked" is an upper bound: a vendor refusing a datacenter client ≠ the site being unreachable. For the measured tier (what actually gets through with a real browser fingerprint and the proxied fleet), add --browser:
export CRAWLORA_API_KEY=... # get one at...