Show HN: Crawlora-deadweb – tell if a domain is dead or just blocking your bot

Crawlora-org/crawlora-deadweb: Is a domain genuinely dead, or just blocking your bot? A passive, local, MIT-licensed Go CLI and library that classifies domain reachability (alive/redirect/blocked/dead). The open methodology behind the Crawlora Dead-Web Index.

crawlora-deadweb

Is a domain genuinely dead — or just blocking your bot? Tell them apart from one passive probe.

crawlora-deadweb is a small, dependency-free CLI (and Go library) that probes a domain and classifies it as alive / redirect / blocked / dead, with the reason. It distinguishes a domain that's gone (no DNS, nothing listening) from one that's alive but refusing automated clients (403 / 429 / anti-bot). Most "dead link" checkers conflate the two, and that's exactly the error behind the myth that ~27% of the web is dead. It isn't.

It is a classifier, not an unblocker. It does a DNS lookup, a TCP connect, and one honest GET /, reads the response, and labels it. It never logs in, submits a form, solves a challenge, or tries to defeat anything.
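The three probe steps can be sketched in Go roughly as follows. This is a minimal illustration, not the tool's actual code: the probeResult type, the probe function, the 10-second timeout, and the choice of HTTPS on port 443 without following redirects are all assumptions made for the example.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"net/http"
	"time"
)

// probeResult records how far a single passive probe got.
// The type and its field names are illustrative, not the tool's own.
type probeResult struct {
	Stage  string // "dns", "tcp", or "http" on failure; "done" on success
	Status int    // HTTP status code when Stage == "done"
	Err    error  // the error that stopped the probe, if any
}

// probe runs the three steps in order and stops at the first failure.
func probe(domain string) probeResult {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	// Step 1: DNS lookup.
	if _, err := net.DefaultResolver.LookupHost(ctx, domain); err != nil {
		return probeResult{Stage: "dns", Err: err}
	}

	// Step 2: TCP connect (port 443 is an assumption of this sketch).
	var d net.Dialer
	conn, err := d.DialContext(ctx, "tcp", net.JoinHostPort(domain, "443"))
	if err != nil {
		return probeResult{Stage: "tcp", Err: err}
	}
	conn.Close()

	// Step 3: a single GET / with no login, form, or challenge handling.
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, "https://"+domain+"/", nil)
	if err != nil {
		return probeResult{Stage: "http", Err: err}
	}
	client := &http.Client{
		// Report the first response as-is instead of following redirects.
		CheckRedirect: func(*http.Request, []*http.Request) error {
			return http.ErrUseLastResponse
		},
	}
	resp, err := client.Do(req)
	if err != nil {
		return probeResult{Stage: "http", Err: err}
	}
	defer resp.Body.Close()
	return probeResult{Stage: "done", Status: resp.StatusCode}
}

func main() {
	r := probe("example.com")
	fmt.Printf("stage=%s status=%d err=%v\n", r.Stage, r.Status, r.Err)
}
```

Recording the stage at which the probe stopped is what lets a later step tell "nothing there" (DNS or TCP failure) apart from "something answered" (an HTTP status).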

Classification runs locally and in the open, from the public response. For the measured browser-fingerprint arm (re-probing a blocked domain with a real Chrome TLS/JA3 client across the proxied fleet to see which "blocked" sites are actually reachable), add --browser, which calls Crawlora's hosted engine.

This powers, and is the open companion to, the Dead-Web Index — a reachability census of the top 10 million domains that found ~14% genuinely dead, not the usual 27.6% (most "dead" is anti-bot blocking or a served error).

What the labels mean

alive — a usable HTTP response (2xx, or a 4xx/5xx the server answered — a response isn't death).

redirect — ended on an unresolved redirect.

blocked — the host is up but won't serve us: anti-bot / auth / rate-limit, or it accepts a TCP connection but won't complete HTTP (tarpit / strict TLS).

dead — no DNS resolution, a refused/reset connection, or nothing listening. Genuinely gone.
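The label rules above can be read as a small decision function. The sketch below is one illustrative interpretation of those definitions, not the tool's implementation: the observation struct, its field names, and the choice of 401 / 403 / 429 as the "blocked" statuses are assumptions.

```go
package main

import "fmt"

// observation is a hypothetical summary of what one probe saw.
type observation struct {
	DNSResolved   bool // the name resolved
	TCPConnected  bool // a TCP connection was accepted
	HTTPCompleted bool // a full HTTP response came back
	Status        int  // final HTTP status, valid when HTTPCompleted
}

// classify maps an observation to one of the four labels.
func classify(o observation) string {
	switch {
	case !o.DNSResolved || !o.TCPConnected:
		// No DNS, refused/reset connection, or nothing listening.
		return "dead"
	case !o.HTTPCompleted:
		// Accepts TCP but never completes HTTP (tarpit / strict TLS).
		return "blocked"
	case o.Status == 401 || o.Status == 403 || o.Status == 429:
		// Auth, anti-bot, or rate-limit refusal (status set is assumed).
		return "blocked"
	case o.Status >= 300 && o.Status < 400:
		// The probe ended on a redirect it could not resolve.
		return "redirect"
	default:
		// 2xx, or any other 4xx/5xx the server actually answered.
		return "alive"
	}
}

func main() {
	fmt.Println(classify(observation{}))
	fmt.Println(classify(observation{DNSResolved: true, TCPConnected: true, HTTPCompleted: true, Status: 403}))
	fmt.Println(classify(observation{DNSResolved: true, TCPConnected: true, HTTPCompleted: true, Status: 500}))
}
```

The ordering matters: the refusal statuses are checked before the general "a response isn't death" rule, so a 403 lands in blocked while a 404 or 500 still counts as alive.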

Install

# from source (Go 1.23+)
go install github.com/Crawlora-org/crawlora-deadweb@latest

# or clone + build
git clone https://github.com/Crawlora-org/crawlora-deadweb
cd crawlora-deadweb && go build -o crawlora-deadweb .

Prebuilt Linux / macOS / Windows binaries are published via GitHub Releases.

Usage

crawlora-deadweb [flags] [domain...]

$ crawlora-deadweb grooveshark.com reuters.com
grooveshark.com
  outcome  dead
  reason   dns_failed — genuinely unreachable

reuters.com
  outcome  blocked
  reason   forbidden (403) — alive but refusing this client (run with --browser for the measured browser-fingerprint arm)

--json emits NDJSON (one compact object per line) — pipe straight into jq -c or a data pipeline.

Batch / pipelines. Pass many domains as args, or pipe a list on stdin (one per line; blank lines and #-comments ignored). Domains are probed in parallel (--concurrency, default 8):

cat domains.txt | crawlora-deadweb --json --concurrency 50 > results.ndjson
printf 'grooveshark.com\nexample.com\n' | crawlora-deadweb
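For readers wondering how this kind of batch mode is typically built, here is a rough Go sketch of the two pieces described above: filtering a domain list (skipping blank lines and #-comments) and probing with a bounded number of workers. The function names parseDomains and probeAll, and the stub probe, are hypothetical; treating only whole-line # comments as comments is also an assumption.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
	"sync"
)

// parseDomains returns one domain per non-empty line, skipping blank
// lines and lines that start with '#'.
func parseDomains(input string) []string {
	var domains []string
	sc := bufio.NewScanner(strings.NewReader(input))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		domains = append(domains, line)
	}
	return domains
}

// probeAll calls probe for every domain with at most `concurrency`
// calls in flight, and returns the results in input order.
func probeAll(domains []string, concurrency int, probe func(string) string) []string {
	if concurrency < 1 {
		concurrency = 1
	}
	results := make([]string, len(domains))
	sem := make(chan struct{}, concurrency) // counting semaphore
	var wg sync.WaitGroup
	for i, d := range domains {
		wg.Add(1)
		sem <- struct{}{} // blocks while `concurrency` probes are running
		go func(i int, d string) {
			defer wg.Done()
			defer func() { <-sem }()
			results[i] = probe(d)
		}(i, d)
	}
	wg.Wait()
	return results
}

func main() {
	domains := parseDomains("grooveshark.com\n\n# a comment\nexample.com\n")
	// Stub standing in for a real network probe.
	results := probeAll(domains, 8, func(d string) string { return d + ": probed" })
	for _, r := range results {
		fmt.Println(r)
	}
}
```

Writing each result into its own slice index keeps the output order stable regardless of which probe finishes first.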

Each JSON record matches the open dataset schema: domain, tld, outcome, reason, first_status, final_status, final_url, scheme, hops, parked.
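A consumer of the NDJSON output might decode it along these lines. The field names come from the list above, but the Go types chosen for each field (strings, ints, and a bool for parked) are guesses, and the record struct and decodeNDJSON helper are hypothetical names for this sketch.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"strings"
)

// record mirrors the documented field names; the field types are assumed.
type record struct {
	Domain      string `json:"domain"`
	TLD         string `json:"tld"`
	Outcome     string `json:"outcome"`
	Reason      string `json:"reason"`
	FirstStatus int    `json:"first_status"`
	FinalStatus int    `json:"final_status"`
	FinalURL    string `json:"final_url"`
	Scheme      string `json:"scheme"`
	Hops        int    `json:"hops"`
	Parked      bool   `json:"parked"`
}

// decodeNDJSON reads one JSON object per line until the input ends.
func decodeNDJSON(input string) ([]record, error) {
	var recs []record
	dec := json.NewDecoder(strings.NewReader(input))
	for {
		var r record
		err := dec.Decode(&r)
		if errors.Is(err, io.EOF) {
			return recs, nil
		}
		if err != nil {
			return nil, err
		}
		recs = append(recs, r)
	}
}

func main() {
	// Hand-written sample lines for illustration, not real tool output.
	sample := `{"domain":"grooveshark.com","tld":"com","outcome":"dead","reason":"dns_failed"}
{"domain":"reuters.com","tld":"com","outcome":"blocked","final_status":403}
`
	recs, err := decodeNDJSON(sample)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	counts := map[string]int{}
	for _, r := range recs {
		counts[r.Outcome]++
	}
	fmt.Println(counts)
}
```

In a real pipeline the reader would be os.Stdin or a results file rather than an in-memory string.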

The browser arm (optional, hosted)

The local probe is a polite HTTP request from your IP, so "blocked" is an upper bound — a vendor refusing a datacenter client ≠ the site being unreachable. For the measured tier — what actually gets through with a real browser fingerprint and the proxied fleet — add --browser:

export CRAWLORA_API_KEY=... # get one at...
