Show HN: I built a production-ready web crawler in Rust with TTL and anti-dupe

qmay-rust1 pts0 comments

GitHub - AICrox2025/SuperCrawl: Open-source web crawler in Rust · GitHub


SuperCrawl 🚀

A high-performance, production-ready web crawler written in Rust. Designed for maximum index purity and efficiency.

Key Features

Zero-spam indexing: Intelligent filtering of and tags.

Anti-Dupe System: Utilizes SHA-256 URL hashing for automatic content overwriting in OpenSearch.

TTL (Time-To-Live): Automatic re-indexing of pages after a set duration (default is 7 days).

Pause/Resume: Ability to pause and resume data collection without losing queue state.

Asynchronous Performance: Built on tokio and axum for lightning-fast request processing.
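The overwrite-by-ID, TTL, and pause/resume behaviour listed above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not SuperCrawl's actual code: `Index`, `IndexedPage`, `CrawlQueue`, and their methods are hypothetical names, the `HashMap` stands in for the OpenSearch index, and the document ID is assumed to be the hex SHA-256 of the URL computed elsewhere (for example with a crate such as `sha2`).

```rust
use std::collections::{HashMap, VecDeque};
use std::time::{Duration, SystemTime};

/// Default TTL stated in the README: 7 days.
const DEFAULT_TTL: Duration = Duration::from_secs(7 * 24 * 60 * 60);

/// Hypothetical record of one indexed page.
struct IndexedPage {
    url: String,
    indexed_at: SystemTime,
}

/// Hypothetical in-memory stand-in for the search index, keyed by document ID.
/// Assumption: the ID is derived from the URL (e.g. its SHA-256 hex digest),
/// so re-indexing the same URL always targets the same key.
struct Index {
    pages: HashMap<String, IndexedPage>,
    ttl: Duration,
}

impl Index {
    fn new(ttl: Duration) -> Self {
        Index { pages: HashMap::new(), ttl }
    }

    /// Insert or overwrite a page. Returns true if an existing entry was replaced,
    /// which is what prevents duplicates from accumulating.
    fn upsert(&mut self, doc_id: &str, url: &str, now: SystemTime) -> bool {
        let page = IndexedPage { url: url.to_string(), indexed_at: now };
        self.pages.insert(doc_id.to_string(), page).is_some()
    }

    /// A page needs crawling if it was never indexed or its entry is at least
    /// `ttl` old. If the clock moved backwards, the entry is treated as fresh.
    fn needs_crawl(&self, doc_id: &str, now: SystemTime) -> bool {
        match self.pages.get(doc_id) {
            None => true,
            Some(page) => now
                .duration_since(page.indexed_at)
                .map(|age| age >= self.ttl)
                .unwrap_or(false),
        }
    }
}

/// Hypothetical crawl queue: pausing stops hand-out of URLs but keeps them queued.
struct CrawlQueue {
    pending: VecDeque<String>,
    paused: bool,
}

impl CrawlQueue {
    /// Returns the next URL to fetch, or None while paused or when empty.
    fn next_url(&mut self) -> Option<String> {
        if self.paused {
            None
        } else {
            self.pending.pop_front()
        }
    }
}
```

In this sketch a worker would call `needs_crawl` before fetching and `upsert` after a successful fetch. Passing `now` explicitly instead of reading the clock inside the methods keeps the TTL logic easy to test.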

Installation

Clone the repository:

git clone https://github.com/AICrox2025/SuperCrawl.git

Configuration: set your OpenSearch credentials in main.rs or Cargo.toml.

Run the project:

cargo run --release

Built with Rust for speed and safety.

Languages: HTML 54.5%, Rust 45.5%
