Show HN: IResearch – C++ search that beat Lucene and Tantivy on their benchmark

gnusi · 1 pts · 0 comments

serenedb/libs/iresearch at main · serenedb/serenedb · GitHub



IResearch is a high-performance C++ search engine library. It's up to 5x faster than Lucene and Tantivy, runs without a JVM and has powered production search since 2018.

Quickstart

```sh
git clone --recursive https://github.com/serenedb/serenedb
cd serenedb
cmake --preset lldb
cmake --build build --target iresearch-example-basic
./build/iresearch/examples/iresearch-example-basic
```

To depend on iresearch from your own CMake project, vendor SereneDB as a submodule and link against the iresearch-static target:

```cmake
add_subdirectory(third_party/serenedb)
target_link_libraries(my_app PRIVATE iresearch-static)
```

Features

Full-text search. Phrase, boolean, prefix, wildcard, fuzzy (Levenshtein), n-gram, regex, range.

Pluggable scoring. BM25, TFIDF, LM-Dirichlet, DFI built-in; custom scorers supported.

Vectorized scoring. Block-at-a-time SIMD pipeline over posting lists; up to 5x throughput vs scalar loops.

Lazy evaluation. Non-lead iterators in conjunctions, exclusions and two-phase queries defer work.

Columnar storage. Modern column store with adaptive compression.

Vector search. Approximate nearest-neighbor (HNSW).

Geospatial. S2-based intersects/contains.

NLP pipeline. Tokenizers, stemmers, stopwords, synonyms (Solr/WordNet), language-aware analysis, pluggable custom analyzers.
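To make the "pluggable scoring" and "vectorized scoring" items above more concrete, here is a minimal, self-contained sketch of block-at-a-time BM25 scoring. It does not use the IResearch API: `Bm25Params`, `Bm25Idf` and `ScoreBlock` are hypothetical names, the defaults `k1 = 1.2` and `b = 0.75` are common textbook values assumed for illustration, and the loop relies on compiler auto-vectorization rather than hand-written SIMD.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>

// Hypothetical parameter bundle for illustration; not an IResearch type.
struct Bm25Params {
  float k1 = 1.2f;  // term-frequency saturation (assumed default)
  float b = 0.75f;  // document-length normalization strength (assumed default)
};

// IDF variant with "1 +" inside the logarithm, which keeps the value
// non-negative. Assumes doc_freq <= doc_count.
inline float Bm25Idf(std::uint64_t doc_count, std::uint64_t doc_freq) {
  return std::log(1.0f + (static_cast<float>(doc_count - doc_freq) + 0.5f) /
                             (static_cast<float>(doc_freq) + 0.5f));
}

// Scores a whole block of postings in one call. The loop body is branch-free
// and reads contiguous arrays, so a compiler is free to vectorize it.
inline void ScoreBlock(const std::uint32_t* freqs,
                       const std::uint32_t* doc_lens, std::size_t n, float idf,
                       float avg_doc_len, Bm25Params p, float* out) {
  const float inv_avg = 1.0f / avg_doc_len;
  for (std::size_t i = 0; i < n; ++i) {
    const float tf = static_cast<float>(freqs[i]);
    const float norm =
        p.k1 * (1.0f - p.b + p.b * static_cast<float>(doc_lens[i]) * inv_avg);
    out[i] = idf * tf * (p.k1 + 1.0f) / (tf + norm);
  }
}
```

Splitting term frequencies and document lengths into separate flat arrays is the design choice that matters here: the same function can be called once per decoded posting block instead of once per document.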

Performance

IResearch is up to 5x faster than Lucene and Tantivy on the Tantivy team's Search Benchmark, The Game. The Tantivy maintainers validated and merged iresearch's results themselves.

Detailed per-query breakdown

Benchmark overview

Where the speed comes from

The win is a result of specific optimizations. We wrote them up as a five-post technical retrospective called Search Optimization Journey:

Collecting top-K candidates

Block scoring

Norm gathering

Lazy two-phase queries

Adaptive posting list format
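As a reference point for the first topic in the list, the textbook way to collect top-K candidates is a bounded min-heap. The sketch below is only that baseline, not the approach described in the retrospective and not an IResearch class; `TopKCollector` and `ScoredDoc` are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

using ScoredDoc = std::pair<float, std::uint32_t>;  // (score, doc id)

// Hypothetical collector for illustration; keeps the K best-scoring documents.
class TopKCollector {
 public:
  explicit TopKCollector(std::size_t k) : k_(k) {}

  void Collect(std::uint32_t doc, float score) {
    if (k_ == 0) return;
    if (heap_.size() < k_) {
      heap_.emplace(score, doc);
      return;
    }
    // The heap top is the weakest hit kept so far; anything that does not
    // beat it cannot enter the top K.
    if (score <= heap_.top().first) return;
    heap_.pop();
    heap_.emplace(score, doc);
  }

  // Drains the heap and returns the hits ordered by descending score.
  std::vector<ScoredDoc> Finish() {
    std::vector<ScoredDoc> hits;
    hits.reserve(heap_.size());
    while (!heap_.empty()) {
      hits.push_back(heap_.top());
      heap_.pop();
    }
    std::reverse(hits.begin(), hits.end());
    return hits;
  }

 private:
  std::size_t k_;
  // std::greater turns the default max-heap into a min-heap on (score, doc).
  std::priority_queue<ScoredDoc, std::vector<ScoredDoc>,
                      std::greater<ScoredDoc>>
      heap_;
};
```

Each `Collect` call costs at most O(log K), and once the heap is full most low-scoring candidates are rejected with a single comparison.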

Examples

See the examples directory for complete programs covering the public API, all built and exercised in CI on every PR so they stay in sync:

basic.cpp -- index documents, run term / phrase / boolean / prefix / fuzzy / top-K BM25 queries, read stored fields, delete documents, consolidate.

text_filters.cpp -- phrase search, n-gram similarity matching, regular expressions, SQL-style wildcard patterns (% and _) and fuzzy term matching with a configurable edit distance, all shown side-by-side against the same small corpus.

geo.cpp -- geospatial search over a GeoJSON-indexed point corpus: find everything within a radius of a center point, everything inside an annulus and everything that falls inside an arbitrary polygon.
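For readers unfamiliar with the SQL-style wildcard patterns mentioned for text_filters.cpp, the sketch below shows the matching semantics on a single string: `%` matches any run of characters (including none) and `_` matches exactly one. It is a standalone illustration, not code from the examples directory; `LikeMatch` is a hypothetical helper that compares bytes and assumes no escape character.

```cpp
#include <cstddef>
#include <string>

// Returns true if `text` matches `pattern` under SQL LIKE-style rules.
// Byte-wise comparison only; no escape character, no Unicode awareness.
inline bool LikeMatch(const std::string& text, const std::string& pattern) {
  std::size_t t = 0, p = 0;
  std::size_t star_p = std::string::npos;  // position of the last '%' seen
  std::size_t star_t = 0;                  // text position that '%' resumes at
  while (t < text.size()) {
    if (p < pattern.size() && pattern[p] == '%') {
      star_p = p++;  // remember the wildcard, first try matching zero chars
      star_t = t;
    } else if (p < pattern.size() &&
               (pattern[p] == '_' || pattern[p] == text[t])) {
      ++p;
      ++t;
    } else if (star_p != std::string::npos) {
      p = star_p + 1;  // backtrack: let the last '%' absorb one more char
      t = ++star_t;
    } else {
      return false;
    }
  }
  while (p < pattern.size() && pattern[p] == '%') ++p;  // trailing '%' is fine
  return p == pattern.size();
}
```

For example, `LikeMatch("search", "sea%")` and `LikeMatch("search", "s_arch")` both return true, while `LikeMatch("search", "s_rch")` returns false because `_` consumes exactly one character. An index-backed implementation would match patterns against the term dictionary rather than scanning strings one by one.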

Production history

IResearch has been in continuous production use since 2018, first as the search engine behind ArangoSearch. Since 2024 it has been the search foundation of SereneDB, a search-OLAP database.

License

Apache 2.0. See LICENSE.md.

Copyright (c) 2024-2026 SereneDB

Copyright (c) 2017-2023 ArangoDB GmbH

Copyright (c) 2016-2017 EMC Corporation

Licensing information for third-party components is in LICENSES.md.
