LadybugDB Flying Solo

Ladybug Team, Developers at LadybugDB

Jul 1, 2026 benchmarks

With the release of 0.18.0, LadybugDB is making a mark in the world of embedded graph databases. While it’s built on top of an excellent technology base provided by the KuzuDB project, we argue here that it’s more than a Kuzu fork: it carries innovations that should be evaluated on their own merit.

Our objective is to convince you that, for security reasons, you shouldn’t be using the archived Kuzu project or its lightly modified forks. When choosing among competing graph databases, decide on merit, not on aggressive marketing tactics, SEO or sales pitches.

Kuzu was first released in 2022 and had not one, but two VLDB papers. It took the database community by storm by innovating in a number of dimensions including a practical, strongly typed version of Cypher, factorized join algorithms and excellent performance.

When Apple acquired the key people on the project, there was a flurry of activity to claim the excellent reputation built up by the project and its leaders. The forks ranged from a one-line README update, to a few bug fixes, to a single-feature fork. We are intentionally ignoring any private/closed-source forks, which may or may not exist.

A number of competing databases have responded to the news by publishing migration guides and, in some cases, by using inbound marketing techniques, also known as SEO.

Why you shouldn’t use Kuzu

We’re big fans of the erstwhile Kuzu team here and very thankful for their contributions. What is being discussed here is the simple reality of living in the age of Mythos. The Kuzu team made many good technical calls. But...

Any large code base has bugs. Kuzu had about 100k lines of code and 3x as many lines of third party code. This is a giant risk for a C++ project.

This risk exists not just for Kuzu, but all the lightly modified forks.

Why should you use Ladybug

We’ve fixed most known/reported bugs. That’s not to say that new bugs didn’t come in. The risk is always there. But we’re here fixing them as they get reported, while Kuzu stays archived.

The rest of the blog post is about recent innovations.

Billion Scale Graphs

Kuzu’s design objective was to have the fastest join algorithm for deep graph traversals. As a result, you simply couldn’t import a billion-scale graph, due to space amplification. Import a 1GB parquet file into Kuzu and it’d consume between 10 and 20GB, depending on the distribution of keys and the configuration.

With this release, we can demo a 10GB zstd-compressed database that you can query with less than 8GB of RAM to get any triples you want. After decompression, it consumes about 20GB on disk. This demo was simply not practical with Kuzu due to its design choices.

lbug> call disk_size_info() return *;
┌────────────┬───────────────────────────────────────────┬───────────┬─────────────┐
│ category   │ name                                      │ num_pages │ size_bytes  │
│ STRING     │ STRING                                    │ UINT64    │ UINT64      │
├────────────┼───────────────────────────────────────────┼───────────┼─────────────┤
│ header     │ database_header                           │ 1         │ 4096        │
│ catalog    │ catalog                                   │ 1         │ 4096        │
│ metadata   │ metadata                                  │ 783       │ 3207168     │
│ node_table │ wikidata_node                             │ 1440241   │ 5899227136  │
│ index      │ wikidata_node.name_index:tree             │ 1275258   │ 5223456768  │
│ node_table │ edge_meta                                 │ 90        │ 368640      │
│ index      │ edge_meta._PK:hash_index_headers          │ 2         │ 8192        │
│ index      │ edge_meta._PK:disk_array_headers          │ 3         │ 12288       │
│ index      │ edge_meta._PK:primary_slots               │ 512       │ 2097152     │
│ node_table │ edge_types                                │ 0         │ 0           │
│ rel_table  │ wikidata_rel:wikidata_node->wikidata_node │ 2140331   │ 8766795776  │
│ free_space │ free_pages                                │ 679577    │ 2783547392  │
│ total      │ file_total                                │ 5536799   │ 22678728704 │
└────────────┴───────────────────────────────────────────┴───────────┴─────────────┘

Choice of Indexing

Very few databases, especially columnar ones, use hash indexing by default. Kuzu chose to do so because it was a key part of the Accumulate-Semijoin-Probe (ASP) join. This involved a trade-off: fast probes came at the price of space amplification and poor support for range queries, because hash indexes do not maintain ordering. Kuzu also didn’t implement indexing on columns other than the primary key.

Further, hash indexing was so deeply ingrained in the Kuzu code base that it took us two release cycles to make it optional. So here we are!

You can now disable default hash indexing and choose Adaptive Radix Tree (ART) indexing as a second choice. Some queries are faster, some are slower. You choose the type of index based on the workload, like all normal databases do.
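To make the probe versus range trade-off concrete, here is a minimal Python sketch. It is not LadybugDB code: `HashIndex` and `OrderedIndex` are hypothetical names, and a sorted list searched with `bisect` stands in for an ordered index such as ART. It assumes unique keys and only illustrates why ordering matters for range queries.

```python
import bisect


class HashIndex:
    """Unordered index (hypothetical): a dict from key to row id."""

    def __init__(self, pairs):
        self._slots = dict(pairs)

    def probe(self, key):
        # Point lookup: a single hash probe.
        return self._slots.get(key)

    def range(self, low, high):
        # No ordering to exploit: every entry is examined, then sorted.
        matches = [(k, rid) for k, rid in self._slots.items() if low <= k <= high]
        return [rid for _, rid in sorted(matches)]


class OrderedIndex:
    """Ordered index (hypothetical): a sorted key list standing in for ART."""

    def __init__(self, pairs):
        entries = sorted(pairs)
        self._keys = [k for k, _ in entries]
        self._row_ids = [rid for _, rid in entries]

    def probe(self, key):
        # Point lookup: binary search instead of a hash probe.
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._row_ids[i]
        return None

    def range(self, low, high):
        # Two binary searches locate a contiguous, already ordered slice.
        lo = bisect.bisect_left(self._keys, low)
        hi = bisect.bisect_right(self._keys, high)
        return self._row_ids[lo:hi]


PAIRS = [("berlin", 0), ("amsterdam", 1), ("cairo", 2), ("delhi", 3)]

if __name__ == "__main__":
    hash_index = HashIndex(PAIRS)
    ordered_index = OrderedIndex(PAIRS)
    print(hash_index.probe("cairo"), ordered_index.probe("cairo"))
    print(hash_index.range("b", "d"), ordered_index.range("b", "d"))
```

Both structures return the same answers. The difference is the work done: the hash version scans every entry for a range query, while the ordered version touches only the matching slice.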

We implemented UDFs disk_size_info() and show_indexes() to display the space consumed by indexes.
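As a rough illustration of how to read that output, the Python sketch below totals the `size_bytes` column per category, using the rows from the `disk_size_info()` listing shown earlier for the Wikidata demo. The 4096-byte page size is an inference from that listing (every row's `size_bytes` equals `num_pages * 4096`), not a documented constant, and the helper names are our own.

```python
from collections import defaultdict

# Inferred from the listing: size_bytes == num_pages * 4096 on every row.
PAGE_SIZE = 4096

# (category, name, num_pages, size_bytes), copied from the disk_size_info() output.
ROWS = [
    ("header", "database_header", 1, 4096),
    ("catalog", "catalog", 1, 4096),
    ("metadata", "metadata", 783, 3207168),
    ("node_table", "wikidata_node", 1440241, 5899227136),
    ("index", "wikidata_node.name_index:tree", 1275258, 5223456768),
    ("node_table", "edge_meta", 90, 368640),
    ("index", "edge_meta._PK:hash_index_headers", 2, 8192),
    ("index", "edge_meta._PK:disk_array_headers", 3, 12288),
    ("index", "edge_meta._PK:primary_slots", 512, 2097152),
    ("node_table", "edge_types", 0, 0),
    ("rel_table", "wikidata_rel:wikidata_node->wikidata_node", 2140331, 8766795776),
    ("free_space", "free_pages", 679577, 2783547392),
]

# The file_total row from the same listing.
REPORTED_TOTAL = 22678728704


def size_by_category(rows):
    """Sum size_bytes for each category."""
    totals = defaultdict(int)
    for category, _name, _pages, size_bytes in rows:
        totals[category] += size_bytes
    return dict(totals)


def category_shares(rows):
    """Return each category's fraction of the summed file size."""
    totals = size_by_category(rows)
    grand_total = sum(totals.values())
    return {category: size / grand_total for category, size in totals.items()}


if __name__ == "__main__":
    shares = category_shares(ROWS)
    # Print categories from largest to smallest share of the file.
    for category, share in sorted(shares.items(), key=lambda kv: -kv[1]):
        print(f"{category:<12}{share:6.1%}")
```

The per-category sums add up exactly to the reported `file_total`, and the index category, dominated by `wikidata_node.name_index:tree`, comes to a little under a quarter of the file.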

This reliance on hash indexing was one of the reasons Kuzu was outperformed by DuckDB and LanceDB on simpler queries that didn’t involve ASP joins.

Stable Storage Format

Kuzu’s storage format was not stable. With each release, you had to export and re-import your databases to be able to use new features. We have since implemented a...
