Keybench Analysis with TidesDB v9.3.6 and RocksDB v11.1.1
by Alex Gaetano Padula
published on June 9th, 2026
In this article I’ll be going over the first public run with a new tool, so I will spend as much time on how the numbers were produced as on the numbers themselves. If you only read one section, read the caveats at the end before you quote anything here.
For these results I used keybench, a benchmark harness for sorted key value stores that I have been working on, inspired by sysbench and HammerDB.
The idea is rather simple. You write the workload in Lua, the harness drives it across one or more storage engines, times every operation, and reports throughput and latency. The same script runs unchanged against every engine, so a comparison measures the engines and not the harness. A few design points matter for reading the rest of this article.
The engine owns concurrency. keybench spawns the worker threads, splits the work across them, and joins them. It never holds a lock around an engine call, so a serialized engine reports as serialized and a parallel one reports as parallel. The Lua script is single threaded and never reasons about locks.
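A minimal sketch of that spawn, split, and join shape, written in Python for illustration only. The names `split`, `drive`, and `op` are hypothetical and are not keybench's API, and splitting a fixed count of units is an assumption made to keep the example short. The part that mirrors the text is that the harness takes no lock around the engine call.

```python
import threading

def split(total, workers):
    """Divide `total` units across `workers` as evenly as possible."""
    base, extra = divmod(total, workers)
    return [base + (1 if i < extra else 0) for i in range(workers)]

def drive(op, total_units, workers):
    """Spawn workers, let each call `op` for its share, then join them.

    `op` stands in for an engine call. The harness holds no lock around it,
    so any serialization observed comes from `op` itself.
    """
    shares = split(total_units, workers)
    done = [0] * workers  # one slot per worker, so no shared counter is needed

    def worker(idx):
        for _ in range(shares[idx]):
            op(idx)          # engine call, made without a harness-side lock
            done[idx] += 1   # each worker only touches its own slot

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done
```

Python's global interpreter lock means this sketch does not run pure Python code in parallel, so it shows only the structure, not the performance behavior of a native harness.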
Two rates are reported. wu/s is workload units per second, one unit being one call to your run() function, a whole operation as the script defines it such as a cart checkout. ops/s is primitive operations per second, the raw key touches. When a unit is one primitive op the two are equal and one line is printed. When a unit is several, such as a batch of B keys, ops/s is B times wu/s and both are printed.
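The relationship between the two rates can be sketched as a few lines of arithmetic. The function names and the numbers used with them are illustrative assumptions, not keybench output.

```python
def rates(units_completed, prims_per_unit, elapsed_s):
    """Return (wu/s, ops/s) for one timed run.

    units_completed: number of run() calls that finished
    prims_per_unit:  primitive key operations performed by each run() call
    """
    wu_per_s = units_completed / elapsed_s
    ops_per_s = wu_per_s * prims_per_unit
    return wu_per_s, ops_per_s

def report(units_completed, prims_per_unit, elapsed_s):
    """Print one line when the rates are equal, both when they differ."""
    wu, ops = rates(units_completed, prims_per_unit, elapsed_s)
    if prims_per_unit == 1:
        return f"{ops:.0f} ops/s"
    return f"{wu:.0f} wu/s, {ops:.0f} ops/s"
```

For example, with made-up counts, `report(60_000, 16, 60.0)` describes a batch workload where each unit touches 16 keys, so the ops/s figure is 16 times the wu/s figure.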
Latency is a distribution, per operation kind. Each of put, get, del, range, mget, mput, mdel keeps its own histogram. The report gives p50, p99, p99.9, and the max. I care more about the tail than the median.
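To make the per-kind report concrete, here is a small sketch that keeps separate samples for each operation kind and reads off the same four figures. It sorts raw samples and uses the nearest-rank rule, which is an assumption chosen for clarity. A real harness would more likely use a bucketed histogram, and `LatencyRecorder` is a hypothetical name.

```python
from collections import defaultdict

KINDS = ("put", "get", "del", "range", "mget", "mput", "mdel")

class LatencyRecorder:
    """Keeps one list of latency samples per operation kind."""

    def __init__(self):
        self.samples = defaultdict(list)

    def record(self, kind, micros):
        if kind not in KINDS:
            raise ValueError(f"unknown operation kind: {kind}")
        self.samples[kind].append(micros)

    def summary(self, kind):
        """Return p50, p99, p99.9, and max for one kind, or None if empty."""
        data = sorted(self.samples[kind])
        if not data:
            return None
        n = len(data)

        def at(permille):
            # Nearest-rank percentile in integer arithmetic:
            # rank = ceil(permille * n / 1000), clamped to at least 1.
            rank = max(1, (permille * n + 999) // 1000)
            return data[rank - 1]

        return {"p50": at(500), "p99": at(990), "p99.9": at(999), "max": data[-1]}
```

Keeping the kinds apart matters because a slow range scan would otherwise hide inside a much larger population of fast point reads.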
The seed is measured, not hidden. Loading the dataset is its own timed phase with its own thread count, and it streams progress, so I can see ingest rate separately from the timed workload.
One more thing shaped this run. A storage engine under write pressure will eventually push back: RocksDB by blocking the writer during a stall, TidesDB by returning a busy code that asks the caller to retry. keybench now treats both the same way. It waits and retries on the busy code, so a stall blocks the writer rather than dropping the write. That keeps the comparison honest, since an engine cannot look fast by quietly failing writes, and it means the long tails you will see below are real stalls that a client would feel.
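The wait-and-retry behavior can be sketched as below. The status constants, `put_with_retry`, and `FlakyStore` are all hypothetical stand-ins, not TidesDB or keybench names, and the exponential backoff is an assumption, since the article does not say how long keybench waits between attempts. The point of the sketch is that the timer keeps running across retries, so a stall shows up as latency.

```python
import time

OK = "ok"      # hypothetical status codes for illustration
BUSY = "busy"

def put_with_retry(put, key, value, backoff_s=0.001, max_backoff_s=0.1,
                   sleep=time.sleep, clock=time.perf_counter):
    """Call put() until it stops returning BUSY.

    Returns (status, attempts, elapsed_seconds). The elapsed time covers
    every retry and every wait, so the write is charged for the full stall.
    """
    start = clock()
    delay = backoff_s
    attempts = 0
    while True:
        attempts += 1
        status = put(key, value)
        if status != BUSY:
            return status, attempts, clock() - start
        sleep(delay)                           # wait instead of dropping the write
        delay = min(delay * 2, max_backoff_s)  # assumed backoff policy

class FlakyStore:
    """Toy store that reports BUSY for its first few put() calls."""

    def __init__(self, busy_first):
        self.busy_left = busy_first
        self.data = {}

    def put(self, key, value):
        if self.busy_left > 0:
            self.busy_left -= 1
            return BUSY
        self.data[key] = value
        return OK
```

With `FlakyStore(busy_first=3)`, a single logical write takes four attempts and is recorded as one slow operation rather than three failures and one fast success.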
Environment
Intel Core i7-11700K, 8 cores and 16 threads, at 3.6GHz
46.8 GiB DDR4
Ubuntu 23.04, Linux 6.2.0 x86_64
WD Blue WDS500G2B0A, a consumer SATA SSD, ext4, 159 GiB volume
gcc 12.3.0, linked against jemalloc so the whole malloc family agrees across both engines
TidesDB v9.3.6, RocksDB v11.1.1, keybench 0.1.1
This is a modest consumer box on a SATA SSD, not a server with NVMe.
How the engines were run
Every workload was run against both engines at 1, 8, and 16 threads for 60 seconds per point. Each point is a single run, so the figure reported at each point is that one run rather than a median over repeats. The reason it is a single run rather than three is stated in the caveats.
The dataset was 500,000 keys with 4 KiB values; the cart workload was instead sized by 90,000 users' worth of line items. Each seed loaded that full dataset, half a million keys for mixed, scan, and batch and a comparable count of line items for cart, so the live data is roughly 2 GiB, about 16 times the combined 128 MiB of memtable and block cache each engine was given. The on-disk footprint is larger than that and keeps growing through the run, because under seed once all three thread points write against the one store for 60 seconds each, and an uncompressed LSM holds obsolete versions and tombstones until compaction clears them. The point of the sizing was to push the data out of the memtable and into SSTables and compaction rather than let it sit in memory.
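The sizing figures can be checked with a few lines of arithmetic. This sketch counts value bytes only and ignores key bytes and per-entry overhead, which is an assumption. On that basis the values alone come to about 1.9 GiB, roughly 15 times the 128 MiB budget, and rounding the live data up to 2 GiB gives the factor of 16 quoted above.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

keys = 500_000
value_bytes = 4 * KIB

# Value payload only; key bytes and per-entry overhead are not counted.
live_bytes = keys * value_bytes
live_gib = live_bytes / GIB

# 64 MiB write buffer plus 64 MiB block cache per engine.
memory_budget = 64 * MIB + 64 * MIB

ratio = live_bytes / memory_budget
rounded_ratio = (2 * GIB) / memory_budget  # using the rounded 2 GiB figure

print(f"live data: {live_gib:.2f} GiB, {ratio:.1f}x the memory budget")
print(f"rounded to 2 GiB: {rounded_ratio:.0f}x")
```

Either way the dataset is an order of magnitude larger than what the engines can hold in memory, which is what forces reads and writes through SSTables and compaction.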
Both engines were configured for parity as far as their knobs allow, which is the important hedge.
| Setting | RocksDB v11.1.1 | TidesDB v9.3.6 |
| --- | --- | --- |
| compression | off (kNoCompression) | off (none) |
| write buffer | 64 MiB | 64 MiB |
| block cache | 64 MiB | 64 MiB |
| bloom filter | 10 bits | enabled, fpr 0.01 |
| compaction workers | max_background_jobs=8 | num_compaction_threads=4 |
| L0 pressure | trigger 4, slowdown 5, stop 10 | l0_queue_stall_threshold=10, l1_file_count_trigger=4 |
| durability | default WAL, no explicit sync | sync_mode=none |

These are matched in spirit: a small memtable, no compression, a bloom filter on the read path, and an aggressive L0 setting so compaction has to keep up. They are not matched one to one, because the engines do not share knobs. Read the results as “these two configurations on this box”, not “the best each engine can do”.
The seed used keybench’s seed once mode: the store is seeded a single time per engine, and the whole thread sweep runs against that one store rather than reseeding for every point. That models the realistic shape of...