Show HN: Turboquant.cpp – Quantize embeddings to 1-4 bits, no training (400 LoC)

andrewmikhail1 pts0 comments

GitHub - RunEdgeAI/turboquant.cpp: Near-optimal online vector quantization in C++23 — 1-4 bits per coordinate, no training, no codebooks · GitHub

/" data-turbo-transient="true" />

Skip to content

Search or jump to...

Search code, repositories, users, issues, pull requests...

-->

Search

Clear

Search syntax tips

Provide feedback

--><br>We read every piece of feedback, and take your input very seriously.

Include my email address so I can be contacted

Cancel

Submit feedback

Saved searches

Use saved searches to filter your results more quickly

-->

Name

Query

To see all available qualifiers, see our documentation.

Cancel

Create saved search

Sign in

/;ref_cta:Sign up;ref_loc:header logged out"}"<br>Sign up

Appearance settings

Resetting focus

You signed in with another tab or window. Reload to refresh your session.<br>You signed out in another tab or window. Reload to refresh your session.<br>You switched accounts on another tab or window. Reload to refresh your session.

Dismiss alert

{{ message }}

RunEdgeAI

turboquant.cpp

Public

Notifications<br>You must be signed in to change notification settings

Fork

Star

main

BranchesTags

Go to file

CodeOpen more actions menu

Folders and files<br>NameNameLast commit message<br>Last commit date<br>Latest commit

History<br>4 Commits<br>4 Commits

.github

.github

benchmarks

benchmarks

examples

examples

include/turboquant

include/turboquant

src

src

.bazelrc

.bazelrc

.bazelversion

.bazelversion

.gitignore

.gitignore

BUILD.bazel

BUILD.bazel

LICENSE

LICENSE

MODULE.bazel

MODULE.bazel

MODULE.bazel.lock

MODULE.bazel.lock

README.md

README.md

View all files

Repository files navigation

turboquant.cpp

C++ implementation of TurboQuant, a near-optimal online vector quantization algorithm.

Based on TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate

What it does

Compresses high-dimensional floating-point vectors to low-bitwidth integers (1-4 bits per coordinate) while preserving inner products and distances. No preprocessing, no training, no codebook learning. Vectors are quantized independently on arrival.

Two quantizers are provided:

QuantizerMSE minimizes reconstruction error (mean-squared error).

QuantizerProd provides unbiased inner product estimates with low variance.

Both achieve distortion within a constant factor (~2.7x) of the information-theoretic lower bound.

Build

Built with Bazel. Depends on Eigen 3.4, resolved automatically from the Bazel Central Registry — no manual install needed.

bazel build //...

Run the example:

bazel run //:turboquant_example

To depend on turboquant from another Bazel project, add it to your MODULE.bazel and use @turboquant as a dependency.

Usage

#include

using namespace turboquant;

std::mt19937 rng(42);<br>int dim = 1536;<br>int bitwidth = 3;

// MSE quantizer<br>QuantizerMSE qmse(dim, bitwidth, rng);<br>auto compressed = qmse.quantize(x);<br>Vec reconstructed = qmse.dequantize(compressed);

// Inner product quantizer (unbiased)<br>QuantizerProd qprod(dim, bitwidth, rng);<br>auto compressed = qprod.quantize(x);<br>float ip = qprod.estimate_inner_product(y, compressed);">#include turboquant/turboquant.h><br>#include random>

using namespace turboquant;

std::mt19937 rng(42);<br>int dim = 1536;<br>int bitwidth = 3;

// MSE quantizer<br>QuantizerMSE qmse(dim, bitwidth, rng);<br>auto compressed = qmse.quantize(x);<br>Vec reconstructed = qmse.dequantize(compressed);

// Inner product quantizer (unbiased)<br>QuantizerProd qprod(dim, bitwidth, rng);<br>auto compressed = qprod.quantize(x);<br>float ip = qprod.estimate_inner_product(y, compressed);

Theoretical guarantees

For unit-norm vectors at bitwidth b:

Metric<br>Upper bound<br>Lower bound

MSE<br>sqrt(3pi)/2 * 4^(-b)<br>4^(-b)

Inner product error<br>sqrt(3pi^2) * norm(y)^2 / (d * 4^b)<br>norm(y)^2 / (d * 4^b)

Benchmarks

Single-threaded, Apple M4 Max, bazel run -c opt //:turboquant_benchmark. Compression ratio is fp32 size vs. the bit-packed wire format (indices + norm metadata).

Quantizer<br>quantize/s<br>dequantize/s<br>inner product/s<br>compression

MSE<br>1536<br>10.0k<br>9.6k<br>31.3x

MSE<br>1536<br>9.8k<br>9.5k<br>15.8x

MSE<br>1536<br>9.7k<br>9.6k<br>8.0x

Prod<br>1536<br>2.9k<br>4.1k<br>4.2k<br>15.7x

Prod<br>1536<br>2.9k<br>4.0k<br>4.1k<br>7.9x

MSE<br>256<br>319k<br>312k<br>15.1x

Prod<br>256<br>108k<br>154k<br>162k<br>14.2x

Throughput is dominated by the dense d x d rotation matvec, so it scales as O(d^2) and is nearly independent of bitwidth (see Limitations). Run bazel run -c opt //:turboquant_benchmark for the full sweep (d = 256/768/1536, b = 1-4).

Limitations

Bitwidths 1-4 only (precomputed codebooks). Extending to higher bitwidths requires solving the Lloyd-Max optimization for the Beta distribution at the desired precision.

Stores full d x d rotation matrix in memory. For large dimensions, a randomized Hadamard transform would reduce this to O(d log d).

No SIMD optimization yet. The quantization loop is straightforward to vectorize with AVX2/AVX-512.

About

Near-optimal online vector quantization in C++23 — 1-4 bits per coordinate, no training, no...

turboquant bazel bitwidth compressed quantize include

Related Articles