GitHub - RunEdgeAI/turboquant.cpp: Near-optimal online vector quantization in C++23 — 1-4 bits per coordinate, no training, no codebooks · GitHub
/" data-turbo-transient="true" />
Skip to content
Search or jump to...
Search code, repositories, users, issues, pull requests...
-->
Search
Clear
Search syntax tips
Provide feedback
--><br>We read every piece of feedback, and take your input very seriously.
Include my email address so I can be contacted
Cancel
Submit feedback
Saved searches
Use saved searches to filter your results more quickly
-->
Name
Query
To see all available qualifiers, see our documentation.
Cancel
Create saved search
Sign in
/;ref_cta:Sign up;ref_loc:header logged out"}"<br>Sign up
Appearance settings
Resetting focus
You signed in with another tab or window. Reload to refresh your session.<br>You signed out in another tab or window. Reload to refresh your session.<br>You switched accounts on another tab or window. Reload to refresh your session.
Dismiss alert
{{ message }}
RunEdgeAI
turboquant.cpp
Public
Notifications<br>You must be signed in to change notification settings
Fork
Star
main
BranchesTags
Go to file
CodeOpen more actions menu
Folders and files<br>NameNameLast commit message<br>Last commit date<br>Latest commit
History<br>4 Commits<br>4 Commits
.github
.github
benchmarks
benchmarks
examples
examples
include/turboquant
include/turboquant
src
src
.bazelrc
.bazelrc
.bazelversion
.bazelversion
.gitignore
.gitignore
BUILD.bazel
BUILD.bazel
LICENSE
LICENSE
MODULE.bazel
MODULE.bazel
MODULE.bazel.lock
MODULE.bazel.lock
README.md
README.md
View all files
Repository files navigation
turboquant.cpp
C++ implementation of TurboQuant, a near-optimal online vector quantization algorithm.
Based on TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate
What it does
Compresses high-dimensional floating-point vectors to low-bitwidth integers (1-4 bits per coordinate) while preserving inner products and distances. No preprocessing, no training, no codebook learning. Vectors are quantized independently on arrival.
Two quantizers are provided:
QuantizerMSE minimizes reconstruction error (mean-squared error).
QuantizerProd provides unbiased inner product estimates with low variance.
Both achieve distortion within a constant factor (~2.7x) of the information-theoretic lower bound.
Build
Built with Bazel. Depends on Eigen 3.4, resolved automatically from the Bazel Central Registry — no manual install needed.
bazel build //...
Run the example:
bazel run //:turboquant_example
To depend on turboquant from another Bazel project, add it to your MODULE.bazel and use @turboquant as a dependency.
Usage
#include
using namespace turboquant;
std::mt19937 rng(42);<br>int dim = 1536;<br>int bitwidth = 3;
// MSE quantizer<br>QuantizerMSE qmse(dim, bitwidth, rng);<br>auto compressed = qmse.quantize(x);<br>Vec reconstructed = qmse.dequantize(compressed);
// Inner product quantizer (unbiased)<br>QuantizerProd qprod(dim, bitwidth, rng);<br>auto compressed = qprod.quantize(x);<br>float ip = qprod.estimate_inner_product(y, compressed);">#include turboquant/turboquant.h><br>#include random>
using namespace turboquant;
std::mt19937 rng(42);<br>int dim = 1536;<br>int bitwidth = 3;
// MSE quantizer<br>QuantizerMSE qmse(dim, bitwidth, rng);<br>auto compressed = qmse.quantize(x);<br>Vec reconstructed = qmse.dequantize(compressed);
// Inner product quantizer (unbiased)<br>QuantizerProd qprod(dim, bitwidth, rng);<br>auto compressed = qprod.quantize(x);<br>float ip = qprod.estimate_inner_product(y, compressed);
Theoretical guarantees
For unit-norm vectors at bitwidth b:
Metric<br>Upper bound<br>Lower bound
MSE<br>sqrt(3pi)/2 * 4^(-b)<br>4^(-b)
Inner product error<br>sqrt(3pi^2) * norm(y)^2 / (d * 4^b)<br>norm(y)^2 / (d * 4^b)
Benchmarks
Single-threaded, Apple M4 Max, bazel run -c opt //:turboquant_benchmark. Compression ratio is fp32 size vs. the bit-packed wire format (indices + norm metadata).
Quantizer<br>quantize/s<br>dequantize/s<br>inner product/s<br>compression
MSE<br>1536<br>10.0k<br>9.6k<br>31.3x
MSE<br>1536<br>9.8k<br>9.5k<br>15.8x
MSE<br>1536<br>9.7k<br>9.6k<br>8.0x
Prod<br>1536<br>2.9k<br>4.1k<br>4.2k<br>15.7x
Prod<br>1536<br>2.9k<br>4.0k<br>4.1k<br>7.9x
MSE<br>256<br>319k<br>312k<br>15.1x
Prod<br>256<br>108k<br>154k<br>162k<br>14.2x
Throughput is dominated by the dense d x d rotation matvec, so it scales as O(d^2) and is nearly independent of bitwidth (see Limitations). Run bazel run -c opt //:turboquant_benchmark for the full sweep (d = 256/768/1536, b = 1-4).
Limitations
Bitwidths 1-4 only (precomputed codebooks). Extending to higher bitwidths requires solving the Lloyd-Max optimization for the Beta distribution at the desired precision.
Stores full d x d rotation matrix in memory. For large dimensions, a randomized Hadamard transform would reduce this to O(d log d).
No SIMD optimization yet. The quantization loop is straightforward to vectorize with AVX2/AVX-512.
About
Near-optimal online vector quantization in C++23 — 1-4 bits per coordinate, no training, no...