transcribe.cpp<br>Apr 2026 - now<br>I'm super excited to share transcribe.cpp today.
transcribe.cpp is a ggml-based transcription library that supports the latest transcription models. Every model published under the handy-computer HF org has been numerically validated and WER tested to match the reference implementation. It's accelerated everywhere.
I'm the author and maintainer of Handy. This library grew from the pains of distributing a cross-platform speech-to-text application to many people.
This is a v0.1.0 release, which means there are some rough edges that I cannot discover alone. Please report them, and let's fix them together!
Motivation
Let me say this: I think distributing a cross-platform application with the current ASR inference stack is terrible.
You've basically got whisper.cpp and ONNX. That's it. You could roll MLX in for Apple devices, but now you have to support two different engines and port models to each. I've been a fan of ONNX for getting model support into Handy quickly, but so much performance is left on the table when inference runs on the CPU only.
There are a few random libraries out there that claim to support a lot of models, but as far as I've seen they have unknown authors and unknown testing. They leave me with more questions than answers.
When will they stop maintaining this library? Has the creator thought about bindings so you can actually use it in a real desktop or mobile app? Is this effectively demo code? Have they benchmarked it? Is it faster than ONNX?
And this is what led to transcribe.cpp. As Handy's maintainer, I needed a library I could trust: one where I could download a file and run inference on it, and where I know that the output of the model in the engine is as good as the reference implementation. Inference should run on the GPU for the best performance. It should be trivially embeddable in Handy, so it cannot be a huge PyTorch library. It must work on Mac, Windows, and Linux. ggml seemed like by far the best way forward: it has a strong community and a great distribution story.
So what do you get?
You get a fast and accurate inference engine with wide-ranging model support.
Support for 16 ASR Families (60+ models) with more coming
Acceleration via Vulkan, Metal, CUDA, and TinyBLAS
Every model has been numerically verified and WER tested
Support for Streaming Transcription
Support for Batch Transcription
More or less a drop-in whisper.cpp replacement
Maintainer-supported bindings in 4 languages
Python
JavaScript/TypeScript
Rust
ObjC/Swift
Wide Model Support
We intend to support as many state-of-the-art transcription models as possible. As of today, we support most of the modern transcription models that are publicly available. A few are still missing, but they will be added soon.
Acceleration Support
One of my top goals was to run any ASR model I wanted on Vulkan. In my opinion, this is the floor for any application shipping local inference. For every model we support, there is a corresponding benchmark run on a Ryzen 4750U (CPU + Vulkan) on Fedora as well as on my M4 Max.
Numerically Verified
I also wanted to make sure that inference in transcribe.cpp is accurate and as close to the reference implementation as possible. This largely came from my uncertainty about inference accuracy when using .onnx models I found on Hugging Face. To ensure our inference is correct, we numerically validate every model against the reference. On top of numerical validation, we run full WER sweeps to make sure that whatever the reference outputs, we output the same thing. That means every model has been run through thousands of utterances and is very close to, or the same as, the reference. The results are published in the transcribe.cpp repo as well as with each model on Hugging Face.
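To make the two checks above concrete, here is a minimal sketch in Python of what "numerically validate" and "WER sweep" can mean in practice. This is not the validation harness used by transcribe.cpp. The function names (`max_abs_diff`, `word_errors`, `corpus_wer`) are hypothetical, and the sketch assumes both transcripts have already been normalized (casing, punctuation) the way WER tooling typically does before scoring.

```python
def max_abs_diff(ours, reference):
    """Largest element-wise absolute difference between two equal-length outputs."""
    if len(ours) != len(reference):
        raise ValueError("outputs must have the same length")
    return max((abs(a - b) for a, b in zip(ours, reference)), default=0.0)


def word_errors(reference, hypothesis):
    """Return (word-level edit distance, number of reference words)."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] is the edit distance between the reference words consumed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            ))
        prev = curr
    return prev[-1], len(ref)


def corpus_wer(pairs):
    """WER over (reference, hypothesis) pairs: total errors / total reference words."""
    errors = words = 0
    for reference, hypothesis in pairs:
        e, n = word_errors(reference, hypothesis)
        errors += e
        words += n
    return errors / words if words else 0.0
```

In a sweep like the one described, the "reference" side would be the transcript produced by the reference implementation and the "hypothesis" side the transcript produced by the engine under test, so a corpus WER of 0.0 between the two means they output the same words for every utterance.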
Drop-In whisper.cpp Replacement
transcribe.cpp is more or less a drop-in replacement for whisper.cpp. The main reason is that Handy used whisper.cpp, and I needed to ship an update in which transcribe.cpp would replace it. I needed to keep compatibility with the very popular .bin files that run in whisper.cpp and shipped with Handy. transcribe.cpp can run them. There are some flags and features in whisper.cpp that we do not support yet, but I think for the vast majority of use cases our Whisper implementation is solid and can replace whisper.cpp with about equal performance.
Real Distribution
Language bindings were on my mind from the beginning. While this library is written in C/C++, I needed bindings in Rust. I also knew that distributing local transcription as widely as possible requires, at minimum, decent first-party bindings. I've chosen 4 languages that I think are fairly representative of where people will use the library. I welcome others to contribute bindings directly to the project as well, assuming that they are willing to take...