Transcribe.cpp

Announcing transcribe.cpp

We are excited to announce that CJ Pais' transcribe.cpp has been officially released!

What is transcribe.cpp?

transcribe.cpp is a C/C++ speech-to-text (STT) inference library. Think of it as "llama.cpp for STT models": it relies on the ggml runtime to support a variety of STT model families via GGUF, and those models can run on Metal, Vulkan, and CUDA backends for fast GPU inference.

CJ has previously collaborated with our Mozilla siblings through their Builders program, contributing to the llamafile project in different ways. He created the LocalScore benchmarking tool, added support for new models, and integrated whisper.cpp functionality in the form of whisperfile. His work on STT has grown into his own desktop application, Handy, which was featured in WIRED at the beginning of this year.

transcribe.cpp development started from this observation: many very good STT models could have been included in Handy, but they are often developed in isolation. This leaves them with two recurring weaknesses: poor portability (e.g. MLX models only run on Macs) and sub-optimal performance (as acceleration rarely works everywhere out of the box). transcribe.cpp provides a uniform interface that brings GPU acceleration to all these models. The final result is an open source library available not just to Handy, but to everyone wishing to include STT functionality in their applications.

But that's not the only reason to celebrate this release: transcribe.cpp is also the first independent open-source project developed with the support of Mozilla.ai's Builders in Residence (BiR) program! Our goal with BiR is to advance applied, cutting-edge research in the open while connecting it with our own roadmap. In the case of transcribe.cpp, this translates into using the library to build transcribefiles: portable, multi-platform, self-contained executables that you can run (almost) anywhere to perform audio transcription.

What does this mean for you?

If you are a builder looking to add STT functionality to your application, then the library’s GitHub repo is your next stop. But you can also play with transcription without writing a line of code using Handy, or use our llamafile to bundle your favorite model and configuration into a self-contained executable for an ad-hoc transcription task. And this is just the start: we look forward to seeing people create new tools out of transcribe.cpp!
