Building a voice dictation pipeline tuned for devs



Today, we're excited to launch Freestyle Transcribe, an all-in-one dictation service tuned to improve accuracy for developers.

When we first released Freestyle as an open source project, we wanted developers to be able to configure whatever transcription pipeline they wanted: choose your own speech model, and decide whether or not you want post-processing. With 19 voice models and 8 post-processing models available, there are 152 configurations to choose from.

That customizability is nice, but the vast majority of first-time users don't know which models suit them best. Model performance varies heavily by use case. Building transcription for developers is particularly tricky because there are tons of technical terms, and tech vocabulary is constantly changing.

That's why we built Freestyle Transcribe as a dictation service that works out of the box for the developer use case. You can still customize your own dictation experience, but Freestyle Transcribe is there as an option at your disposal.

Dictation pipeline for developers

ASR models on their own are not well suited to the developer use case. They're great for general dictation, but they lack the context of the use case. Developer vocabulary like "MySQL" might get transcribed as "My sequel".

Here's a short experiment I did with vanilla Qwen3-ASR, with no biasing and no post-processing:

| What I said | What Qwen3 produced |
| --- | --- |
| "Upload a file to S3, which fires EventBridge, which triggers Lambda" | "Upload a file to S three, which fires an event bridge, which triggers lambda." |
| "We're on Postgres 16 with pgvector 0.7" | "We're on Postgres sixteen with PG vector zero point seven." |

Qwen3-ASR produces results that are comprehensible, but they're not the perfect results we're shooting for.

We're shooting for perfection, and Freestyle Transcribe is how we want to get there. With a combination of ASR biasing and post-processing, we can get near-perfect results.
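One way to put a number on the gap in the experiment above is to check which developer terms survive transcription with their exact spelling and casing. The snippet below is only a rough sketch of that idea, not how Freestyle evaluates its pipeline: the `DEV_TERMS` watch list and the function names are made up for illustration, and the sample pairs are the two rows from the experiment.

```python
import re

# The two rows from the experiment: (what was said, what the ASR produced).
SAMPLES = [
    (
        "Upload a file to S3, which fires EventBridge, which triggers Lambda",
        "Upload a file to S three, which fires an event bridge, which triggers lambda.",
    ),
    (
        "We're on Postgres 16 with pgvector 0.7",
        "We're on Postgres sixteen with PG vector zero point seven.",
    ),
]

# Hypothetical watch list of developer terms (an assumption, not Freestyle's list).
DEV_TERMS = ["S3", "EventBridge", "Lambda", "Postgres", "pgvector", "16", "0.7"]


def contains_term(text: str, term: str) -> bool:
    """Case-sensitive match that does not fire inside a longer alphanumeric token."""
    pattern = r"(?<![A-Za-z0-9])" + re.escape(term) + r"(?![A-Za-z0-9])"
    return re.search(pattern, text) is not None


def missed_terms(reference: str, hypothesis: str, terms: list[str]) -> list[str]:
    """Terms that appear in the reference but not, exactly as written, in the hypothesis."""
    return [
        t for t in terms
        if contains_term(reference, t) and not contains_term(hypothesis, t)
    ]


def term_recall(samples, terms) -> float:
    """Fraction of expected terms that the transcription reproduced exactly."""
    expected = 0
    missed = 0
    for reference, hypothesis in samples:
        expected += sum(contains_term(reference, t) for t in terms)
        missed += len(missed_terms(reference, hypothesis, terms))
    return (expected - missed) / expected if expected else 1.0


if __name__ == "__main__":
    for reference, hypothesis in SAMPLES:
        print(missed_terms(reference, hypothesis, DEV_TERMS))
    print(f"term recall: {term_recall(SAMPLES, DEV_TERMS):.2f}")
```

On these two rows only "Postgres" comes through unchanged, which is the kind of gap that biasing and post-processing are meant to close. A strict exact-match check like this is deliberately harsh: "lambda" and "Lambda" count as different, because for dictated code and docs the casing matters.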
Here's the simple pipeline we've built so far to power Freestyle Transcribe:

[Figure: Freestyle Transcribe, dictation tuned for the developer use case.]

You hold the hotkey, speak, and release. The audio is recorded and sent to Freestyle Transcribe as multipart/form-data.

Freestyle Transcribe routes the audio to Groq's servers for transcription with Whisper Large v3 Turbo, using the language preferences and the biasing that we configured to perform better on dev vocabulary. We receive the initial transcription back.

Freestyle takes the initial result and passes it back to Groq for post-processing with Qwen3-32B. Post-processing cleans up grammar and punctuation, and applies fixes with a bias toward developer vocabulary.

That final result is sent back to Freestyle, which then pastes it at your cursor.

Why post-processing

Post-processing lets us make contextual adjustments to the transcription that the ASR model alone can't handle well.

Contextual corrections:
"Migrate over the my sequel database over to amazon RDS" → "Migrate over the MySQL database over to amazon RDS"

Grammar and punctuation cleanup, including spoken self-corrections:
"Let's book the meeting for ten PM, actually, eleven PM" → "Let's book the meeting for 11PM"

Latency

We are targeting sub-second latency. The factors that affect latency include the device's WiFi speed (audio upload time), Cloudflare Workers performance (where we host the service), Groq service performance, and model inference time.

From our tests, average latency hovers around 600ms from hotkey release to paste. Actual latency will fluctuate based on the factors above.

Privacy and data retention

Your audio and transcriptions are never saved anywhere in the dictation pipeline.

Freestyle and Freestyle Transcribe have Zero Data Retention (ZDR) by default. This means that we never store any of your transcriptions or audio at any step of the pipeline.
We also have Zero Data Retention agreements with Groq, our model provider.

If you need a fully local experience, we offer a diverse selection of speech models that run entirely on device.

Try it out on Freestyle

Freestyle Transcribe is available on Freestyle today in beta. All services are free to use while in beta, limited to 10,000 transcriptions per week.

Check out the downloads page to get started!

Matthew Wang
Maintainer · Freestyle
Maintainer of Freestyle Voice


