Vox: Dictation that never leaves your device<br>Press, speak, paste.<br>On-device AI dictation for Mac and Windows. No cloud round-trip, no app account, no waiting. Hold a key, speak, and the cleaned-up text lands on your clipboard, ready to paste.<br>Download for Mac · Download for Windows<br>Mac: Apple Silicon (M1 or newer) · macOS 14+<br>Windows: Windows 10/11 (x64)<br>Free for personal use<br>No account. No cloud. No tracking.
Hold ⌘⌥. to start · ON-DEVICE<br>Hotkey: ⌘⌥. on Mac · Ctrl+Alt+. on Windows
On-device.<br>Whisper or Parakeet for transcription, Apple Intelligence or Gemma 4 for cleanup. Everything runs on your device.
No account needed.<br>Nothing to sign up for to use the app. Commercial licenses do require a billing email at Stripe.
No dictation telemetry.<br>Audio, transcripts and crash reports never leave your machine. See our Privacy page for the full network picture.
Works on a plane.<br>After the one-time model download on first run, Vox runs without internet. Verifiable with any network monitor.
Estimated time saved<br>Speaking is roughly 3× faster than typing.<br>A 2016 Stanford study measured 153 WPM speaking vs 52 WPM typing. Rounded to 150 and 50 WPM, one person composing ~3,000 words a day at work spends about 60 minutes typing versus 20 minutes speaking, a gap of about 40 minutes a day. Slide your hourly rate to see what those minutes could be worth. Figures are illustrative; your savings will vary.
What your hour is worth: $75/hr<br>Slider range: $15 · $75 · $150 · $200<br>Default: $75, slightly above the BLS mean wage for US software developers. Move it to whatever your time is actually worth.
Estimated savings · per year<br>147 hr · $11,000
≈ 3.7 full work-weeks of typing you don't have to do — about $917/mo at $75/hr.
Math: 3,000 words ÷ 50 WPM = 60 min typing; 3,000 words ÷ 150 WPM = 20 min speaking. → 40 min back per day × 220 workdays/year ≈ 147 hr/year; at $75/hr that is ≈ $11,000.
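The arithmetic above can be sketched as a small calculator. This is a minimal illustration of the stated math, not Vox's own code: the function name `estimate_savings`, the `SavingsEstimate` type, and the 40-hour work-week used for the "work-weeks" figure are assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SavingsEstimate:
    minutes_per_day: float
    hours_per_year: float
    dollars_per_year: float
    dollars_per_month: float
    work_weeks_per_year: float


def estimate_savings(
    words_per_day: float = 3000,
    typing_wpm: float = 50,
    speaking_wpm: float = 150,
    workdays_per_year: int = 220,
    hourly_rate: float = 75.0,
    hours_per_work_week: float = 40.0,  # assumption: not stated on the page
) -> SavingsEstimate:
    """Estimate time and money saved by speaking instead of typing."""
    # Minutes needed to produce the same words by typing vs. speaking.
    typing_minutes = words_per_day / typing_wpm
    speaking_minutes = words_per_day / speaking_wpm
    minutes_per_day = typing_minutes - speaking_minutes

    # Scale the daily gap to a year, then price it at the hourly rate.
    hours_per_year = minutes_per_day * workdays_per_year / 60
    dollars_per_year = hours_per_year * hourly_rate

    return SavingsEstimate(
        minutes_per_day=minutes_per_day,
        hours_per_year=hours_per_year,
        dollars_per_year=dollars_per_year,
        dollars_per_month=dollars_per_year / 12,
        work_weeks_per_year=hours_per_year / hours_per_work_week,
    )


if __name__ == "__main__":
    estimate = estimate_savings()
    print(f"{estimate.hours_per_year:.0f} hr · ${estimate.dollars_per_year:,.0f} per year")
```

Changing `hourly_rate` mirrors moving the slider; the other defaults are the page's stated assumptions and can be swapped for your own numbers.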
Where these numbers come from<br>Speaking ≈ 3× faster than typing: Ruan et al., Stanford HCI (2016), 153 WPM speech vs 52 WPM typing in English; 123 vs 43 WPM in Mandarin.<br>Conversational speaking sits at ~150 WPM: National Center for Voice and Speech via VirtualSpeech.<br>50 WPM typing: the adult average is 40 WPM; office workers typically target 60 WPM. 50 is a defensible midpoint (Wonderlic; TypingSpeedHub 2025).<br>3,000 words/day: knowledge workers send ~40 emails/day at ~75 words each, plus Slack and AI chats; ~3,000 words is a defensible working estimate (cloudHQ; Boomerang via EmailAnalytics).<br>$75/hr default: the BLS mean wage for US software developers (May 2024) is $66.78/hr ($138,890/yr). $75 nudges that up slightly to reflect the startup premium most readers will recognize.<br>We assume 220 workdays per year. That is a conservative figure: 260 weekdays minus ~10 holidays leaves 250, so 220 also leaves room for vacation and sick days. The math doesn't count time spent reading or thinking, only the keystroke-vs-utterance gap on text you compose.
Free for personal use<br>Free for you. Paid for your company.<br>Vox is fair-source: free for personal use — your own writing, side projects, hobby work — under the perpetual Personal Use license in the EULA. If you (or anyone on your team) use Vox as part of your job at a company with more than one person, each user needs a commercial license. Pricing starts at $12 USD/seat/mo. See plans & pricing →
How it works<br>Three keys. No setup.<br>No account, no model selection, no permissions tour. Vox ships with sensible defaults so the first dictation works.
01 · ⌘⌥.<br>Hold the hotkey<br>Default: ⌘⌥. on Mac (Ctrl+Alt+. on Windows). Vox shows a small listening pill near your cursor.
02<br>Speak normally<br>Filler words and self-corrections are fine — Vox cleans them up. Don't worry about punctuation.
03 · ⌘V<br>Release. Paste.<br>Cleaned-up text lands on your clipboard. Press ⌘V on Mac or Ctrl+V on Windows where you want it. Vox never silently synthesizes keystrokes; you decide where the text goes.
Voice modes<br>One hotkey, the right voice.<br>Vox picks a mode based on the app you're typing into. Each mode is a tuned cleanup engine — same dictation, different output style.
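One way to picture app-based mode selection is a simple lookup with a fallback. The sketch below is purely illustrative and is not Vox's implementation: the `Mode` enum, the `APP_MODES` table, and `pick_mode` are hypothetical names, matching on a lowercased app name is an assumption, and the app names are the ones listed on the mode cards on this page.

```python
from enum import Enum


class Mode(Enum):
    GENERAL = "General"
    EMAIL = "Email"
    CHAT = "Chat"
    CODE_COMMENT = "Code Comment"
    NOTES = "Notes"


# Hypothetical lookup table: app names come from the mode cards on this page.
# How Vox actually identifies the frontmost app is not described there.
APP_MODES = {
    "mail": Mode.EMAIL,
    "gmail": Mode.EMAIL,
    "outlook": Mode.EMAIL,
    "slack": Mode.CHAT,
    "discord": Mode.CHAT,
    "imessage": Mode.CHAT,
    "xcode": Mode.CODE_COMMENT,
    "github": Mode.CODE_COMMENT,
    "vs code": Mode.CODE_COMMENT,
    "apple notes": Mode.NOTES,
    "notion": Mode.NOTES,
    "obsidian": Mode.NOTES,
}


def pick_mode(app_name: str) -> Mode:
    """Return the mode for an app, falling back to General for unknown apps."""
    return APP_MODES.get(app_name.strip().lower(), Mode.GENERAL)


if __name__ == "__main__":
    for app in ("Slack", "Xcode", "Terminal"):
        print(app, "->", pick_mode(app).value)
```

The fallback to `Mode.GENERAL` reflects the General card's "Anywhere" scope: any app without a dedicated mode gets the balanced cleanup.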
💬 General<br>Anywhere
Balanced cleanup. Drops fillers, fixes self-corrections, enumerates lists.<br>You say<br>“uh so like the meeting is um at three pm tomorrow”<br>Vox writes<br>The meeting is at 3 PM tomorrow.
📧 Email<br>Mail · Gmail · Outlook
Formal, fully punctuated email body. Never invents a salutation or sign-off.<br>You say<br>“hey just wanted to follow up on the proposal um can we sync next week”<br>Vox writes<br>Just wanted to follow up on the proposal. Can we sync next week?
💭 Chat<br>Slack · Discord · iMessage
Casual and short. Fragments OK, contractions preserved, ruthless trim.<br>You say<br>“yeah I think that works for me um lets just do it on tuesday then”<br>Vox writes<br>yeah works for me — let's do tuesday
⌥ Code Comment<br>Xcode · GitHub · VS Code
Present-tense third-person. Preserves identifiers verbatim. No markdown synthesis.<br>You say<br>“so this method invalidates the cache when the user updates their profile”<br>Vox writes<br>Invalidates the cache when the user updates their profile.
✎ Notes<br>Apple Notes · Notion · Obsidian
Full sentences. Bullets on enumeration, paragraph breaks at topic shifts.<br>You say<br>“things to do today buy groceries pick up dry cleaning email Sara...