Universal-3.5 Pro Realtime: every turn in context | AssemblyAI
Written by AssemblyAI Team
Published on 23 June 2026
A customer spells out an email address and the agent writes "user at gmail dot com" as a sentence. A caller slides from Hindi to English mid-sentence and the transcript loses the thread. Neither is an edge case. Both happen because the model hears each moment on its own, with no sense of what came before it, and no conversation works that way.
Today we're releasing Universal-3.5 Pro Realtime, our new flagship realtime model. Two things define this release. The first is context: the model takes direction from your agent, remembers the conversation on its own, and hears the speaker instead of the room. The second is languages: 18 of them at full accuracy, with mid-sentence code-switching, plus steering to commit the model to just one when you already know it. Same $0.45/hr. Same pipeline. One line to upgrade.
Context: your agent knows the question. Now the model does too.
A voice agent has something no transcription model has ever had access to: it knows what it just asked. Universal-3.5 Pro Realtime closes that gap. Pass the question in with agent_context and the model hears the reply through the lens of the question. Prime it with "What's your email address?" and a mumbled answer resolves to user@assemblyai.com instead of "user at assembly a i dot com." Spelled-out account IDs, street addresses, one-word confirmations: the short utterances that wreck most realtime models finally have the context to come out right.
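As a rough client-side sketch of that idea, the snippet below builds a per-turn message that carries the agent's last question. Only the agent_context parameter name comes from this post. The message shape, the "example_config_update" type value, and the helper name build_turn_update are hypothetical stand-ins, so check the API reference for the real schema before relying on any of it.

```python
import json


def build_turn_update(agent_question: str) -> str:
    """Serialize a hypothetical per-turn update carrying the agent's last question."""
    question = agent_question.strip()
    if not question:
        raise ValueError("agent_question must not be empty")
    message = {
        # Hypothetical message type; the actual schema is not given in this post.
        "type": "example_config_update",
        # Parameter name taken from the post; its exact placement is assumed.
        "agent_context": question,
    }
    return json.dumps(message)


if __name__ == "__main__":
    # Send this before the caller answers, so the reply is heard in context.
    print(build_turn_update("What's your email address?"))
```

The design point is timing: the question has to reach the model before the caller's reply does, so an agent would typically send the update right after it finishes speaking.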
And it's measurable. Across a benchmark of 20,000 voice agent audio files, passing agent context cut word error rate by 10.2%, with the gains concentrated exactly where agents hurt most.
Word error rate reduction
Fabrications -18.3%
Hallucinations -17.2%
Place-name entities -15.5%
Short-utterance errors -13.7%
Name entities -9.4%
Medical entities -9.4%
Technical entities -7.0%
Entity errors (overall) -5.1%
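For readers who want the metric spelled out: word error rate is conventionally the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The sketch below is a generic implementation of that standard definition. It is not the benchmark's scoring code, and the text normalization used for the numbers above is not described in this post.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words to match an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion
                    curr[j - 1] + 1,     # insertion
                    prev[j - 1] + cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("user@assemblyai.com", "user at assembly a i dot com"))
```

With whitespace tokenization, the spelled-out email scores 7.0: one reference word is replaced by seven, which is why short utterances weigh so heavily on this metric.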
Even when you pass nothing, the model no longer starts each turn cold. It keeps a short, rolling memory of the conversation and uses it as context for whatever comes next. On by default. Nothing to configure.
This is where accuracy compounds. One voice agent team paired agent context with prompting and cut their utterance error rate from 26% to 9% on their own production audio.
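To put the 26% to 9% figure in perspective, here is the arithmetic as a short sketch. The helper name relative_reduction is ours, not part of any API.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fraction of the original error rate that was eliminated."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before


if __name__ == "__main__":
    before, after = 0.26, 0.09  # utterance error rates quoted in the post
    absolute_points = (before - after) * 100
    relative = relative_reduction(before, after)
    print(f"absolute drop: {absolute_points:.0f} percentage points")
    print(f"relative drop: {relative:.1%}")
```

That is a drop of 17 percentage points, or roughly 65% of the original errors removed.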
"Low-latency STT with access to more context is exactly what I've wanted to see from next-generation models. The Context Carryover feature of Universal 3.5 Pro delivers on that."
Sharpen it further with prompting
Universal-3.5 Pro is highly accurate out of the box, but for challenging audio like short clips with limited context, noisy environments, or audio with very niche references, providing a brief description in the prompt parameter can meaningfully improve accuracy. For example, here's a 2-second clip from a League of Legends pro interview:
League of...