Universal-3.5 Pro Realtime: every turn in context


Written by AssemblyAI Team

Published on 23 June 2026


A customer spells out an email address and the agent writes "user at gmail dot com" as a sentence. A caller slides from Hindi to English mid-sentence and the transcript loses the thread. Neither is an edge case. Both happen because the model hears each moment on its own, with no sense of what came before it, and no conversation works that way.

Today we're releasing Universal-3.5 Pro Realtime, our new flagship realtime model. Two things define this release. The first is context: the model takes direction from your agent, remembers the conversation on its own, and hears the speaker instead of the room. The second is languages: 18 of them at full accuracy, with mid-sentence code-switching, plus steering to commit the model to just one when you already know it. Same $0.45/hr. Same pipeline. One line to upgrade.

Context: your agent knows the question. Now the model does too.

A voice agent has something no transcription model has ever had access to: it knows what it just asked. Universal-3.5 Pro Realtime closes that gap. Pass the question in with agent_context and the model hears the reply through the lens of the question. Prime it with "What's your email address?" and a mumbled answer resolves to user@assemblyai.com instead of "user at assembly a i dot com." Spelled-out account IDs, street addresses, one-word confirmations: the short utterances that wreck most realtime models finally have the context to come out right.
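As a rough sketch of how a client might attach the agent's last question to the upcoming user turn: the text above names only the `agent_context` parameter, so the `UpdateConfiguration` message type, the helper name, and the JSON shape below are assumptions made for illustration, not the documented API.

```python
import json


def build_agent_context_message(agent_question: str) -> str:
    """Serialize a hypothetical config message carrying the agent's last question.

    Only the `agent_context` field name comes from the article; the "type"
    value and the overall message shape are assumed for illustration.
    """
    question = agent_question.strip()
    if not question:
        raise ValueError("agent_question must be non-empty")
    message = {
        "type": "UpdateConfiguration",  # hypothetical message type
        "agent_context": question,
    }
    return json.dumps(message)


# Build the payload right after the agent finishes asking its question.
payload = build_agent_context_message("What's your email address?")
print(payload)
```

In a real integration the payload would be sent over whatever streaming connection the client already holds, before the caller's reply arrives; check the API reference for the actual message format.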

And it's measurable. Across a benchmark of 20,000 voice agent audio files, passing agent context cut word error rate by 10.2%, with the gains concentrated exactly where agents hurt most.

Word error rate reduction, by category:

Fabrications: -18.3%
Hallucinations: -17.2%
Place-name entities: -15.5%
Short-utterance errors: -13.7%
Name entities: -9.4%
Medical entities: -9.4%
Technical entities: -7.0%
Entity errors (overall): -5.1%
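For readers less familiar with the metric, here is a minimal sketch of the textbook word error rate: word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The article does not describe how its benchmark normalizes or scores text, so the function and the example strings below are illustrative assumptions only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edits needed to turn the reference words seen so far
    # into the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One unformatted email address turns a 4-word reference into 7 word errors.
reference = "my email is user@assemblyai.com"
hypothesis = "my email is user at assembly a i dot com"
print(word_error_rate(reference, hypothesis))  # 1.75
```

Because the denominator is the reference length, a single mis-rendered entity in a short utterance can push the rate above 100%, which is one way to see why short replies are so costly.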

Even when you pass nothing, the model no longer starts each turn cold. It keeps a short, rolling memory of the conversation and uses it as context for whatever comes next. On by default. Nothing to configure.

This is where accuracy compounds. One voice agent team paired agent context with prompting and cut their utterance error rate from 26% to 9% on their own production audio.
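To put the 26% to 9% figure in perspective, the arithmetic can be sketched as follows. The helper name is made up for illustration, and the article does not state whether its other percentages are relative or absolute, so this applies only to the two numbers quoted above.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fraction of the original error rate that was eliminated."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before


before, after = 0.26, 0.09
absolute_points = round((before - after) * 100)  # percentage points
relative = relative_reduction(before, after)

print(f"absolute drop: {absolute_points} points")  # absolute drop: 17 points
print(f"relative drop: {relative:.1%}")            # relative drop: 65.4%
```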

"Low-latency STT with access to more context is exactly what I've wanted to see from next-generation models. The Context Carryover feature of Universal 3.5 Pro delivers on that."

Sharpen it further with prompting

Universal-3.5 Pro is highly accurate out of the box, but for challenging audio, such as short clips with limited context, noisy recordings, or speech full of niche references, a brief description in the prompt parameter can meaningfully improve accuracy. For example, here's a 2-second clip from a League of Legends pro interview:

League of...
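A minimal sketch of assembling a brief description for a clip like this one. Only the `prompt` parameter name comes from the text above; the helper, the "Terms that may appear" phrasing, the sample terms, and the dict shape are assumptions for illustration, and whether listing terms helps is something to verify against the documentation.

```python
def build_prompt(description: str, key_terms: list[str]) -> dict[str, str]:
    """Combine a one-line audio description with niche terms into a `prompt` value."""
    parts = [description.strip().rstrip(".") + "."]
    terms = [t.strip() for t in key_terms if t.strip()]
    if terms:
        # Hypothetical phrasing; the article only asks for a brief description.
        parts.append("Terms that may appear: " + ", ".join(terms) + ".")
    return {"prompt": " ".join(parts)}


# Illustrative description and terms, not taken from the article.
params = build_prompt(
    "Interview with a professional League of Legends player",
    ["Baron", "jungler"],
)
print(params["prompt"])
```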

