Cartesia AI releases SOTA TTS and ASR models


Cartesia \ Introducing Sonic-3.5 and Ink-2


#1, then #1 again.

Introducing Sonic-3.5 and Ink-2.

Build your entire voice stack with one model provider, the only one ranked #1 on both text-to-speech and transcription. Don't compromise on quality or speed.

Try Cartesia: 3 months on us (terms and conditions apply).

Hear it for yourself.

Ink-2: Ranked #1 on accuracy, with fast turn-taking for natural conversations.

Sonic-3.5: Ranked #1 for naturalness, with low latency and support for 40+ languages.

The full stack for interactive intelligence.


Co-designed end to end for voice agents
The only STT and TTS models optimized across the full real-time pipeline.

One API, no assembly required
Ship both models in one integration: less vendor stitching, more building.

The tightest loop in voice
Hit sub-90 ms TTS latency and 100 ms transcript latency, with native turn detection.
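To see how these figures fit into a full conversational turn, consider a cascaded voice agent (speech-to-text, then a language model, then text-to-speech). For the first chunk of a reply, each stage waits on the one before it, so the delay the caller hears is roughly the sum of the stages. The sketch below is a back-of-the-envelope budget under stated assumptions: only the 100 ms transcript and 90 ms TTS values come from the figures above (with "sub-90 ms" treated as an upper bound, and turn detection assumed to be included in the transcript figure); the LLM and network values are hypothetical placeholders, and the class and method names are illustrative, not part of any Cartesia API.

```python
from dataclasses import dataclass


@dataclass
class TurnLatencyBudget:
    """Rough time-to-first-audio for one turn of a cascaded voice agent (ms).

    Only the STT and TTS defaults come from the quoted figures; the LLM and
    network defaults are hypothetical placeholders.
    """

    stt_transcript_ms: float = 100.0   # end of user speech -> final transcript (assumed to include turn detection)
    tts_first_audio_ms: float = 90.0   # "sub-90 ms", treated here as an upper bound
    llm_first_token_ms: float = 300.0  # hypothetical LLM time to first token
    network_ms: float = 40.0           # hypothetical extra transport overhead

    def time_to_first_audio_ms(self) -> float:
        # For the first chunk, stages run back to back, so the delay is their sum.
        return (
            self.stt_transcript_ms
            + self.llm_first_token_ms
            + self.tts_first_audio_ms
            + self.network_ms
        )

    def speech_share(self) -> float:
        # Fraction of the budget spent inside the speech models (STT + TTS).
        speech = self.stt_transcript_ms + self.tts_first_audio_ms
        return speech / self.time_to_first_audio_ms()


budget = TurnLatencyBudget()
print(f"time to first audio: {budget.time_to_first_audio_ms():.0f} ms")
print(f"speech models' share: {budget.speech_share():.0%}")
```

With these placeholder values the budget comes to 530 ms, of which the two speech models account for 190 ms. Swapping in measured numbers for your own LLM and network path shows how much headroom remains for a given responsiveness target.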

Join the teams making the switch to Cartesia

Ranked #1 in the Speech Arena leaderboard and the Speech to Text leaderboard by Artificial Analysis.


At Cartesia, we believe the tradeoffs that define today’s voice AI, speed versus naturalness and accuracy versus cost, are largely architectural in origin, not inevitable. We’ve spent years building and scaling State Space Models because we believe the right primitives eliminate constraints rather than work around them. And we built Sonic-3.5 and Ink-2 not by optimizing within accepted limits, but by questioning whether those limits need to exist at all.

Build with the fastest models you can trust.


Our models are built on State Space Models (SSMs) and designed for live, synchronous interactions. SSMs, a new primitive for large-scale foundation models, deliver ultra-low latency, long-context reasoning, and greater efficiency at scale.
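The latency and efficiency claims rest on a basic property of state space models. In recurrent form, each new input updates a fixed-size hidden state (h_t = A h_{t-1} + B x_t, y_t = C h_t), so the cost per step does not grow with how much audio or text has already been processed, which suits streaming. For a linear time-invariant SSM, the same outputs can also be computed for a whole sequence at once as a causal convolution with kernel K_k = C A^k B. The following is a textbook toy model assumed purely for illustration: the page does not describe the internals of Sonic or Ink, production SSM layers (for example S4 or Mamba) add discretization, structured or input-dependent parameters, and many channels, and the convolutional equivalence shown here holds only in the time-invariant case. The function names are hypothetical.

```python
import numpy as np


def ssm_recurrent(A, B, C, xs):
    """Streaming mode: h_t = A h_{t-1} + B x_t, y_t = C h_t.

    Each input costs one fixed-size state update, regardless of how many
    inputs came before it.
    """
    h = np.zeros(A.shape[0])
    ys = np.empty(len(xs))
    for t, x_t in enumerate(xs):
        h = A @ h + B * x_t  # update the fixed-size hidden state
        ys[t] = C @ h        # emit one output per input step
    return ys


def ssm_convolutional(A, B, C, xs):
    """Parallel mode for the same model: y_t = sum_{k=0..t} (C A^k B) x_{t-k}."""
    L = len(xs)
    kernel = np.empty(L)
    AkB = B.copy()
    for k in range(L):
        kernel[k] = C @ AkB  # K_k = C A^k B
        AkB = A @ AkB
    # Causal convolution: pair K_0..K_t with x_t..x_0.
    return np.array([kernel[: t + 1] @ xs[t::-1] for t in range(L)])


rng = np.random.default_rng(0)
N, L = 4, 16                            # state size, sequence length
A = np.diag(rng.uniform(0.5, 0.95, N))  # stable diagonal transition matrix
B = rng.standard_normal(N)
C = rng.standard_normal(N)
xs = rng.standard_normal(L)

y_stream = ssm_recurrent(A, B, C, xs)
y_parallel = ssm_convolutional(A, B, C, xs)
print("max difference:", np.max(np.abs(y_stream - y_parallel)))
```

Both functions produce the same outputs up to floating-point error. The recurrent form is what makes token-by-token streaming cheap; the convolutional form is one way such models can be trained efficiently over long sequences.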

Ink. Speech-to-text
The fastest, most accurate streaming transcription model.

Sonic. Text-to-speech
The fastest, ultra-realistic voice synthesis model.
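Transcription accuracy, as in Ink's "most accurate" claim, is commonly reported as word error rate (WER). WER is the minimum number of word substitutions, deletions, and insertions needed to turn a model's transcript into the reference, divided by the number of reference words, so lower is better. The page does not say which metric or test set its ranking uses, so this is only a generic sketch of the standard definition, not any leaderboard's scoring code. Real benchmarks also normalize casing, punctuation, and numbers before comparing, which this sketch skips, and the function name is illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # Word-level Levenshtein distance, keeping one row of the DP table at a time.
    # prev[j] = edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


# Two substituted words out of seven reference words.
print(word_error_rate("book a table for two at seven",
                      "book the table for two at eleven"))
# One deleted word out of four reference words.
print(word_error_rate("turn the lights off", "turn lights off"))
```

The first pair scores 2/7 (about 0.286) and the second 0.25. Note that for a voice agent a single substituted word such as a time or a number can matter far more than its share of the WER suggests.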

Trusted by leading enterprises.
Speaking from experience.


Elise AI

We didn't switch to Sonic 3.5 because it was incrementally better, we switched because nothing else came close… we've seen a 2.9% lift in our conversion and a 12.2% increase in customer engagement.

ServiceNow

Cartesia's state-space models bring enterprise-grade speed and quality to our AI Voice Agents… making it possible for businesses to deploy secure, scalable voice agents that can understand, act, and adapt in real time.

Sierra

Cartesia Sonic 3.5 has become one of the top-performing models for us by combining low latency with natural pacing… helping us deliver strong voice quality across a growing set of languages where other models often fall short.

Callers

Sonic 3.5 has been a meaningful upgrade for Callers… latency and naturalness directly impact conversational flow and user success, and the new model noticeably improves both. We've seen more human interactions — especially in high-volume customer conversations where every millisecond and every turn matters.

Take2 AI

We moved from an incumbent TTS provider to Cartesia because of the support experience. After repeated roadblocks with our previous provider, the difference with Cartesia has been transformative — responsive, technical, and genuinely invested in our success.
