Cartesia \ Introducing Sonic-3.5 and Ink-2
#1, then #1 again.
Introducing Sonic-3.5 and Ink-2.
Build your entire voice stack with one model provider, the only one ranked #1 on both speech and transcription. Don't compromise on quality or speed.
Try Cartesia - 3 months on us*
*Terms and conditions apply.
Hear it for yourself.
Ink-2: Ranked #1 on accuracy, with fast turn-taking for natural conversations.
Sonic-3.5: Ranked #1 for naturalness, with low latency and support for 40+ languages.
The full stack for interactive intelligence.
Co-designed end to end for voice agents
The only STT and TTS optimized across the full real-time pipeline.

One API, no assembly required
Ship both models in one integration: less vendor stitching, more building.

The tightest loop in voice
Hit sub-90ms TTS and 100ms transcript latency with native turn detection.
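Latency figures like the ones above are easiest to trust when you measure them in your own environment. The sketch below shows one generic way to time the first chunk of any streaming response. It does not call the Cartesia API: `time_to_first_chunk` and `fake_audio_stream` are hypothetical names introduced for illustration, and the fake stream simply sleeps to stand in for a network round trip.

```python
import time
from typing import Iterable, Iterator, Tuple


def time_to_first_chunk(stream: Iterable[bytes]) -> Tuple[float, bytes]:
    """Return (elapsed milliseconds, first chunk) for any iterable of byte chunks.

    The clock starts before the first item is requested, so for a lazy
    stream the measurement includes the time spent waiting for that item.
    """
    start = time.perf_counter()
    for chunk in stream:
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        return elapsed_ms, chunk
    raise ValueError("stream produced no chunks")


def fake_audio_stream(delay_s: float = 0.05) -> Iterator[bytes]:
    """Hypothetical stand-in for a streaming TTS response (assumption, not a real API)."""
    time.sleep(delay_s)      # simulated wait before the first audio arrives
    yield b"\x00" * 320      # first chunk of placeholder audio bytes
    yield b"\x00" * 320      # later chunks are not part of the measurement


if __name__ == "__main__":
    elapsed_ms, first = time_to_first_chunk(fake_audio_stream())
    print(f"first chunk of {len(first)} bytes after {elapsed_ms:.1f} ms")
```

To measure a real integration, pass the streaming iterator returned by your client in place of `fake_audio_stream()`, and repeat the measurement many times to look at percentiles rather than a single run.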
Join the teams making the switch to Cartesia
Ranked #1 on the Speech Arena leaderboard and the Speech to Text leaderboard by Artificial Analysis
“Cartesia Sonic-3.5 has become one of the top-performing models for us by combining low latency with natural pacing... helping us deliver strong voice quality across a growing set of languages where other models often fall short.”
At Cartesia, we believe the tradeoffs that define today’s voice AI (speed versus naturalness, accuracy versus cost) are largely architectural in origin, not inevitable.

We’ve spent years building and scaling State Space Models because we believe the right primitives eliminate constraints rather than work around them.

And we built Sonic-3.5 and Ink-2 not by optimizing within accepted limits, but by questioning whether those limits need to exist at all.
Build with the fastest models you can trust.
Our models are designed for live, synchronous interactions and built on State Space Models (SSMs). A new primitive for large-scale foundation models, SSMs deliver ultra-low latency, long-context reasoning, and greater efficiency at scale.
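To make the idea of an SSM concrete, here is a minimal sketch of a generic discrete linear state space recurrence in plain Python. It is a textbook illustration, not Cartesia's architecture: the names `ssm_step` and `run_ssm` and the toy parameters are assumptions made for this example. The point it shows is that each new input updates a fixed-size state, so the work per step does not grow with the length of the history.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def ssm_step(state: Sequence[float], u: float,
             A: Matrix, B: Sequence[float], C: Sequence[float]) -> Tuple[List[float], float]:
    """One step of a discrete linear SSM (illustrative only).

    x_next = A x + B u
    y      = C . x_next
    """
    n = len(state)
    new_state = [
        sum(A[i][j] * state[j] for j in range(n)) + B[i] * u
        for i in range(n)
    ]
    y = sum(C[i] * new_state[i] for i in range(n))
    return new_state, y


def run_ssm(inputs: Sequence[float],
            A: Matrix, B: Sequence[float], C: Sequence[float]) -> List[float]:
    """Process a stream one input at a time, carrying only the fixed-size state."""
    state = [0.0] * len(B)
    outputs = []
    for u in inputs:
        state, y = ssm_step(state, u, A, B, C)
        outputs.append(y)
    return outputs


if __name__ == "__main__":
    # Toy one-dimensional system: the state halves each step and adds the input.
    print(run_ssm([1.0, 0.0, 0.0], A=[[0.5]], B=[1.0], C=[1.0]))
```

With the toy parameters above, an impulse input decays geometrically in the output, which is the simplest picture of how a compact state can carry information from earlier inputs forward.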
Ink: Speech-to-text
The fastest, most accurate streaming transcription model.

Sonic: Text-to-speech
The fastest, ultra-realistic voice synthesis model.
Trusted by leading enterprises. Speaking from experience.
Elise AI
We didn't switch to Sonic 3.5 because it was incrementally better, we switched because nothing else came close… we've seen a 2.9% lift in our conversion and a 12.2% increase in customer engagement.
ServiceNow
Cartesia's state-space models bring enterprise-grade speed and quality to our AI Voice Agents… making it possible for businesses to deploy secure, scalable voice agents that can understand, act, and adapt in real time.
Sierra
Cartesia Sonic 3.5 has become one of the top-performing models for us by combining low latency with natural pacing… helping us deliver strong voice quality across a growing set of languages where other models often fall short.
Callers
Sonic 3.5 has been a meaningful upgrade for Callers… latency and naturalness directly impact conversational flow and user success, and the new model noticeably improves both. We've seen more human interactions — especially in high-volume customer conversations where every millisecond and every turn matters.
Take2 AI
We moved from an incumbent TTS provider to Cartesia because of the support experience. After repeated roadblocks with our previous provider, the difference with Cartesia has been transformative — responsive, technical, and genuinely invested in our success.