Show HN: We dropped Go for Rust in our real-time telephony AI media plane

In building Vivik, an execution-grade telephony AI engine, we faced a brutal constraint: the human conversational loop.

In psychoacoustics, a delay under 250 ms feels instantaneous. At 500 ms, users notice lag. Beyond 800 ms, conversations start to feel strained, and by 1.5 seconds the illusion of real-time interaction collapses.

That creates an extremely tight latency budget for voice AI:

• Network RTT: 50–200 ms
• LLM inference: 200–800 ms
• TTS synthesis: 100–400 ms
• ASR processing: 100–300 ms

The low ends of those ranges already add up to 450 ms, so to consistently meet a sub-500 ms SLA, the orchestration and media layers themselves must add almost no overhead.

We initially built the entire system in Go. It worked well for concurrency and distributed orchestration, but under production-scale load we hit an architectural wall: non-deterministic tail latency from the garbage collector.

The Media Plane processes raw PCM audio in strict 20 ms frames. Even tiny scheduling delays create audible jitter, packet drift, and conversational instability.

Under a 25,000 RPS stress test:

• Go implementation → P99 latency: 1,550 ms
• Rust (Tokio) implementation → P99 latency: 310 ms

The issue wasn’t average latency. It was the tail.

Even highly optimized GC pauses become catastrophic in real-time telephony. A tiny scheduler interruption under heavy throughput creates queue backpressure that cascades across live audio streams. In practice, a 1.5-second spike means the system goes silent mid-sentence.

We solved this by separating the architecture into two isolated worlds:

1. Control Plane (Go + NATS): handles orchestration, routing, distributed state, and API coordination. Managed GC is acceptable here because it never touches live media streams.
2. Media Plane (Rust): handles resampling, low-pass filtering, VAD, and packet-level audio processing with deterministic memory behavior.

Rust’s ownership model eliminates the need for a background garbage collector entirely.
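To put the 20 ms frame size and the measured P99 figures in the same units, here is a small Rust sketch. The 8 kHz, 16-bit mono format is an assumption (typical narrowband telephony), not something stated in this post, and the helper names are hypothetical.

```rust
/// Frame duration stated in the post, in milliseconds.
const FRAME_MS: u32 = 20;

/// Samples in one mono frame at the given sample rate (Hz).
fn samples_per_frame(sample_rate_hz: u32) -> u32 {
    sample_rate_hz * FRAME_MS / 1000
}

/// Whole frames whose playout slot passes during a stall of `stall_ms`.
fn frames_missed(stall_ms: u32) -> u32 {
    stall_ms / FRAME_MS
}

fn main() {
    // Assumption: 8 kHz sample rate, 2 bytes per sample (16-bit PCM).
    let samples = samples_per_frame(8_000);
    let bytes = samples * 2;
    println!("{} samples ({} bytes) per {} ms frame", samples, bytes, FRAME_MS);

    // The two P99 figures from the stress test, expressed in frames.
    for stall_ms in [310u32, 1_550] {
        println!("{} ms stall spans {} whole frames", stall_ms, frames_missed(stall_ms));
    }
}
```

Under these assumptions each frame is 160 samples (320 bytes), and a 1,550 ms stall covers 77 whole frames against 15 for a 310 ms stall. That is why a tail-latency spike is heard as silence, not as a slightly slower reply.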
The point at which each value is freed is fixed at compile time and not decided by a runtime collector, which lets the Media Plane maintain a flat latency profile even under sustained throughput.

We also eliminated traditional synchronization primitives. Mutexes in real-time audio systems introduce priority inversion risks that immediately surface as glitches or packet jitter. Instead, the engine relies on fully lock-free communication patterns:

• SPSC ring buffers for PCM transfer between socket and DSP threads
• Michael-Scott queues using atomic CAS operations for multi-producer coordination

Rust’s SIMD support additionally allowed us to leverage AVX-512 and ARM NEON instructions to process multiple audio samples per instruction, significantly increasing call density per CPU core.

The takeaway: managed runtimes are exceptional for distributed systems and asynchronous I/O. But once your workload crosses into hard real-time media constraints and human perceptual boundaries, averages stop mattering. Tail latency becomes the entire system.

By separating orchestration from deterministic signal processing, we reduced P99 latency from 1,550 ms to a stable 310 ms under load.

Our full engineering breakdowns, including the mathematical foundations behind our O(n) dual-gate VAD signal logic, are detailed in the Vivik whitepaper: https://vivik.bajpailabs.com/whitepaper

Would love to hear how others are approaching real-time media constraints alongside LLM execution boundaries.
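For readers who have not built one, here is a minimal sketch of the SPSC ring buffer pattern mentioned above, using only the Rust standard library. It is an illustration under stated assumptions, not Vivik's implementation: the names `spsc`, `Producer`, and `Consumer` are hypothetical, samples are assumed to be 16-bit PCM, and each slot is an `AtomicI16` so the example needs no `unsafe`.

```rust
use std::sync::atomic::{AtomicI16, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Shared state: sample slots plus monotonically increasing counters.
struct Ring {
    slots: Box<[AtomicI16]>,
    head: AtomicUsize, // next position to read; written only by the consumer
    tail: AtomicUsize, // next position to write; written only by the producer
}

/// Write half. Not cloneable, so there is exactly one producer.
pub struct Producer(Arc<Ring>);
/// Read half. Not cloneable, so there is exactly one consumer.
pub struct Consumer(Arc<Ring>);

/// Creates a ring holding up to `capacity` samples (must be a power of two).
pub fn spsc(capacity: usize) -> (Producer, Consumer) {
    assert!(capacity.is_power_of_two());
    let slots: Box<[AtomicI16]> = (0..capacity).map(|_| AtomicI16::new(0)).collect();
    let ring = Arc::new(Ring {
        slots,
        head: AtomicUsize::new(0),
        tail: AtomicUsize::new(0),
    });
    (Producer(Arc::clone(&ring)), Consumer(ring))
}

impl Producer {
    /// Returns false, without blocking, when the ring is full.
    pub fn push(&mut self, sample: i16) -> bool {
        let ring = &*self.0;
        let tail = ring.tail.load(Ordering::Relaxed);
        let head = ring.head.load(Ordering::Acquire);
        if tail.wrapping_sub(head) == ring.slots.len() {
            return false;
        }
        ring.slots[tail & (ring.slots.len() - 1)].store(sample, Ordering::Relaxed);
        // Release publishes the slot write before the consumer can see the new tail.
        ring.tail.store(tail.wrapping_add(1), Ordering::Release);
        true
    }
}

impl Consumer {
    /// Returns None, without blocking, when the ring is empty.
    pub fn pop(&mut self) -> Option<i16> {
        let ring = &*self.0;
        let head = ring.head.load(Ordering::Relaxed);
        let tail = ring.tail.load(Ordering::Acquire);
        if head == tail {
            return None;
        }
        let sample = ring.slots[head & (ring.slots.len() - 1)].load(Ordering::Relaxed);
        // Release hands the slot back to the producer only after the read is done.
        ring.head.store(head.wrapping_add(1), Ordering::Release);
        Some(sample)
    }
}

fn main() {
    let (mut tx, mut rx) = spsc(256);
    // Stand-in for a socket thread feeding samples to a DSP thread.
    let producer = thread::spawn(move || {
        for s in 0..1_000i16 {
            while !tx.push(s) {
                std::hint::spin_loop();
            }
        }
    });
    let mut received = Vec::with_capacity(1_000);
    while received.len() < 1_000 {
        match rx.pop() {
            Some(s) => received.push(s),
            None => std::hint::spin_loop(),
        }
    }
    producer.join().unwrap();
    assert!(received.iter().copied().eq(0..1_000i16));
    println!("received {} samples in order", received.len());
}
```

Neither side ever takes a lock or waits on the other: `push` and `pop` return immediately when the ring is full or empty, and the caller decides what to do next. The spin loops in `main` exist only to keep the demo short. A real audio thread would more likely drop the frame or emit comfort noise than spin.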
