Server-Side WebRTC Noise Reduction with Pion, FFmpeg, and RNN Models | Snowlyg
I'm Snowlyg
Engineering notes on Go backend, WebRTC, reliability, and production debugging.

Server-side audio noise reduction for WebRTC calls should start with a narrow validation target. The goal is not to claim that server-side filtering should replace WebRTC’s built-in audio processing. The goal is to test whether a Go media service can receive an Opus track with Pion, decode it to PCM, run FFmpeg’s RNN noise reduction filter, and produce an output that is worth evaluating.

The source experiment is based on a public sample project: snowlyg/webrtc_denoise_use_ffmpge. It is a prototype, not production-ready RTC infrastructure.

Background

WebRTC already includes audio processing blocks such as echo cancellation, noise suppression, and automatic gain control. In a controlled device environment, using the client-side WebRTC audio stack is usually the first and best option.

In field deployments, the device side is not always controlled:

- Microphones and speakers may vary across batches.
- Hardware acoustic design may change faster than the software release cycle.
- Vendor firmware may expose inconsistent audio behavior.
- Tuning each device family deeply can create a high maintenance cost.

That makes a server-side experiment useful. If a service can receive audio, apply a known filter, and compare the result offline, it becomes easier to decide whether server-side processing is a viable supplement for specific environments.

The first safe milestone is not real-time forwarding. It is file-based validation.

Processing Boundary

The server-side path is:

1. A browser or client sends a WebRTC audio track.
2. Pion receives the remote track in OnTrack.
3. The service reads RTP packets with track.ReadRTP().
4. The Opus payload is decoded to PCM.
5. The PCM is written to an FFmpeg process through stdin.
6. FFmpeg applies arnndn with an RNN model.
7. The filtered output is written to a file for comparison.

This boundary matters because RTP, Opus, PCM, and FFmpeg raw audio input are different formats. Mixing them up can produce a file, but not necessarily valid audio.

Minimal Experiment

The prototype starts from Pion’s save-to-disk idea: connect a browser page to a Go process, receive audio/video, and write media out for inspection.

The public dependencies are:

- Pion WebRTC
- FFmpeg
- hraban/opus
- richardpl/arnndn-models

The prototype uses Pion v4 and an Opus decoder from gopkg.in/hraban/opus.v2.

Decoding Opus to PCM

The audio track is usually Opus at 48 kHz. The example code keeps the sample rate explicit and decodes the RTP payload into an int16 PCM buffer.

import (
	"bytes"
	"encoding/binary"
	"io"

	"github.com/pion/webrtc/v4"
	"gopkg.in/hraban/opus.v2"
)

const (
	sampleRate  = 48000
	channels    = 2  // must match the negotiated track
	frameSizeMs = 60 // longest frame the PCM buffer can hold
)

// decodeTrack (a name chosen here for illustration) decodes Opus RTP payloads
// from track and writes interleaved little-endian int16 PCM to pipeWriter,
// which is expected to be FFmpeg's stdin.
func decodeTrack(track *webrtc.TrackRemote, pipeWriter io.Writer) error {
	frameSize := channels * frameSizeMs * sampleRate / 1000
	pcm := make([]int16, frameSize)

	dec, err := opus.NewDecoder(sampleRate, channels)
	if err != nil {
		return err
	}

	for {
		rtpPacket, _, err := track.ReadRTP()
		if err != nil {
			return err
		}
		if rtpPacket == nil || len(rtpPacket.Payload) == 0 {
			continue
		}

		// Decode returns the number of samples per channel.
		n, err := dec.Decode(rtpPacket.Payload, pcm)
		if err != nil {
			continue // skip packets that fail to decode
		}
		decoded := pcm[:n*channels]

		// Serialize the samples as little-endian bytes (s16le).
		buf := new(bytes.Buffer)
		if err := binary.Write(buf, binary.LittleEndian, decoded); err != nil {
			return err
		}
		if _, err := pipeWriter.Write(buf.Bytes()); err != nil {
			return err
		}
	}
}
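
The buffer sizing and the serialization step in the loop above can be checked in isolation. The following is a minimal sketch, not the prototype's code: maxFrameSamples and appendS16LE are hypothetical helper names, and the only assumption is that the decoder output is interleaved int16. It shows the frame-size arithmetic and the byte layout FFmpeg reads when the input format is s16le.

```go
package main

import "fmt"

// maxFrameSamples returns the interleaved int16 buffer length needed to hold
// one decoded frame of frameMs milliseconds (hypothetical helper).
func maxFrameSamples(sampleRate, channels, frameMs int) int {
	return sampleRate * frameMs / 1000 * channels
}

// appendS16LE appends interleaved int16 samples to dst as little-endian
// bytes, low byte first, and returns the extended slice (hypothetical helper).
func appendS16LE(dst []byte, pcm []int16) []byte {
	for _, s := range pcm {
		u := uint16(s) // keep the two's-complement bit pattern
		dst = append(dst, byte(u), byte(u>>8))
	}
	return dst
}

func main() {
	// 48 kHz, stereo, 60 ms: 2880 samples per channel, 5760 interleaved.
	fmt.Println(maxFrameSamples(48000, 2, 60)) // 5760

	// 1 is 0x0001 and -2 is 0xFFFE; each is written low byte first.
	fmt.Printf("% x\n", appendS16LE(nil, []int16{1, -2})) // 01 00 fe ff
}
```

Passing a reused slice as dst[:0] on each packet would avoid allocating a new buffer per RTP packet, which is one possible refinement of the loop rather than something the prototype does.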

Two details should not be hidden:

- The channels value must match the stream that is actually being decoded. Do not hard-code stereo if the negotiated track is mono.
- The FFmpeg input format must match the PCM buffer. int16 PCM maps to s16le; feeding int16 bytes to FFmpeg as s32le is a bug that has to be reviewed and fixed before the results can be treated as trustworthy.

Running FFmpeg arnndn

FFmpeg’s arnndn filter applies an RNN noise reduction model. For a validation file, the command shape is:

cmd := exec.Command(
	"ffmpeg",
	"-v", "warning",
	"-f", "s16le", // raw little-endian int16 input, matching the decoder output
	"-ac", "2",
	"-ar", "48000",
	"-i", "pipe:0", // read PCM from stdin
	"-af", "arnndn=m=models/cb.rnnn",
	"-c:a", "libopus",
	"-b:a", "64k",
	"output_rnn.opus",
)
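
Because the decoder and the FFmpeg command must agree on sample rate and channel count, one option is to build the arguments from the same values instead of repeating literals in two places. This is a sketch under assumptions: audioFormat and ffmpegArgs are hypothetical names, and the flags are exactly the ones in the command above.

```go
package main

import (
	"fmt"
	"strconv"
)

// audioFormat holds the values that the Opus decoder and FFmpeg must share.
// The type is hypothetical and exists only for this illustration.
type audioFormat struct {
	SampleRate int
	Channels   int
}

// ffmpegArgs builds the FFmpeg argument list for raw s16le PCM on stdin,
// filtered with arnndn and encoded to Opus at outPath.
func ffmpegArgs(f audioFormat, modelPath, outPath string) []string {
	return []string{
		"-v", "warning",
		"-f", "s16le",
		"-ac", strconv.Itoa(f.Channels),
		"-ar", strconv.Itoa(f.SampleRate),
		"-i", "pipe:0",
		"-af", "arnndn=m=" + modelPath,
		"-c:a", "libopus",
		"-b:a", "64k",
		outPath,
	}
}

func main() {
	// A mono track produces "-ac 1" without touching the rest of the command.
	f := audioFormat{SampleRate: 48000, Channels: 1}
	fmt.Println(ffmpegArgs(f, "models/cb.rnnn", "output_rnn.opus"))
}
```

The returned slice can be passed to exec.Command("ffmpeg", args...), and the same audioFormat value can be used when constructing the decoder, so a mono track cannot end up described to FFmpeg as stereo.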

The prototype repository currently shows the experimental shape, including the pipe to FFmpeg. Before using this in a production path, the raw audio format, channel count, frame duration, process lifecycle, and error handling all need to be made explicit.

Validation

For the first pass, I would validate with files:

1. Capture the unprocessed Opus output.
2. Decode or open the file in an audio tool such as Audacity.
3. Generate output_rnn.opus from the FFmpeg path.
4. Compare the perceived noise, waveform, and frequency content.
5. Check whether speech quality was damaged while noise was reduced.
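
A rough numeric check can sit alongside the comparison steps above. The sketch below computes the RMS level in dBFS of an int16 PCM buffer, so the same noise-only segment can be measured before and after filtering. It assumes both files have already been decoded to raw 16-bit PCM at the same sample rate. rmsDBFS is a hypothetical helper, not part of the prototype.

```go
package main

import (
	"fmt"
	"math"
)

// rmsDBFS returns the RMS level of int16 samples in dB relative to full
// scale (32768). Empty or all-zero input yields negative infinity.
func rmsDBFS(pcm []int16) float64 {
	if len(pcm) == 0 {
		return math.Inf(-1)
	}
	var sumSquares float64
	for _, s := range pcm {
		v := float64(s) / 32768.0
		sumSquares += v * v
	}
	// 10*log10(mean square) equals 20*log10(RMS).
	return 10 * math.Log10(sumSquares/float64(len(pcm)))
}

func main() {
	// Samples at half of full scale have a mean square of 0.25.
	halfScale := []int16{16384, -16384, 16384, -16384}
	fmt.Printf("%.2f dBFS\n", rmsDBFS(halfScale)) // -6.02 dBFS
}
```

A lower value on a noise-only segment only indicates that energy was removed there. It does not show whether speech was preserved.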

This comparison should not be reduced to “the waveform looks cleaner”. Noise suppression can also remove weak speech details, create artifacts, or change the perceived naturalness of a call. Listening tests are still required.

Production Boundaries

The file-based prototype is not the same as a real-time WebRTC media server.

For real-time use, the design has to answer:

How much buffering is...