Server-Side WebRTC Noise Reduction with Pion, FFmpeg, and RNN Models


Snowlyg: Engineering notes on Go backend, WebRTC, reliability, and production debugging.

Server-side audio noise reduction for WebRTC calls should start with a narrow validation target. The goal is not to claim that server-side filtering should replace WebRTC's built-in audio processing. The goal is to test whether a Go media service can receive an Opus track with Pion, decode it to PCM, run FFmpeg's RNN noise reduction filter, and produce an output that is worth evaluating.

The source experiment is based on a public sample project: snowlyg/webrtc_denoise_use_ffmpge. It is a prototype, not production-ready RTC infrastructure.

Background

WebRTC already includes audio processing blocks such as echo cancellation, noise suppression, and automatic gain control. In a controlled device environment, using the client-side WebRTC audio stack is usually the first and best option.

In field deployments, the device side is not always controlled:

- Microphones and speakers may vary across batches.
- Hardware acoustic design may change faster than the software release cycle.
- Vendor firmware may expose inconsistent audio behavior.
- Tuning each device family deeply can create a high maintenance cost.

That makes a server-side experiment useful. If a service can receive audio, apply a known filter, and compare the result offline, it becomes easier to decide whether server-side processing is a viable supplement for specific environments.

The first safe milestone is not real-time forwarding. It is file-based validation.

Processing Boundary

The server-side path is:

1. Browser or client sends a WebRTC audio track.
2. Pion receives the remote track in OnTrack.
3. The service reads RTP packets with track.ReadRTP().
4. Opus payload is decoded to PCM.
5. PCM is written to an FFmpeg process through stdin.
6. FFmpeg applies arnndn with an RNN model.
7. The filtered output is written to a file for comparison.

This boundary matters because RTP, Opus, PCM, and FFmpeg raw audio input are different formats. Mixing them up can produce a file, but not necessarily valid audio.

Minimal Experiment

The prototype starts from Pion's save-to-disk idea: connect a browser page to a Go process, receive audio/video, and write media out for inspection.

The public dependencies are:

- Pion WebRTC
- FFmpeg
- hraban/opus
- richardpl/arnndn-models

The prototype uses Pion v4 and an Opus decoder from gopkg.in/hraban/opus.v2.

Decoding Opus to PCM

The audio track is usually Opus at 48 kHz. The example code keeps the sample rate explicit and decodes the RTP payload into an int16 PCM buffer.

    import (
        "bytes"
        "encoding/binary"
        "io"

        "github.com/pion/webrtc/v4"
        "gopkg.in/hraban/opus.v2"
    )

    const (
        sampleRate  = 48000 // Opus RTP clock rate
        channels    = 2     // must match the negotiated track and FFmpeg's -ac
        frameSizeMs = 60    // the PCM buffer is sized for frames up to 60 ms
    )

    // decodeTrack (a wrapper name added here so the fragment compiles) reads
    // RTP packets from track, decodes Opus to int16 PCM, and writes the
    // samples as little-endian bytes to pipeWriter, which feeds FFmpeg's stdin.
    func decodeTrack(track *webrtc.TrackRemote, pipeWriter io.Writer) error {
        // One frame of interleaved samples across all channels.
        pcm := make([]int16, channels*frameSizeMs*sampleRate/1000)

        dec, err := opus.NewDecoder(sampleRate, channels)
        if err != nil {
            return err
        }

        for {
            rtpPacket, _, err := track.ReadRTP()
            if err != nil {
                return err
            }
            if rtpPacket == nil || len(rtpPacket.Payload) == 0 {
                continue
            }

            // n is the number of decoded samples per channel.
            n, err := dec.Decode(rtpPacket.Payload, pcm)
            if err != nil {
                continue // the prototype skips packets that fail to decode
            }
            decoded := pcm[:n*channels]

            buf := new(bytes.Buffer)
            if err := binary.Write(buf, binary.LittleEndian, decoded); err != nil {
                return err
            }
            if _, err := pipeWriter.Write(buf.Bytes()); err != nil {
                return err
            }
        }
    }
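The buffer sizing and the int16-to-bytes step in the decode loop can be checked in isolation, without Pion or an Opus decoder. The following is a minimal sketch of that arithmetic; the helper names maxFrameSamples and pcmToS16LE are hypothetical and do not come from the prototype repository.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// maxFrameSamples returns how many interleaved int16 samples one frame of
// frameMs milliseconds holds at the given sample rate and channel count.
// (Hypothetical helper, not part of the prototype.)
func maxFrameSamples(sampleRate, channels, frameMs int) int {
	return channels * frameMs * sampleRate / 1000
}

// pcmToS16LE converts int16 samples to the byte layout FFmpeg expects for
// "-f s16le": two bytes per sample, least significant byte first.
// (Hypothetical helper, not part of the prototype.)
func pcmToS16LE(samples []int16) []byte {
	out := make([]byte, 2*len(samples))
	for i, s := range samples {
		binary.LittleEndian.PutUint16(out[2*i:], uint16(s))
	}
	return out
}

func main() {
	// 48 kHz, stereo, 60 ms: 2 * 60 * 48000 / 1000 samples.
	fmt.Println(maxFrameSamples(48000, 2, 60)) // 5760

	// Each sample becomes exactly two bytes, low byte first.
	fmt.Printf("% x\n", pcmToS16LE([]int16{1, -2})) // 01 00 fe ff
}
```

Two bytes per sample is the reason the FFmpeg input format has to be s16le: a reader expecting four bytes per sample would consume two samples as one.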

Two details should not be hidden:

- The channels value must match the decoded stream assumptions. Do not hard-code stereo if the negotiated track is mono.
- The FFmpeg input format must match the PCM buffer. int16 PCM maps to s16le; using s32le with int16 bytes is a bug to review before treating results as trustworthy.

Running FFmpeg arnndn

FFmpeg's arnndn filter applies an RNN noise reduction model. For a validation file, the command shape is:

    cmd := exec.Command(
        "ffmpeg",
        "-v", "warning",
        "-f", "s16le", // raw int16 little-endian PCM, matching the decoder output
        "-ac", "2", // must equal the decoder's channel count
        "-ar", "48000", // must equal the decoder's sample rate
        "-i", "pipe:0", // read PCM from stdin
        "-af", "arnndn=m=models/cb.rnnn",
        "-c:a", "libopus",
        "-b:a", "64k",
        "output_rnn.opus",
    )
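One way to keep the FFmpeg input flags from drifting away from the decoder settings is to build the argument list from the same values. This is a sketch only: the rawInput type and the arnndnArgs function are hypothetical names introduced for illustration, and the command is constructed but never started, so FFmpeg does not need to be installed to run it.

```go
package main

import (
	"fmt"
	"os/exec"
	"strconv"
	"strings"
)

// rawInput describes the PCM stream written to FFmpeg's stdin.
// (Hypothetical type, not part of the prototype.)
type rawInput struct {
	SampleRate int
	Channels   int
}

// arnndnArgs builds the FFmpeg arguments for the validation command, taking
// the channel count and sample rate from the same values used by the decoder.
// (Hypothetical helper, not part of the prototype.)
func arnndnArgs(in rawInput, modelPath, outPath string) []string {
	return []string{
		"-v", "warning",
		"-f", "s16le", // int16 PCM, little-endian
		"-ac", strconv.Itoa(in.Channels),
		"-ar", strconv.Itoa(in.SampleRate),
		"-i", "pipe:0",
		"-af", "arnndn=m=" + modelPath,
		"-c:a", "libopus",
		"-b:a", "64k",
		outPath,
	}
}

func main() {
	// A mono track is assumed here purely to show the flags changing.
	in := rawInput{SampleRate: 48000, Channels: 1}
	args := arnndnArgs(in, "models/cb.rnnn", "output_rnn.opus")

	// The command is only constructed, not run.
	cmd := exec.Command("ffmpeg", args...)
	fmt.Println(strings.Join(cmd.Args, " "))
}
```

In a real run, the read end of the pipe that receives the decoded PCM would be assigned to cmd.Stdin before calling cmd.Start().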

The prototype repository currently shows the experimental shape, including the pipe to FFmpeg. Before using this in a production path, the raw audio format, channel count, frame duration, process lifecycle, and error handling all need to be made explicit.

Validation

For the first pass, I would validate with files:

1. Capture the unprocessed Opus output.
2. Decode or open the file in an audio tool such as Audacity.
3. Generate output_rnn.opus from the FFmpeg path.
4. Compare the perceived noise, waveform, and frequency content.
5. Check whether speech quality was damaged while noise was reduced.
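A coarse numeric check can sit alongside the listening comparison, for example the RMS level of a speech-free segment before and after filtering. The sketch below assumes both files have already been decoded to int16 PCM by some other means; the function name rmsDBFS and the sample values are made up for illustration and are not measurements from the prototype.

```go
package main

import (
	"fmt"
	"math"
)

// rmsDBFS returns the RMS level of int16 PCM samples in dB relative to full
// scale. Empty or all-zero input yields negative infinity.
// (Hypothetical helper, not part of the prototype.)
func rmsDBFS(samples []int16) float64 {
	if len(samples) == 0 {
		return math.Inf(-1)
	}
	var sumSq float64
	for _, s := range samples {
		v := float64(s) / 32768.0 // normalize to roughly [-1, 1)
		sumSq += v * v
	}
	rms := math.Sqrt(sumSq / float64(len(samples)))
	if rms == 0 {
		return math.Inf(-1)
	}
	return 20 * math.Log10(rms)
}

func main() {
	// Made-up samples standing in for the same silent passage taken from
	// the unprocessed and the filtered recording.
	before := []int16{800, -760, 820, -790}
	after := []int16{90, -110, 100, -95}

	fmt.Printf("noise floor before: %.1f dBFS\n", rmsDBFS(before))
	fmt.Printf("noise floor after:  %.1f dBFS\n", rmsDBFS(after))
}
```

A lower number on a silent passage only says the residual noise is quieter. It says nothing about whether speech was damaged, so it cannot stand in for step 5.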

This comparison should not be reduced to "the waveform looks cleaner". Noise suppression can also remove weak speech details, create artifacts, or change the perceived naturalness of a call. Listening tests are still required.

Production Boundaries

The file-based prototype is not the same as a real-time WebRTC media server.

For real-time use, the design has to answer:

How much buffering is...
