Why Live Video Compositing Still Requires a DevOps Team (And How to Fix That) | Fishjam blog
by Tomasz Mazur • May 20, 2026 • 7 min read
Scaling live video compositing infrastructure is harder than it looks – it breaks most of the assumptions that make ordinary backends easy to scale.
For the past few months I’ve been working on this at Software Mansion, and we ran into a stack of problems that I think most teams hit when they try to build it themselves: orchestration, scaling, recovery, the usual suspects. I want to walk through what we learned, but first, a bit of context.
What even is “live video compositing”?
Live video compositing is the process of combining multiple video streams into one in real time. Take N video streams: your webcam, your favorite streamer’s livestream and a very necessary AI avatar. We feed them to a magical black box and receive a single video stream as the output: you and your AI avatar friend watching your favorite streamer side-by-side.
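Stripped of the magic, the black box does two things for every output frame: decide where each input goes on the canvas, then draw each input's latest frame into its slot. Here is a minimal pure-Python sketch of both steps, assuming a simple equal-size grid layout and tiny single-channel frames stored as lists of rows. The function names are hypothetical and not Smelter's API; a real engine works on full-color frames and does this far more efficiently.

```python
import math

# A frame is a list of rows of single-channel pixel values (tiny, for illustration only).
Frame = list[list[int]]


def grid_layout(n: int, width: int, height: int) -> list[tuple[int, int, int, int]]:
    """Split a width x height canvas into a near-square grid and return n (x, y, w, h) tiles."""
    if n < 1:
        raise ValueError("need at least one input")
    cols = math.ceil(math.sqrt(n))
    rows = math.ceil(n / cols)
    tile_w, tile_h = width // cols, height // rows
    return [((i % cols) * tile_w, (i // cols) * tile_h, tile_w, tile_h) for i in range(n)]


def resize_nearest(frame: Frame, w: int, h: int) -> Frame:
    """Nearest-neighbour resize using integer (floor) coordinate mapping."""
    src_h, src_w = len(frame), len(frame[0])
    return [[frame[y * src_h // h][x * src_w // w] for x in range(w)] for y in range(h)]


def composite(frames: list[Frame], width: int, height: int) -> Frame:
    """Draw one frame from every input into its grid tile on a black (zeroed) output frame."""
    canvas = [[0] * width for _ in range(height)]
    for frame, (x0, y0, w, h) in zip(frames, grid_layout(len(frames), width, height)):
        for dy, row in enumerate(resize_nearest(frame, w, h)):
            canvas[y0 + dy][x0:x0 + w] = row
    return canvas
```

With the three inputs from the example above, grid_layout(3, 1920, 1080) returns three 960x540 tiles and leaves the bottom-right quarter of the canvas empty. Calling composite once per output frame, many times a second, is what makes it live.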
In the real world, that “magical black box” is called server-side compositing. It’s the same heavy-lifting tech Google deployed for YouTube TV’s Multiview feature. Instead of forcing your phone or laptop to decode three intense video feeds at once, a cloud server (running something like FFmpeg) “sews” them together in real time, delivering a single, lightweight stream to your screen.
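For the FFmpeg route, a side-by-side composite boils down to a filter graph. The sketch below only builds the command line. It assumes every input has a video stream and ignores audio; the helper name, the 720p target height and the output URL are illustrative choices, not anything this post prescribes. The filters (scale, format, hstack) and the libx264 options (-preset, -tune zerolatency) are standard FFmpeg.

```python
# Hypothetical helper: builds an FFmpeg command that places all inputs side by side.
def side_by_side_args(inputs: list[str], output: str, height: int = 720) -> list[str]:
    if len(inputs) < 2:
        raise ValueError("hstack needs at least two inputs")
    args = ["ffmpeg"]
    for url in inputs:
        args += ["-i", url]
    # hstack requires equal heights, so scale each input (keeping aspect ratio, even width)
    # and convert everything to one pixel format before stacking.
    scaled = [f"[{i}:v]scale=-2:{height},format=yuv420p[v{i}]" for i in range(len(inputs))]
    labels = "".join(f"[v{i}]" for i in range(len(inputs)))
    graph = ";".join(scaled + [f"{labels}hstack=inputs={len(inputs)}[out]"])
    return args + [
        "-filter_complex", graph,
        "-map", "[out]",  # video only; audio mixing is left out of this sketch
        # Low-latency H.264 settings, then push the result to a live endpoint (e.g. RTMP).
        "-c:v", "libx264", "-preset", "veryfast", "-tune", "zerolatency",
        "-f", "flv", output,
    ]
```

Passing the list to subprocess.run(args, check=True) would start the composition. Keeping one such process alive per composition, and restarting it when an input drops, is where the orchestration and recovery problems mentioned earlier begin.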
That’s video compositing. For it to be live, it has to, well, not be dead. It has to be fast enough: let’s say less than 1 second end to end (though for some scenarios it might make sense to say less than 10 seconds).
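One way to make "fast enough" concrete is a latency budget: add up every stage between the camera and the viewer's screen and check the total against the target. The sketch below uses the 1-second figure from the text; the stage names and millisecond values are made-up placeholders, not measurements of any real pipeline. Server-side compositing adds its own decode, composite and re-encode hop, which has to fit inside the same budget.

```python
LIVE_BUDGET_MS = 1_000  # "less than 1 second", as in the text

# Placeholder numbers for illustration only; measure your own pipeline.
EXAMPLE_STAGES_MS = {
    "capture_and_encode": 100,
    "network_to_compositor": 80,
    "decode_and_composite": 50,
    "re_encode": 100,
    "network_to_viewer": 150,
    "player_buffer": 300,
}


def latency_report(stages_ms: dict[str, int], budget_ms: int = LIVE_BUDGET_MS) -> tuple[int, int]:
    """Return (end-to-end latency, remaining headroom); negative headroom means over budget."""
    total = sum(stages_ms.values())
    return total, budget_ms - total


total, headroom = latency_report(EXAMPLE_STAGES_MS)
print(f"{total} ms end to end, {headroom} ms of headroom")
# prints: 780 ms end to end, 220 ms of headroom
```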
Server-side vs. client-side video compositing
Most video compositing happens locally, on the client side.
If you’ve ever used OBS, congrats – you’ve probably done client-side video compositing! It all happens on your machine, which is why you might’ve heard your fans spin up.
Client-side compositing is a sane default, but not every use case can – or should – run on consumer hardware. More demanding workloads, like high-quality broadcasts, need dedicated hardware. That’s where server-side compositing comes in. Other use cases benefit from it too: compositing a livestream from a video conferencing room on the server gives every viewer a unified feed and cuts bandwidth costs.
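The bandwidth argument is simple arithmetic. With client-side compositing, every viewer of a conferencing room downloads every participant's stream; with server-side compositing, they download one composited stream. A back-of-the-envelope sketch follows. The viewer count and bitrates are illustrative assumptions, and it ignores both the compositor's own ingest (one copy of each participant stream) and simulcast layer selection, which an SFU could use to trim the client-side number.

```python
def viewer_egress_mbps(viewers: int, participants: int,
                       per_stream_mbps: float, composited_mbps: float) -> tuple[float, float]:
    """Total egress to viewers for client-side vs. server-side compositing, in Mbps."""
    client_side = viewers * participants * per_stream_mbps  # every viewer pulls every stream
    server_side = viewers * composited_mbps                  # every viewer pulls one stream
    return client_side, server_side


# Illustrative numbers: 1000 viewers, 6 participants at 1.5 Mbps, one 4 Mbps composite.
client, server = viewer_egress_mbps(1000, 6, 1.5, 4.0)
print(f"client-side: {client:.0f} Mbps, server-side: {server:.0f} Mbps")
# prints: client-side: 9000 Mbps, server-side: 4000 Mbps
```

Under these assumptions the gap grows with the number of participants, since the composited stream's bitrate depends mostly on its output resolution rather than on how many people are in the room.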
This post is about server-side compositing – because the only way to “scale” on the client is to buy a better graphics card.
Who even needs live video compositing?
I do! But in all seriousness, the main use cases are:
Live broadcasting – sports broadcasts, co-streaming, anything that needs dynamic overlays or combines multiple concurrent streams.
AI and generated content – newer territory, but those chatbots with live avatars need some form of video compositing under the hood.
Interactive livestreaming – game shows, auctions, watch parties, anything where viewers’ camera feeds get pulled into the broadcast in real time.
One thing you might notice about these use cases is that they tend to operate at a rather large scale. Sports are everywhere, and so is collaborative livestreaming. I know you wouldn’t believe me, but AI is also pretty popular right now.
Scaling live video compositing is hard
Now that we know what live video compositing is, how do we go about actually integrating it into our application?
Luckily for me, my colleagues at Software Mansion are building Smelter, a live compositing server with a declarative API. We could also reach for FFmpeg, GStreamer, or even OBS as the underlying compositing engine – but each comes with its own headaches.
We get a proof of concept with Smelter rolling in a few days, and it works like a charm! Now we just need to do the same thing,...