Fixing WebRTC data-channels head-of-line blocking with RFC-8260
Pion adds support for RFC 8260 (Stream Schedulers and User Message Interleaving), which fixes sender-side head-of-line blocking.
Jo Turk -- 2026-05-17
What is head-of-line blocking, and why is it an issue for WebRTC data channels?
SCTP, which WebRTC uses to carry non-media data over data channels, looks like it should be perfect for multiplexed application data: it is message-based, and it supports multiple streams inside one association.
To understand sender-side head-of-line blocking, we first need to understand how SCTP sends messages. Messages are scheduled in the order they are sent. SCTP fragments messages according to the path MTU (Maximum Transmission Unit), which is usually around 1200 bytes for WebRTC. A 256 KB message would therefore be split into roughly 218 smaller fragments (chunks).
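To make the arithmetic concrete, here is a minimal Go sketch of the fragmentation step. It assumes, for simplicity, that all 1200 bytes of each packet are available for message data; a real sender first subtracts the SCTP common header and chunk header, so the real chunk count is a little higher. `fragment` is a hypothetical helper, not part of Pion's API.

```go
package main

import "fmt"

// fragment splits msg into pieces of at most maxPayload bytes, mimicking how
// a sender cuts one user message into several DATA chunks.
// Hypothetical helper: it treats the whole packet budget as payload.
func fragment(msg []byte, maxPayload int) [][]byte {
	var chunks [][]byte
	for start := 0; start < len(msg); start += maxPayload {
		end := start + maxPayload
		if end > len(msg) {
			end = len(msg) // the last chunk carries the remainder
		}
		chunks = append(chunks, msg[start:end])
	}
	return chunks
}

func main() {
	msg := make([]byte, 256*1024) // a 256 KiB message
	chunks := fragment(msg, 1200) // assumed 1200 bytes of payload per chunk
	fmt.Println(len(chunks), len(chunks[len(chunks)-1])) // 219 544
}
```

With a 256 KiB (262,144-byte) message this yields 219 chunks, the last one carrying 544 bytes, in line with the rough figure above.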
flowchart LR
AppBulk["App sends large message"] --> SCTP["Classic SCTP sender"]
AppControl["App sends tiny message"] --> SCTP
SCTP --> Frag["Fragment large message into DATA chunks"]
Frag --> Queue["Large fragments stay together"]
Queue --> Wire["Wire order"]
Wire --> F0["large frag 0"]
F0 --> F1["large frag 1"]
F1 --> F2["large frag 2"]
F2 --> FMore["... many more fragments ..."]
FMore --> tiny["tiny message"]
So what is the actual problem?
Imagine you're streaming media over SCTP for a special use case, maybe DRM-protected content, or codecs that WebRTC media tracks (RTP) don't normally support, such as AAC audio or VVC video. At the same time, control, telemetry, or chat streams share the same SCTP association.
If the media stream starts sending large reliable messages, SCTP fragments them into smaller chunks, and those fragments can monopolize the send queue: small chat or control messages end up waiting behind the whole large transfer.
If the stream is reliable and packets are lost, retransmissions of those large fragments can delay the smaller streams even further. Chat messages look delayed, and control traffic feels laggy. The send order ends up looking like this:
large video fragment 0
large video fragment 1
large video fragment 2
large video fragment 3
...
large video fragment 200
tiny control message finally sent
chat messages finally sent
audio-like frame finally sent
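The queueing behavior can be modelled in a few lines of Go. This is a deliberately simplified sketch, assuming a single FIFO send queue and that the application hands the large message to SCTP just before the control message; the `chunk`, `fifoOrder`, and `waitPosition` names are hypothetical. It ignores congestion control and retransmissions, which only make the wait longer.

```go
package main

import "fmt"

// chunk is one fragment waiting in the sender's queue (hypothetical type).
type chunk struct {
	stream string
	label  string
}

// fifoOrder models a classic sender with one FIFO queue: the large message
// was handed to SCTP first, so all of its fragments are queued back to back,
// and the tiny control message is queued behind them.
func fifoOrder(bulkFragments int) []chunk {
	var q []chunk
	for i := 0; i < bulkFragments; i++ {
		q = append(q, chunk{"video", fmt.Sprintf("large video fragment %d", i)})
	}
	return append(q, chunk{"control", "tiny control message"})
}

// waitPosition counts how many chunks leave before the first chunk of stream s.
// It returns -1 if the stream has nothing queued.
func waitPosition(q []chunk, s string) int {
	for i, c := range q {
		if c.stream == s {
			return i
		}
	}
	return -1
}

func main() {
	q := fifoOrder(219)                     // e.g. a 256 KiB message cut into 1200-byte chunks
	fmt.Println(waitPosition(q, "control")) // 219
	fmt.Println(q[len(q)-1].label)          // tiny control message
}
```

Every chunk of the large message has to leave before the control message gets its turn, no matter how small the control message is.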
So why can't I just fragment the messages myself and have my own queue?
Many applications do exactly that to avoid large SCTP messages: instead of sending one huge message, the application splits it into smaller chunks (just like SCTP itself) and schedules those chunks itself.
The tradeoff is that you're now effectively building another protocol on top of SCTP. You need to handle chunk numbering, reassembly, flow control, backpressure, retransmission semantics, fair scheduling, and message boundaries.
This gets trickier depending on the stream mode (pick your poison).
If you use partially reliable or unreliable streams (where SCTP may give up on retransmitting lost data), you need to decide what happens when a fragment is dropped: should the whole message be discarded, partially decoded, retransmitted by the application, or protected with some sort of FEC?
But if you'd rather avoid complex unreliable-delivery handling and use reliable streams, you're back to delayed delivery on the other streams whenever packet loss triggers retransmissions on bad networks.
RFC 8260 was written to fix exactly this class of problem. The issue was that the TSN (Transmission Sequence Number) was doing too many jobs at once: reliability, fragment reassembly, and sequencing. Fragments of one message also had to use consecutive TSNs, so the sender could not slip another stream's chunk in between them.
What I-DATA changes
SCTP message interleaving uses the I-DATA chunk. The important change is that the TSN is no longer used to order fragments inside a user message. I-DATA adds:
MID = Message Identifier
FSN = Fragment Sequence Number
The TSN still exists, though. It is still used for reliability, SACKs (Selective Acknowledgments), loss detection, and retransmission. But fragment reassembly now uses MID + FSN, not "all fragments must be adjacent in TSN space." RFC 8260 says I-DATA adds MID and FSN and removes the SSN (Stream Sequence Number): the MID identifies the message (and, on ordered streams, takes over the SSN's job of delivering messages in order), and the FSN enumerates the fragments of that message.
So the identity becomes:
TSN = reliability / SACK / retransmission
SID = SCTP stream
MID = user message inside that stream
FSN = fragment number inside that message
Example:
SID=4 MID=90 FSN=0 TSN=100 video fragment
SID=2 MID=301 FSN=0 TSN=101 chat message
SID=0 MID=12 FSN=0 TSN=102 control message
SID=4 MID=90 FSN=1 TSN=103 video fragment
SID=1 MID=44 FSN=0 TSN=104 chat message
SID=4 MID=90 FSN=2 TSN=105 video fragment
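An interleaving sender still has to decide which stream's fragment goes next; RFC 8260 calls this the stream scheduler and defines several, such as first-come first-served, round-robin, priority-based, and weighted fair queueing. The Go sketch below implements only a plain round-robin policy, assuming every fragment is already queued per stream; `frag`, `sent`, and `roundRobin` are hypothetical names. TSNs are simply handed out in wire order. It produces an interleaved order similar to, but not identical to, the example above.

```go
package main

import "fmt"

// frag is one fragment waiting to be sent (hypothetical type for this sketch).
type frag struct {
	sid uint16
	mid uint32
	fsn uint32
}

// sent pairs a fragment with the TSN it was given on the wire.
type sent struct {
	frag
	tsn uint32
}

// roundRobin visits streams in the given order and takes one fragment from
// each non-empty queue per round, handing out consecutive TSNs in wire order.
// It consumes the queues it is given.
func roundRobin(order []uint16, queues map[uint16][]frag, firstTSN uint32) []sent {
	var out []sent
	tsn := firstTSN
	for {
		progressed := false
		for _, sid := range order {
			q := queues[sid]
			if len(q) == 0 {
				continue
			}
			out = append(out, sent{q[0], tsn})
			queues[sid] = q[1:]
			tsn++
			progressed = true
		}
		if !progressed {
			return out
		}
	}
}

func main() {
	queues := map[uint16][]frag{
		4: {{4, 90, 0}, {4, 90, 1}, {4, 90, 2}}, // large video message
		2: {{2, 301, 0}},                        // chat message
		0: {{0, 12, 0}},                         // control message
	}
	for _, s := range roundRobin([]uint16{4, 2, 0}, queues, 100) {
		fmt.Printf("SID=%d MID=%d FSN=%d TSN=%d\n", s.sid, s.mid, s.fsn, s.tsn)
	}
	// SID=4 MID=90 FSN=0 TSN=100
	// SID=2 MID=301 FSN=0 TSN=101
	// SID=0 MID=12 FSN=0 TSN=102
	// SID=4 MID=90 FSN=1 TSN=103
	// SID=4 MID=90 FSN=2 TSN=104
}
```

Here the control message leaves with TSN 102 instead of waiting behind every video fragment. A real scheduler would also have to react to messages arriving while the large transfer is in flight.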
The video fragments still reassemble correctly because each one is identified by:
SID=4 + MID=90 + FSN=0,1,2,...
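On the receiving side, the same idea can be sketched as a reassembler keyed by (SID, MID) that orders fragments by FSN and never looks at TSN adjacency. This is a minimal Go sketch under stated assumptions: the `last` field stands in for the E (end) bit, FSN=2 is assumed to be the final video fragment, the fragments are assumed to have already passed SCTP's TSN-based loss recovery, and each message is delivered as soon as it is complete, so it skips the per-stream MID ordering that ordered streams also enforce. All type and method names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// key identifies one user message: stream plus message identifier.
type key struct {
	sid uint16
	mid uint32
}

// piece is a received fragment; last stands in for the E (end) bit.
type piece struct {
	sid  uint16
	mid  uint32
	fsn  uint32
	tsn  uint32 // kept only to show it plays no part in reassembly
	last bool
	data string
}

// reassembler buffers fragments per (SID, MID) (hypothetical type).
type reassembler struct {
	pending map[key][]piece
}

// push stores a fragment and returns the full message once FSNs 0..n-1 are
// all present with no gaps and the highest one carries the end bit.
func (r *reassembler) push(p piece) (string, bool) {
	k := key{p.sid, p.mid}
	r.pending[k] = append(r.pending[k], p)
	frags := r.pending[k]
	sort.Slice(frags, func(i, j int) bool { return frags[i].fsn < frags[j].fsn })
	for i, f := range frags {
		if f.fsn != uint32(i) {
			return "", false // a fragment is still missing
		}
	}
	if !frags[len(frags)-1].last {
		return "", false // the final fragment has not arrived yet
	}
	msg := ""
	for _, f := range frags {
		msg += f.data
	}
	delete(r.pending, k)
	return msg, true
}

func main() {
	r := &reassembler{pending: map[key][]piece{}}
	wire := []piece{ // wire order from the example above
		{4, 90, 0, 100, false, "vid-0 "},
		{2, 301, 0, 101, true, "hi"},
		{0, 12, 0, 102, true, "pause"},
		{4, 90, 1, 103, false, "vid-1 "},
		{1, 44, 0, 104, true, "hello"},
		{4, 90, 2, 105, true, "vid-2"},
	}
	for _, p := range wire {
		if msg, ok := r.push(p); ok {
			fmt.Printf("SID=%d MID=%d -> %q\n", p.sid, p.mid, msg)
		}
	}
	// SID=2 MID=301 -> "hi"
	// SID=0 MID=12 -> "pause"
	// SID=1 MID=44 -> "hello"
	// SID=4 MID=90 -> "vid-0 vid-1 vid-2"
}
```

The three small messages are delivered while the video message is still incomplete, and the video message comes out whole even though its fragments sit at TSNs 100, 103, and 105.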
The fragments no longer need to occupy one contiguous TSN range :)
Interleaving...