Building a High-Throughput FIX Server - Akin Ocal
Akin Ocal
Building a High-Throughput FIX Server<br>From Single-Core Efficiency to Multi-Core Scaling<br>Akin Ocal<br>Jun 15, 2026
Introduction<br>In this article, we explore what it takes to reach a throughput of more than 600K FIX messages per second in a FIX server. This throughput was measured on a 16-core server, and the same design can scale beyond one million messages per second on hosts with more CPU cores.<br>For a quick introduction to the FIX protocol, see my earlier article:<br>https://akinocal1.substack.com/i/193450018/basics-of-fix-and-what-we-are-measuring<br>Reaching this level of throughput starts with an optimised receive path that performs well on a single CPU core. Once the single-core path is efficient, the next step is scaling across multiple cores while minimising contention.<br>The following sections first present the benchmark results, then explain the implementation techniques used to achieve them.<br>The benchmarks and techniques presented in this article are based on llfix, a low-latency C++ FIX engine, available as both an open-source edition (https://github.com/CorewareLtd/llfix) and a commercial edition (www.llfix.net).<br>Benchmarks<br>The benchmark server details are as follows:
The hwloc/lstopo output of the benchmark server is shown below:
For tuning, the CPU frequency was maximised and hyperthreading was disabled.<br>In this benchmark, the FIX server and FIX client applications run on the same host and communicate over the loopback device. This setup minimises the effect of external networking and packet loss, so the benchmark focuses mainly on the server-side FIX receive path.<br>The client application starts multiple FIX clients, all connected to the FIX server application running the FIX acceptor. Each FIX client sends 150,000 messages. At the same time, the FIX server sends execution reports back to all connected FIX clients.<br>Throughput is measured using RDTSCP timestamps recorded by the FIX server during memory-mapped-file-based message serialisation. The Python script used to calculate throughput is available here:<br>https://github.com/CorewareLtd/llfix/blob/main/benchmarks/networked_server_rx/calculate_throughput.py<br>The measured path includes:<br>Receiving and parsing the incoming FIX message
Session-level validations, including fundamental FIX session checks
FIX dictionary validations, using the following dictionaries:<br>- https://github.com/CorewareLtd/llfix/blob/main/tests/dictionaries/FIX50SP2.xml<br>- https://github.com/CorewareLtd/llfix/blob/main/tests/dictionaries/FIXT11.xml
Stale timestamp validation
Message serialisation to the file system
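To make the throughput measurement over this path concrete, here is a minimal C++ sketch of turning two TSC readings into a messages-per-second figure. It is an illustration only, not the linked Python script or llfix code: the function names are hypothetical, and it assumes an invariant TSC whose frequency in Hz is already known.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: convert a TSC tick interval into seconds.
// Assumes an invariant TSC running at a known, constant frequency (tsc_hz).
inline double ticks_to_seconds(std::uint64_t first_tsc,
                               std::uint64_t last_tsc,
                               double tsc_hz) {
    assert(last_tsc > first_tsc && tsc_hz > 0.0);
    return static_cast<double>(last_tsc - first_tsc) / tsc_hz;
}

// Hypothetical helper: messages per second over the interval between the
// timestamp of the first recorded message and that of the last one.
inline double messages_per_second(std::uint64_t message_count,
                                  std::uint64_t first_tsc,
                                  std::uint64_t last_tsc,
                                  double tsc_hz) {
    return static_cast<double>(message_count) /
           ticks_to_seconds(first_tsc, last_tsc, tsc_hz);
}
```

For example, 600,000 messages recorded across 3,000,000,000 ticks on a 3 GHz TSC span exactly one second, so `messages_per_second(600000, first, first + 3000000000ULL, 3.0e9)` yields 600,000.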
The message sent from clients to the server is as follows:<br>8=FIXT.1.1|9=188|35=D|34=2|<br>49=CLIENT1|52=20251231-17:42:03.736004873|<br>56=EXECUTOR|11=1|55=BMWG.DE|<br>54=1|38=1|44=5|40=2|59=0|<br>453=2|<br>448=PARTY1|447=D|452=1|<br>448=PARTY2|447=D|452=3|<br>60=20251231-17:42:03.736004873|<br>10=077|<br>Results:
RX Latency<br>This section focuses on receive-side latency because the cost of receiving, parsing, validating, and dispatching incoming FIX messages directly affects overall server throughput.<br>Reactor I/O pattern: llfix uses a reactor-style design based on readiness-based I/O multiplexing. On Linux, this is implemented with epoll.<br>Instead of dedicating a blocking synchronous receive path to each socket, the server can monitor multiple sockets and process the ones that are ready for reading. For a FIX acceptor handling many client sessions, this avoids unnecessary blocking and scales better than a simple synchronous model with one blocking wait per socket.<br>The diagram below illustrates the case without epoll:
As for the case with epoll:
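In code, the readiness-based loop described above can be sketched roughly as follows. This is a minimal, Linux-only illustration and not llfix's reactor: `MiniReactor`, `add` and `poll_once` are hypothetical names, the sketch uses level-triggered `EPOLLIN`, and it performs a single non-blocking read per ready socket instead of any FIX parsing.

```cpp
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical minimal reactor: one epoll instance watching many sockets.
class MiniReactor {
public:
    MiniReactor() : epfd_(epoll_create1(0)) { assert(epfd_ >= 0); }
    ~MiniReactor() { close(epfd_); }
    MiniReactor(const MiniReactor&) = delete;
    MiniReactor& operator=(const MiniReactor&) = delete;

    // Register a socket for read-readiness notifications (level-triggered).
    bool add(int fd) {
        epoll_event ev{};
        ev.events = EPOLLIN;
        ev.data.fd = fd;
        return epoll_ctl(epfd_, EPOLL_CTL_ADD, fd, &ev) == 0;
    }

    // Wait up to timeout_ms, then read once from every socket reported ready.
    // Bytes are appended to received[fd]; returns the number of sockets serviced.
    int poll_once(int timeout_ms,
                  std::unordered_map<int, std::string>& received) {
        constexpr int kMaxEvents = 64;
        epoll_event events[kMaxEvents];
        const int n = epoll_wait(epfd_, events, kMaxEvents, timeout_ms);
        int serviced = 0;
        for (int i = 0; i < n; ++i) {  // loop is skipped if n <= 0
            char buf[4096];
            const int fd = events[i].data.fd;
            const ssize_t len = recv(fd, buf, sizeof(buf), MSG_DONTWAIT);
            if (len > 0) {
                received[fd].append(buf, static_cast<std::size_t>(len));
                ++serviced;
            }
        }
        return serviced;
    }

private:
    int epfd_;
};
```

A single thread calling `poll_once` in a loop only touches sockets that actually have data, so idle sessions cost nothing beyond their registration with epoll.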
Why epoll? epoll was chosen for its maturity, broad kernel support, and predictable behaviour across production Linux environments. io_uring is a promising interface, but its availability varies across kernel versions and some environments restrict it.<br>That said, the TCP reactor in llfix is template-based, making an io_uring-backed reactor a natural future experiment.<br>No data copies: llfix uses llfix::FixStringView, which is conceptually similar to std::string_view: it stores a pointer and a length.<br>Source: https://github.com/CorewareLtd/llfix/blob/main/include/llfix/fix_string_view.h<br>During RX processing, these views point directly into the network I/O buffers. This allows llfix to reference incoming FIX field values without copying them into separate strings.<br>No memory allocations: llfix uses a single incoming message instance per session.<br>That message instance uses a memory pool to store llfix::FixStringView objects representing incoming FIX field values. This avoids dynamic memory allocation during normal RX message processing.<br>Avoiding allocations on the hot path helps both latency and determinism, since allocator calls can introduce unpredictable tail latency.<br>Message type compression: A FIX engine needs to perform per-message-type processing during RX. Examples include validating incoming message...