How a Microsecond-Level Low-Latency Engine Works


C++ Speed Without C++ Pain: Inside a Microsecond-Level Low-Latency Engine | by DolphinDB | Medium


C++ Speed Without C++ Pain: Inside a Microsecond-Level Low-Latency Engine

DolphinDB

7 min read · Dec 22, 2025


In ultra-latency-sensitive domains like quantitative trading and high-frequency data processing, every microsecond matters. On the evening of December 18, the “DolphinDB Core Technology Uncovered” livestream series returned with a deep dive into microsecond-level low-latency system optimization. The session delivered an in-depth technical breakdown of how DolphinDB’s scripting language achieves C++-level performance: a technical masterclass for engineers and industry practitioners.

The Core Challenges of Microsecond-Level Latency

In low-latency computing environments such as high-frequency trading and real-time market data processing, response times are now measured in microseconds or even nanoseconds. Traditional approaches typically face three critical challenges:

Prohibitively High Technical Barriers: Developers need deep expertise in CPU architecture, compiler optimizations, and often assembly-level programming.

Implementation Complexity: Systems must eliminate memory allocation overhead, avoid context switches, minimize CPU cache misses, and control latency jitter, all simultaneously.

Poor Development Efficiency: Strategy research is typically prototyped in scripting languages like Python for speed, but production deployment requires a complete rewrite in C++. This not only extends development timelines but also creates inconsistencies between backtesting and live trading behavior.

Swordfish: DolphinDB’s Low-Latency Solution

To overcome these challenges, DolphinDB developed Swordfish, a low-latency streaming data processing system. Its core advantages include:

Low Technical Barrier: Developers can focus purely on strategy logic using DLang, DolphinDB’s proprietary scripting language. Without managing CPU architecture details or compiler internals, they achieve microsecond-level real-time performance comparable to C++.

Seamless Strategy Deployment: The same codebase spans from strategy research through production deployment. This eliminates the need for strategy “translation” during go-live, ensuring behavioral consistency between backtests and live trading while dramatically shortening development cycles.

High Development Productivity: Built-in components such as order book synthesis engines and reactive streaming frameworks enable developers to rapidly implement core functionality like order aggregation, without building fundamental infrastructure from scratch.

Flexible Integration and Deployment: Swordfish can be embedded as a third-party library into existing systems like trading gateways, supporting diverse deployment architectures and significantly reducing migration costs.

Core Logic and Practical Implementation of DolphinDB Low-Latency Optimization

Through concrete technical implementations, code examples, and performance benchmarks, we systematically analyze Swordfish’s architecture across two key dimensions, low-level design optimization and scripting engine optimization, revealing how a scripting language can truly deliver C++-level performance.

I. Low-Level Design Optimization: Eliminating Latency at the Source

Low-latency performance is fundamentally determined by system design. Swordfish addresses latency bottlenecks at their root through several core architectural decisions, eliminating the inherent overheads of traditional approaches.

Row-Oriented Data Structures

Traditional columnar engines are poorly suited for real-time processing of individual or small batches of records. Swordfish adopts a row-oriented data layout, packing multiple fields of a single record into 64-byte cache-line–aligned runtime tuples. For example, a tuple containing id, value, flag, and timestamp is structured to align with CPU cache lines. This design dramatically reduces cache misses and delivers memory access performance on par with native C++ structs.

Custom Memory Pools to Eliminate Dynamic Allocation

Memory allocation is a hidden performance killer in low-latency systems, often introducing unpredictable jitter. Swordfish eliminates this instability through three strategies:

No allocation or deallocation in the critical path
Pre-allocation with controlled expansion
Custom PMR-based memory pools

By completely avoiding runtime memory requests to the operating system during main execution flows, Swordfish eliminates latency fluctuations at the source.
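To make the two ideas above concrete, here is a minimal C++17 sketch that combines a cache-line-aligned row tuple with a pre-allocated PMR arena. It is an illustration only, not Swordfish's actual code: the names RowTuple and RowPool, the field types, and the fixed-capacity policy are all assumptions made for this example, and the 64-byte line size is the value quoted in the article rather than something queried from the hardware.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory_resource>
#include <vector>

// Hypothetical row tuple. The field names come from the article's example;
// the field types are assumptions made for this sketch.
struct alignas(64) RowTuple {
    std::int64_t id;
    double value;
    std::int64_t timestamp;
    bool flag;
};

// alignas(64) pads the 25 bytes of fields up to one full 64-byte line.
static_assert(sizeof(RowTuple) == 64, "one record per cache line");
static_assert(alignof(RowTuple) == 64, "a record never straddles two lines");

// Hypothetical fixed-capacity pool: all memory is obtained once, at startup.
class RowPool {
public:
    explicit RowPool(std::size_t capacity)
        // One heap allocation up front; the extra alignof bytes leave room
        // for the arena to align the first row inside the buffer.
        : buffer_(capacity * sizeof(RowTuple) + alignof(RowTuple)),
          // null_memory_resource as upstream: if the arena ever runs out,
          // allocation throws instead of silently asking the OS for memory.
          arena_(buffer_.data(), buffer_.size(),
                 std::pmr::null_memory_resource()),
          rows_(&arena_) {
        rows_.reserve(capacity);  // carve the row storage out of the arena now
    }

    // Critical-path insert: never allocates. A full pool refuses the row;
    // a real system might instead expand in a controlled, off-path step.
    bool try_push(const RowTuple& row) {
        if (rows_.size() == rows_.capacity()) return false;
        rows_.push_back(row);
        return true;
    }

    std::size_t size() const { return rows_.size(); }
    const RowTuple& operator[](std::size_t i) const { return rows_[i]; }

private:
    std::vector<std::byte> buffer_;              // backing storage
    std::pmr::monotonic_buffer_resource arena_;  // bump allocator over buffer_
    std::pmr::vector<RowTuple> rows_;            // rows live inside the arena
};
```

A caller would construct the pool once during initialization, for example `RowPool pool(1 << 16);`, and then call `try_push` from the hot path. Refusing a row when the pool is full is a deliberate simplification here; it keeps the sketch free of any allocation after startup.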

Cache-Friendly Data Structures

Order book engines rely heavily on ordered mappings and frequent traversal operations. Rather than using red-black tree–based map structures with scattered memory layouts, Swordfish employs flat_map, which stores...
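For readers unfamiliar with the term, a flat map in the general sense (as in Boost.Container or C++23) keeps its entries in one sorted contiguous array instead of separately allocated tree nodes. The sketch below illustrates that general idea for one side of an order book; it is not Swordfish's flat_map, and the FlatPriceLevels class, its integer price and quantity types, and its method names are hypothetical choices for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical book side: price (in ticks) -> resting quantity,
// stored as a sorted contiguous vector rather than a node-based tree.
class FlatPriceLevels {
public:
    using Entry = std::pair<std::int64_t, std::int64_t>;  // (price, quantity)

    // Reserve up front so typical inserts do not reallocate.
    explicit FlatPriceLevels(std::size_t expected_levels) {
        levels_.reserve(expected_levels);
    }

    // Add quantity at a price; a new level is inserted in sorted position.
    void add(std::int64_t price, std::int64_t qty) {
        auto it = std::lower_bound(levels_.begin(), levels_.end(), price, less_);
        if (it != levels_.end() && it->first == price) {
            it->second += qty;
        } else {
            levels_.insert(it, Entry{price, qty});  // shifts the tail elements
        }
    }

    // Binary search over contiguous memory; returns 0 for a missing level.
    std::int64_t quantity_at(std::int64_t price) const {
        auto it = std::lower_bound(levels_.begin(), levels_.end(), price, less_);
        return (it != levels_.end() && it->first == price) ? it->second : 0;
    }

    // Ordered traversal is a linear walk over adjacent elements.
    const std::vector<Entry>& levels() const { return levels_; }

private:
    static bool less_(const Entry& e, std::int64_t price) {
        return e.first < price;
    }
    std::vector<Entry> levels_;
};
```

The trade-off in this kind of structure is that inserting a new level shifts later elements, which costs O(n), while lookups and in-order traversal touch adjacent memory. That tends to suit workloads with a modest number of levels and frequent traversal, which is the access pattern the article attributes to order book engines.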
