Engineering Always-On Market Data Infrastructure for Crypto Trading | by DolphinDB | Medium
Engineering Always-On Market Data Infrastructure for Crypto Trading
DolphinDB
16 min read · Feb 3, 2026
The cryptocurrency market never sleeps. With 24/7 trading across hundreds of exchanges, explosive data volumes, and volatility that can swing portfolios in seconds, the infrastructure challenge for quant firms isn’t just about storing data; it’s about capturing every tick, processing it in real time, and ensuring nothing gets lost along the way.

Traditional financial data pipelines weren’t built for this. Crypto generates an order of magnitude more events than equity markets, exchanges can go offline without warning, and network interruptions are routine rather than exceptional. For quantitative trading firms, a missed data point during a liquidation cascade or a delayed snapshot during high volatility can mean the difference between alpha and loss.

This article walks through a production-grade architecture for ingesting, processing, and monitoring cryptocurrency market data at scale, covering historical backfills, real-time streaming, fault tolerance, and operational monitoring.

1. The Data Challenge

Cryptocurrency market data comes in many forms: tick trades, order book snapshots, OHLC bars at multiple frequencies, funding rates, liquidation events, and more.
Each type serves a different purpose: strategy research needs historical depth, backtesting requires precise replay capabilities, and live trading demands sub-second latency with zero data loss.

The solution integrates two major exchanges, Binance and OKX, and supports a comprehensive range of data types:
High-frequency data: Level 2 order books (up to 400 levels), tick trades, aggregated trades
Time-series data: OHLC bars at 15+ frequencies, from 1-second to monthly
Market metadata: Funding rates, liquidation events, index/mark prices, contract specifications
Real-time streams: Continuous contract data, snapshot aggregations

All timestamp fields use Beijing Time (UTC+8) for consistency, and the system handles both USD-margined and coin-margined futures alongside spot markets.

For completeness, the Appendix provides a catalog of ingestion scripts covering both historical backfills and real-time subscriptions. Exchange-specific interfaces follow the official specifications published by Binance and OKX (Binance Open Platform and OKX API Guide).

2. Database and Table Schema Design

Cryptocurrency market data is inherently heterogeneous. A single trading day can generate millions of order book updates while funding rates change only three times. This variance demands purpose-built storage strategies rather than a one-size-fits-all approach.

The platform employs multiple database engines, each optimized for specific data characteristics:

TSDB engines handle ultra-high-frequency streams (depth data, tick trades, and 400-level order book snapshots), where write throughput and time-ordered retrieval are paramount.

OLAP engines store minute-level OHLC bars, index prices, mark prices, and daily aggregates.
These tables are written once but queried repeatedly for backtesting and analysis, making columnar compression and scan performance critical.

Dimension tables accommodate low-frequency reference data: funding rates, liquidation events, and contract specifications. Their infrequent updates and lookup-oriented access patterns suit simple partitioning schemes.

Partitioning Strategy

All databases employ time-based partitioning combined with symbol-based sub-partitioning where appropriate. This dual-axis design delivers three key benefits:
Parallel ingestion: Multiple instruments can write concurrently without contention
Query pruning: Historical analysis automatically skips irrelevant partitions
Horizontal scalability: Adding new trading pairs requires no schema changes

Sorting columns are chosen to align with common query patterns, typically exchange, symbol, and timestamp, which accelerates both time-range scans and symbol-specific lookups.

The table below summarizes the database architecture. Complete schema definitions and creation scripts are provided in the Appendix.
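To make the dual-axis partitioning idea concrete, here is a minimal Python sketch of how a record could be mapped to a (date, symbol bucket) partition and how a query could prune the partitions it needs to scan. This is an illustration of the concept only, not DolphinDB's API or the platform's actual scheme: the function names `partition_key` and `partitions_to_scan`, the `SYMBOL_BUCKETS` constant, and the choice of hashing symbols into buckets are all assumptions made for this example. The date axis uses UTC+8, matching the timestamp convention stated earlier.

```python
from datetime import date, datetime, timedelta, timezone
from zlib import crc32

# Fixed UTC+8 offset, matching the article's Beijing Time convention.
BEIJING = timezone(timedelta(hours=8))
# Hypothetical number of symbol buckets; the real scheme is not given in the article.
SYMBOL_BUCKETS = 8


def symbol_bucket(symbol: str) -> int:
    """Deterministically hash a symbol into one of SYMBOL_BUCKETS buckets."""
    return crc32(symbol.encode("utf-8")) % SYMBOL_BUCKETS


def partition_key(epoch_ms: int, symbol: str) -> tuple[date, int]:
    """Map one record to its (calendar date in UTC+8, symbol bucket) partition."""
    day = datetime.fromtimestamp(epoch_ms / 1000, tz=BEIJING).date()
    return day, symbol_bucket(symbol)


def partitions_to_scan(start: date, end: date, symbols: list[str]) -> set[tuple[date, int]]:
    """Return only the partitions a query over [start, end] for `symbols` must read."""
    buckets = {symbol_bucket(s) for s in symbols}
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    return {(d, b) for d in days for b in buckets}


if __name__ == "__main__":
    # 16:30 UTC on Feb 2 is 00:30 on Feb 3 in UTC+8, so the record lands in the Feb 3 partition.
    ms = int(datetime(2026, 2, 2, 16, 30, tzinfo=timezone.utc).timestamp() * 1000)
    print(partition_key(ms, "BTCUSDT")[0])  # 2026-02-03
    # A three-day, single-symbol query touches 3 partitions instead of 3 * SYMBOL_BUCKETS.
    print(len(partitions_to_scan(date(2026, 2, 1), date(2026, 2, 3), ["BTCUSDT"])))  # 3
```

In this sketch, records for different symbols on the same day usually fall into different partitions, which is what allows concurrent writers to avoid contention, and a new trading pair simply hashes into an existing bucket without any schema change.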
3. Historical Data Ingestion

The platform provides a complete set of ingestion pipelines for historical cryptocurrency market data, including multi-frequency OHLC bars, aggregated trades, tick trades, funding rates, and market metrics. Users can configure time ranges, asset lists, and bar frequencies depending on research or validation needs.

Historical data primarily supports backtesting, factor research, and daily data integrity checks. For consistency across downstream workflows, database and table naming conventions are fixed in the default scripts; modifying them requires...