We Use One Data Pipeline for Research and Live Trading


From Factor Discovery to Live Signals: Unified Stream-Batch Processing for Crypto Trading

DolphinDB

11 min read · Feb 5, 2026


In quantitative trading, factor discovery is the foundation of alpha generation. Whether for high-frequency crypto strategies or medium-term systematic portfolios, the ability to efficiently compute, iterate, and deploy factors directly determines research velocity and production readiness.

Traditional factor research workflows usually fall into two categories:

- Manual factor discovery, relying on researchers’ market intuition and financial expertise. While often insightful, this approach is time-consuming, difficult to scale, and limited by human cognition.
- Algorithm-driven discovery, which applies machine learning and deep learning models to automatically extract predictive patterns from market data. Neural networks, in particular, excel at capturing nonlinear relationships among price dynamics, order flow, sentiment, and macro-level variables, making them especially suitable for the highly volatile and information-dense cryptocurrency market.

However, machine-learning-based factor mining imposes extremely demanding infrastructure requirements:

- Fast access to massive historical tick or minute-level data
- Efficient batch computation for training datasets
- Low-latency streaming computation for backtesting and simulation
- Seamless integration between databases and Python-based modeling frameworks

This is precisely where DolphinDB’s unified stream-batch architecture becomes valuable.

In this article, we present a complete crypto factor research and deployment pipeline built on DolphinDB, covering:

- Minute-level factor storage design
- Batch and streaming factor computation
- Machine-learning-driven backtesting and real-time signal generation

The solution enables researchers to iterate factors offline at scale while deploying the same logic in real-time simulations, without rewriting infrastructure.

1. Designing a Minute-Level Factor Database

1.1 Storage Engine and Partitioning Strategy

Minute-level factor data exhibits three key characteristics:

- Very large volume
- Continuous appends
- Frequent time-range queries by symbol, market, and factor

To support these workloads, we recommend using DolphinDB’s TSDB storage engine together with a narrow-table schema.

The database design is summarized below (the full script is provided in the Appendix):

[Image: database design summary]
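To give a rough sense of the "very large volume" characteristic, the following back-of-the-envelope sketch counts rows in a narrow table where each (minute, symbol, factor) combination is one row. It is written in Python rather than DolphinDB script, the function names are hypothetical, and the symbol and factor counts are made-up inputs, not figures from this article. It also assumes every symbol produces a bar in every minute of a 24/7 market.

```python
# Rough row-count estimate for a narrow minute-level factor table.
# Assumption: crypto markets trade 24/7, so each symbol has 1,440 bars per day.
MINUTES_PER_DAY = 24 * 60


def narrow_rows_per_day(n_symbols: int, n_factors: int) -> int:
    """One row per (minute, symbol, factor) combination."""
    return MINUTES_PER_DAY * n_symbols * n_factors


def narrow_rows_per_year(n_symbols: int, n_factors: int, days: int = 365) -> int:
    """Scale the daily estimate to a calendar year."""
    return narrow_rows_per_day(n_symbols, n_factors) * days


# Hypothetical universe: 100 symbols and 50 factors (illustrative only).
rows_per_day = narrow_rows_per_day(100, 50)    # 7,200,000
rows_per_year = narrow_rows_per_year(100, 50)  # 2,628,000,000
print(f"{rows_per_day:,} rows/day, {rows_per_year:,} rows/year")
```

Even a modest universe reaches billions of rows per year under these assumptions, which is why the choice of storage engine and partitioning matters for time-range queries.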

Note: If the estimated number of combinations of market and asset exceeds 1,000, it is recommended to reduce the dimension of the sort columns during database creation by adding the following parameter: sortKeyMappingFunction=[hashBucket{,3}, hashBucket{,300}].

1.2 Table Schema

The column definitions of the minute-level factor table (factor_1m) are as follows.

[Image: column definitions of factor_1m]
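As a minimal illustration of what a narrow layout means, the sketch below stores one row per factor value instead of one column per factor. It is plain Python, not DolphinDB script, and the column names (ts, market, symbol, factor_name, value), the factor name myNewFactor, and all values are hypothetical placeholders; the actual factor_1m columns are those defined in the schema table.

```python
from typing import NamedTuple


class FactorRow(NamedTuple):
    """One narrow-table row: a single factor value for one symbol and minute.

    Column names are illustrative assumptions, not the article's schema.
    """
    ts: str
    market: str
    symbol: str
    factor_name: str
    value: float


def to_narrow(ts, market, symbol, factors):
    """Expand a {factor_name: value} mapping into one row per factor."""
    return [FactorRow(ts, market, symbol, name, value)
            for name, value in factors.items()]


def select_factor(rows, symbol, factor_name):
    """Return (ts, value) pairs for one symbol and one factor."""
    return [(r.ts, r.value) for r in rows
            if r.symbol == symbol and r.factor_name == factor_name]


table = []
# Existing factor (made-up value).
table += to_narrow("2026-01-01 00:00", "spot", "BTCUSDT", {"gtjaAlpha100": 12.5})
# A new factor is just more rows with a new factor_name; no column is added.
table += to_narrow("2026-01-01 00:00", "spot", "BTCUSDT", {"myNewFactor": 0.3})

series = select_factor(table, "BTCUSDT", "myNewFactor")
print(series)
```

The trade-off is more rows in exchange for a fixed set of columns, so queries filter on the factor name rather than selecting a factor-specific column.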

This narrow format makes it easy to add new factors without schema changes and supports both historical analysis and real-time ingestion.

2. Factor Computation

DolphinDB ships with built-in GTJA 191 and WorldQuant 101 alpha libraries. Since most cross-sectional and fundamental-based factors are not applicable to crypto markets, we extract time-series-based factors and convert them into state functions for streaming use.

These state-aware factors are defined in stateFactors.dos. You can load the module with the use statement or extend it with user-defined factor definitions. The complete implementation is provided in the Appendix.

Using the gtjaAlpha100 factor as an example, we demonstrate how to perform both batch and streaming computation for the same factor in DolphinDB. In stateFactors.dos, gtjaAlpha100 is defined as:

    @state
    def gtjaAlpha100(vol){
        return mstd(vol, 20)
    }

Compared with its definition in gtja191Alpha.dos, the only difference is the addition of @state, which converts it from a regular function into a state function, enabling direct use in streaming computation.

2.1 Batch Factor Computation

Factors are computed from minute-level OHLC data, so historical minute-level OHLC data must already exist on the DolphinDB server.

Batch computation consists of three steps: retrieving historical data, generating factor computation expressions, and computing and writing results to the database. The full script is provided in the Appendix.

1. Retrieve historical data from the database.

    // Use the factor module
    use stateFactors
    go
    all_data = select * from loadTable("dfs://CryptocurrencyKLine","minKLine")
    factor_value_tb = loadTable("dfs://CryptocurrencyFactor", "factor_1min")

2. Generate factor computation expressions.

    cols_dict = {
        "vol": "volume"
    }
    def dict_replace_str(s, str_dict){
        for (i in str_dict.keys()){
            result = regexReplace(s, i, str_dict[i])
            return...
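Returning to the @state conversion of gtjaAlpha100 described in Section 2, the idea that one factor definition can serve both batch and streaming computation can be sketched outside DolphinDB. The Python below is an illustration only, not the engine's implementation: batch_mstd and StreamingMstd are hypothetical names, the input volumes are made up, and the sketch assumes a sample standard deviation that yields no value until the 20-bar window is full, which should be checked against the mstd documentation.

```python
from collections import deque
from statistics import stdev

WINDOW = 20  # same window length as mstd(vol, 20)


def batch_mstd(values, window=WINDOW):
    """Batch mode: rolling sample std computed over a complete history."""
    out = []
    for end in range(1, len(values) + 1):
        if end < window:
            out.append(None)  # window not yet full (assumed behavior)
        else:
            out.append(stdev(values[end - window:end]))
    return out


class StreamingMstd:
    """Streaming mode: keeps only the last `window` values as state."""

    def __init__(self, window=WINDOW):
        self.window = window
        self.buffer = deque(maxlen=window)  # oldest value drops automatically

    def update(self, value):
        """Consume one new bar and return the current rolling std."""
        self.buffer.append(value)
        if len(self.buffer) < self.window:
            return None
        return stdev(self.buffer)


# Made-up minute volumes, used only to compare the two modes.
volumes = [float((7 * i) % 13) for i in range(60)]

batch = batch_mstd(volumes)
engine = StreamingMstd()
stream = [engine.update(v) for v in volumes]

same = batch == stream
print("batch and streaming results identical:", same)
```

In a real streaming setup this state would be kept separately for each symbol, so that bars from different assets never share a window.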
