A Production-Ready Factor Library for Downsampling L2 Tick Data to Daily Signals



High-Frequency Data Holds More Alpha Than You Think — A Factor Library to Unlock It

DolphinDB

16 min read · Mar 20, 2026


In the field of quantitative investment research, factor discovery and application are evolving toward higher-frequency and more granular data. Factor libraries built on DolphinDB, such as 191 Alpha and WorldQuant 101 Alpha, provide a solid foundation for strategy development based on daily and minute-level market data. However, such low-frequency factors are inherently limited in capturing rapidly changing market microstructure dynamics and in extracting more time-sensitive and differentiated trading signals.

As market data becomes increasingly granular, massive volumes of minute-level, snapshot-level, and even tick-by-tick data carry richer information about price formation, order flow dynamics, and participant behavior. Robustly extracting alpha signals from high-frequency data and downsampling them for lower-frequency (e.g., daily or hourly) strategy research and portfolio management is now a primary industry focus. Transforming high-frequency data into lower-frequency features lets strategy developers convert micro-level, transient market states (such as capital flow direction, order book imbalance, and trading impact) into stable, predictive low-frequency features or factors. This facilitates earlier opportunity identification, improved risk management, and the development of differentiated strategies with informational advantages over longer trading horizons.

To address this need, this tutorial presents a factor computation solution for minute-level and tick-by-tick financial data, natively built on DolphinDB. The solution leverages DolphinDB’s data processing capabilities to adapt over 100 validated mid- to low-frequency factors from public research reports and academic literature to high-frequency data, including minute-level OHLC data, level-2 market snapshots, tick-by-tick orders, and tick-by-tick trades.

1. Introduction to High-Frequency-to-Low-Frequency Factor Library

High-frequency market data refers to data with a time granularity between daily frequency and ultra-high frequency (e.g., millisecond level), primarily including minute-level OHLC data, market snapshots, tick-by-tick trades, and tick-by-tick orders. These datasets record the most granular price movements, order flow dynamics, and trading activities in real time, forming a rich information source for capturing market microstructure and extracting distinctive alpha signals.

Based on established public research reports and academic literature, this tutorial systematically organizes and implements a high-frequency-to-low-frequency factor library. The library covers multiple categories of factors, including price–volume trend factors, volatility factors, and liquidity factors. Its core value lies in providing a fully engineered, performance-optimized, and preliminarily validated standardized factor computation framework. You can apply it directly to high-frequency data to efficiently generate factor series with higher information density, suitable for low-frequency strategy research.

The factor library is natively built on DolphinDB, which integrates distributed computing, real-time stream processing, and efficient storage engines. Its multi-paradigm programming language and extensive financial analytics functions are well suited to the substantial throughput and computational complexity of high-frequency data processing, enabling daily factors to be generated within seconds from terabyte-scale high-frequency datasets. This tutorial provides complete computation scripts and performance benchmarks to facilitate rapid validation, iteration, and deployment of customized low-frequency factors.

2. Dataset and Field Specifications

The factor library presented in this tutorial is built on four categories of market data from the Chinese A-share market: minute-level OHLC data, market snapshots, tick-by-tick orders, and tick-by-tick trades. This section outlines selected fields and the partitioning schema of the relevant datasets. For the detailed database and table schema design, as well as the code, see Best Practices for Financial Data Storage.
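To make the relationship between these data categories concrete, the following is a minimal sketch in plain Python (not DolphinDB script, and not the library's own implementation). It rolls tick-by-tick trades into one-minute OHLC bars and then collapses the bars into a single daily value. The sample trades, the function names `to_minute_bars` and `daily_realized_volatility`, and the choice of realized volatility as the example factor are all illustrative assumptions.

```python
import math
from collections import OrderedDict
from datetime import datetime

# Hypothetical tick-by-tick trades for one stock on one day:
# (timestamp, price, volume). Values are made up for illustration.
trades = [
    (datetime(2026, 3, 20, 9, 30, 5), 10.00, 200),
    (datetime(2026, 3, 20, 9, 30, 40), 10.02, 100),
    (datetime(2026, 3, 20, 9, 31, 10), 10.01, 300),
    (datetime(2026, 3, 20, 9, 31, 55), 10.05, 100),
    (datetime(2026, 3, 20, 9, 32, 20), 10.03, 400),
]


def to_minute_bars(trades):
    """Aggregate trades into one-minute OHLC bars keyed by minute start."""
    bars = OrderedDict()
    for ts, price, volume in sorted(trades, key=lambda t: t[0]):
        minute = ts.replace(second=0, microsecond=0)
        bar = bars.get(minute)
        if bar is None:
            # First trade of the minute sets every price field.
            bars[minute] = {"open": price, "high": price, "low": price,
                            "close": price, "volume": volume}
        else:
            bar["high"] = max(bar["high"], price)
            bar["low"] = min(bar["low"], price)
            bar["close"] = price  # last trade seen so far in this minute
            bar["volume"] += volume
    return bars


def daily_realized_volatility(bars):
    """Downsample minute bars to one daily value: the square root of the
    sum of squared close-to-close log returns (an example factor only)."""
    closes = [bar["close"] for bar in bars.values()]
    log_returns = [math.log(b / a) for a, b in zip(closes, closes[1:])]
    return math.sqrt(sum(r * r for r in log_returns))


bars = to_minute_bars(trades)
print(len(bars), "minute bars; daily factor =", daily_realized_volatility(bars))
```

The two-step shape (aggregate to bars, then reduce the bars to one number per stock per day) is the general pattern behind a high-frequency-to-low-frequency factor. A production system would run the same logic per stock and per date partition rather than over an in-memory list.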

2.1 Minute-Level OHLC Data

Minute-level OHLC data represents the intraday price trajectory at one-minute intervals. It is typically aggregated from tick-by-tick trades and is widely used by short-term traders for intraday price analysis. In this tutorial, the minute-level OHLC data is partitioned by date and stored using the OLAP engine. Each partition contains the minute-level OHLC data for all stocks on the corresponding trading day. The...

