REGENT: A Regime-Guided Equity Neural Trading System | Szalay Péter
REGENT: A Regime-Guided Equity Neural Trading System with Differentiable Graph-Based Portfolio Optimization
Szalay Péter
Independent Research · June 2026
Disclaimer. This work and the accompanying software are for research and educational purposes only. They are not investment advice and not a recommendation to trade any security. All performance figures are historical backtests and walk-forward simulations on past data; simulated results have inherent limitations and do not represent actual trading. Past performance does not predict future results, and future performance may differ materially, including total loss of capital. Any use of this work is entirely at the reader's own risk.
Abstract
I present REGENT (Regime-guided Equity Neural Trading System), an end-to-end research pipeline for daily, long-only US equity portfolio management. Traditional portfolio optimisation pipelines decouple alpha generation, cross-sectional ranking, and portfolio sizing, which leaves the three stages optimising misaligned objectives. REGENT addresses this by coupling a temporal-transformer Vector-Quantised Variational Autoencoder (VQ-VAE) for macroeconomic regime classification with GRASP (Graph-based Risk-Aware Spatio-temporal Portfolio Agent), a four-module differentiable graph neural network that maps per-stock OHLCV and cross-sectional features directly to constrained portfolio weights. The system enforces strict allocation constraints (e.g., long-only, sector bounds, gross exposure limits) natively via a differentiable convex quadratic-program (QP) projection layer, eliminating the need for soft penalty tuning. The entire architecture is trained end-to-end against a composite financial objective that combines Sharpe, Sortino, Conditional Value-at-Risk (CVaR at the 0.08 level), turnover, and diversification metrics, with no reinforcement-learning reward shaping. I detail the system's mathematical formulations, a walk-forward evaluation protocol with a purged K-fold cross-validation ensemble per window, and an execution-fidelity live trading research loop. Across an eight-window non-overlapping walk-forward (2022–2026), the deployed ensemble achieves a mean test Sharpe of 1.08 (median 1.07), a median Calmar of 2.31, and a positive Sharpe in 8/8 windows. Chained across all eight test slices, the out-of-sample equity compounds to +93.2%, against +61.0% for SPY and +83.6% for an equal-weight book of the same 120-stock universe.
Walk-Forward Results (2022–2026)
The headline result is an eight-window, non-overlapping walk-forward. Each window retrains a fresh purged 4-fold (K = 4) ensemble on its own train/validation span and backtests on the following embargoed 126-day test slice, so the eight test slices tile 2022-02 through 2026-02 with zero overlap and no window-specific tuning. The ensemble posts a positive Sharpe in every window.
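The tiling described above can be sketched on integer trading-day indices. This is a minimal illustration, not the project's code: the source fixes only the eight windows, the 126-day test slices, and K = 4, so `train_len`, `embargo`, and `purge` are hypothetical parameters chosen for the example.

```python
import numpy as np

def walk_forward_windows(n_days, train_len, test_len=126, embargo=5, n_windows=8):
    """Return (train_idx, test_idx) pairs whose test slices tile the last
    n_windows * test_len days with no overlap. `embargo` days are skipped
    between the end of each training span and the start of its test slice."""
    first_test_start = n_days - n_windows * test_len
    windows = []
    for w in range(n_windows):
        test_start = first_test_start + w * test_len
        train_end = test_start - embargo              # exclusive upper bound
        train_start = train_end - train_len
        if train_start < 0:
            raise ValueError("not enough history for the requested windows")
        windows.append((np.arange(train_start, train_end),
                        np.arange(test_start, test_start + test_len)))
    return windows

def purged_kfold(train_idx, k=4, purge=5):
    """Split a contiguous training span into k contiguous validation folds.
    Days within `purge` of the validation fold are dropped from the fit set."""
    splits = []
    for val in np.array_split(train_idx, k):
        lo, hi = val[0] - purge, val[-1] + purge
        fit = train_idx[(train_idx < lo) | (train_idx > hi)]
        splits.append((fit, val))
    return splits

# Hypothetical usage: 2000 trading days of history, 500-day training spans.
windows = walk_forward_windows(n_days=2000, train_len=500)
fit_idx, val_idx = purged_kfold(windows[0][0])[0]
```

Each window would train one model per fold on `fit_idx`, validate on `val_idx`, and average the K members at test time; how the members are actually combined is not stated in this section.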
Mean test Sharpe: 1.08 (median 1.07)
Median Calmar: 2.31 (clears the 2.0 bar)
Windows with positive Sharpe: 8/8
Chained OOS return: +93.2% (SPY +61.0%)
Walk-forward aggregate over the eight test windows (mean / median): Sharpe +1.08 / +1.07, Sortino +1.60 / +1.53, Calmar +2.45 / +2.31, annualised return +18.22% / +16.44%, maximum drawdown −9.47% / −8.28%, win rate 55.0% / 56.0%. A one-sided sign test on the eight outcomes (8/8 positive) gives p = 0.0039 under the no-skill null; a Newey–West HAC correction on the pooled daily series gives t = 2.40 (p ≈ 0.016, two-sided).
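The sign-test figure can be checked by hand: under a no-skill null each window is positive with probability 1/2, so eight positives out of eight has probability (1/2)^8 = 1/256 ≈ 0.0039. The sketch below reproduces that arithmetic and shows one common way to form a HAC t-statistic for a mean daily return. The Bartlett kernel and the `lags` argument are assumptions; the source does not state which lag length was used.

```python
import numpy as np
from math import comb, sqrt

def sign_test_p(n_pos, n):
    """One-sided exact binomial p-value: P(X >= n_pos) for X ~ Binomial(n, 0.5)."""
    return sum(comb(n, k) for k in range(n_pos, n + 1)) / 2 ** n

def newey_west_tstat(x, lags):
    """t-statistic of the sample mean using a Bartlett-kernel (Newey-West)
    estimate of the long-run variance. With lags=0 this is the plain t-stat."""
    x = np.asarray(x, dtype=float)
    n = x.size
    d = x - x.mean()
    lrv = d @ d / n                                   # lag-0 autocovariance
    for l in range(1, lags + 1):
        weight = 1.0 - l / (lags + 1)                 # Bartlett weight
        lrv += 2.0 * weight * (d[l:] @ d[:-l]) / n    # lag-l autocovariance, both sides
    return x.mean() / sqrt(lrv / n)

print(round(sign_test_p(8, 8), 4))  # 0.0039
```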
Diagnostics & Empirical Results
VQ-VAE regime-encoder diagnostics: all twelve codewords active (perplexity 10.6, entropy 2.36 nats), the causal post-processed regime timeline overlaid on SPY, the reconstruction-error tail concentrated in macro dislocations, and per-regime macro fingerprints (VIX/TNX/DXY/SPY/GLD).
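Codebook perplexity is the exponential of the usage entropy measured in nats, so exp(2.36) ≈ 10.6 ties the two reported figures together; a uniform twelve-code book would reach the ceiling of 12. The following is a minimal sketch of how such statistics can be computed from a vector of codeword assignments. The function name and the input layout (one integer code index per day) are assumptions for illustration.

```python
import numpy as np

def codebook_stats(assignments, n_codes=12):
    """Usage statistics for a vector of integer codeword indices in [0, n_codes)."""
    counts = np.bincount(assignments, minlength=n_codes)
    p = counts / counts.sum()
    nz = p[p > 0]                                  # skip unused codes: 0 * log 0 := 0
    h_nats = float(-(nz * np.log(nz)).sum())
    return {
        "active": int((counts > 0).sum()),
        "entropy_nats": h_nats,
        "entropy_bits": h_nats / np.log(2),
        "perplexity": float(np.exp(h_nats)),
    }

# Hypothetical usage: perfectly uniform usage of all twelve codes.
stats = codebook_stats(np.arange(12).repeat(10))
```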
GRASP portfolio-agent diagnostics: realised equity versus the equal-weight benchmark, concentration and turnover traces, sector allocation over time, the learned GATv2 attention topology, and per-member ensemble health.
Single-window out-of-sample equity curve (2026-01-02 to 2026-06-10) versus SPY buy-and-hold and the EW-120 equal-weight book. Green shading marks REGENT's lead over SPY; red shading marks SPY's lead over REGENT.
Rolling 63-day Sharpe over the out-of-sample window. The metric stays in the strategy-acceptable band (> +1.0) for the majority of the period, softening into June as the SPY rebound narrows the excess.
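For readers who want to reproduce a trace like this one, a rolling Sharpe can be computed as below. This is a sketch under stated assumptions: a zero risk-free rate, annualisation by √252, and the sample standard deviation; the source does not say which conventions the figure uses.

```python
import numpy as np

def rolling_sharpe(daily_returns, window=63, periods_per_year=252):
    """Annualised Sharpe over a trailing window; NaN until the window is full."""
    r = np.asarray(daily_returns, dtype=float)
    out = np.full(r.size, np.nan)
    for t in range(window - 1, r.size):
        w = r[t - window + 1 : t + 1]              # trailing window ending at day t
        sd = w.std(ddof=1)
        if sd > 0:
            out[t] = np.sqrt(periods_per_year) * w.mean() / sd
    return out
```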
Out-of-sample drawdown profile over the test window.
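A drawdown profile of this kind is the running shortfall of the equity curve from its prior peak. The sketch below assumes simple daily returns compounded from a starting capital of 1.0, and defines Calmar as annualised return divided by the absolute maximum drawdown, which is the usual convention but is not spelled out in this section.

```python
import numpy as np

def drawdown_profile(daily_returns):
    """Fractional drawdown from the running peak (0 at a new high, negative below it)."""
    equity = np.cumprod(1.0 + np.asarray(daily_returns, dtype=float))
    peak = np.maximum.accumulate(np.maximum(equity, 1.0))  # starting capital counts as a peak
    return equity / peak - 1.0

def calmar(daily_returns, periods_per_year=252):
    """Annualised compound return divided by |maximum drawdown|."""
    r = np.asarray(daily_returns, dtype=float)
    annualised = np.prod(1.0 + r) ** (periods_per_year / r.size) - 1.0
    max_dd = drawdown_profile(r).min()
    return annualised / abs(max_dd) if max_dd < 0 else np.inf
```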
System Overview
REGENT abandons the classic three-stage decomposition (forecasting → ranking → sizing). The full pipeline, from per-stock indicators and macro state to constrained portfolio weights, is differentiable end-to-end, and the loss is a direct convex combination of Sharpe, Sortino, CVaR, drawdown, turnover, diversification, and graph-entropy terms computed on the realised return path.
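To make the shape of such an objective concrete, here is a forward-pass-only sketch in NumPy. It is not the author's loss: the coefficient dictionary `lam`, the use of a Herfindahl index for diversification, and the reading of 0.08 as the tail fraction for CVaR are all assumptions, and the graph-entropy term is omitted because it depends on attention weights that are not described here. Actual training would need the same terms written in an autodiff framework.

```python
import numpy as np

def composite_loss(weights, asset_returns, lam, alpha=0.08, eps=1e-8):
    """weights, asset_returns: arrays of shape (T, N).
    lam: non-negative coefficients summing to 1 (hypothetical values)."""
    r = (weights * asset_returns).sum(axis=1)              # realised portfolio return path
    sharpe = r.mean() / (r.std() + eps)
    downside = np.sqrt(np.mean(np.minimum(r, 0.0) ** 2))   # downside deviation about zero
    sortino = r.mean() / (downside + eps)
    n_tail = max(1, int(np.ceil(alpha * r.size)))
    cvar = -np.sort(r)[:n_tail].mean()                     # mean loss over the worst alpha tail
    equity = np.cumprod(1.0 + r)
    max_dd = (1.0 - equity / np.maximum.accumulate(equity)).max()
    if len(weights) > 1:
        turnover = np.abs(np.diff(weights, axis=0)).sum(axis=1).mean()
    else:
        turnover = 0.0
    concentration = (weights ** 2).sum(axis=1).mean()      # Herfindahl index; lower is more diversified
    terms = {
        "sharpe": -sharpe, "sortino": -sortino,            # rewards enter with a minus sign
        "cvar": cvar, "drawdown": max_dd,
        "turnover": turnover, "concentration": concentration,
    }
    total = sum(lam[name] * value for name, value in terms.items())
    return total, terms
```

Because every term is computed on the same realised return path, the gradient of the total with respect to the weights reflects all objectives at once, which is the point of dropping the forecast, rank, and size stages.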
Macro-state encoder. A temporal-transformer VQ-VAE encodes five macro series (VIX, TNX, DXY,...