Regent: A Differentiable Graph-Based Quant Trading Pipeline (VQ-VAE and GNNs)

REGENT: A Regime-Guided Equity Neural Trading System | Szalay Péter

REGENT: A Regime-Guided Equity Neural Trading System with Differentiable Graph-Based Portfolio Optimization

Szalay Péter

Independent Research · June 2026

Disclaimer. This work and the accompanying software are for research and educational purposes only. They are not investment advice and not a recommendation to trade any security. All performance figures are historical backtests and walk-forward simulations on past data; simulated results have inherent limitations and do not represent actual trading. Past performance does not predict future results, and future performance may differ materially, including total loss of capital. Any use of this work is entirely at the reader's own risk.

Abstract

I present REGENT (Regime-guided Equity Neural Trading System), an end-to-end research pipeline for daily, long-only US equity portfolio management. Traditional portfolio optimisation pipelines decouple alpha generation, cross-sectional ranking, and portfolio sizing, so the stages optimise misaligned objectives. REGENT addresses this by coupling a temporal-transformer Vector-Quantised Variational Autoencoder (VQ-VAE) for macroeconomic regime classification with GRASP (Graph-based Risk-Aware Spatio-temporal Portfolio Agent), a four-module differentiable graph neural network that maps per-stock OHLCV and cross-sectional features directly to constrained portfolio weights. Allocation constraints (e.g., long-only, sector bounds, gross exposure limits) are enforced exactly by a differentiable convex quadratic-program (QP) projection layer, which removes the need to tune soft penalties. The entire architecture is trained end-to-end against a composite financial objective that combines Sharpe, Sortino, Conditional Value-at-Risk at the 8% tail (CVaR_0.08), turnover, and diversification terms, with no reinforcement-learning reward shaping. I detail the system's mathematical formulations, a walk-forward evaluation protocol with a purged K-fold cross-validation ensemble per window, and an execution-fidelity live-trading research loop. Across an eight-window, non-overlapping walk-forward (2022–2026), the deployed ensemble achieves a mean test Sharpe of 1.08 (median 1.07), a median Calmar of 2.31, and a positive Sharpe in 8/8 windows; chained across all eight test slices, the out-of-sample equity compounds to +93.2%, against +61.0% for SPY and +83.6% for an equal-weight book of the same 120-stock universe.
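The QP projection layer can be illustrated with its simplest special case: the Euclidean projection of raw scores z onto the long-only, fully invested, per-name-capped set {w : 0 ≤ w_i ≤ u, Σ w_i = 1}. The KKT conditions of that QP give w_i = clip(z_i − τ, 0, u) for a single scalar τ, so the minimal sketch below finds τ by bisection. It is a hypothetical illustration, not REGENT's layer: the function name `project_capped_simplex` and the cap value are assumptions, sector bounds are omitted, and a trainable version would need a differentiable solver (for example cvxpylayers) or implicit differentiation through the same KKT conditions.

```python
def project_capped_simplex(z, cap=0.1, iters=100):
    """Euclidean projection of scores z onto {w : 0 <= w_i <= cap, sum(w) = 1}.

    KKT conditions give w_i = clip(z_i - tau, 0, cap) for a scalar tau chosen
    so the weights sum to one; sum(w) is non-increasing in tau, so bisect on it.
    """
    n = len(z)
    if cap * n < 1.0:
        raise ValueError("infeasible: cap * n must be at least 1")

    def weights(tau):
        return [min(max(zi - tau, 0.0), cap) for zi in z]

    # At lo every weight sits at the cap (sum = n * cap >= 1); at hi every weight is 0.
    lo, hi = min(z) - cap, max(z)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(weights(mid)) > 1.0:
            lo = mid  # weights too large: raise the threshold
        else:
            hi = mid  # weights too small (or exact): lower the threshold
    return weights(0.5 * (lo + hi))


# Hypothetical scores for four names, capped at 40% each.
print(project_capped_simplex([0.5, 0.2, -0.1, 0.9], cap=0.4))
```

Bisection is used here only because it is short and exact to floating-point precision; a sort-based solver does the same projection in O(n log n).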

Walk-Forward Results (2022–2026)

The load-bearing result is an eight-window, non-overlapping walk-forward: each window retrains a fresh purged K=4-fold ensemble on its own train/validation span and backtests on the following embargoed 126-day test slice, so the eight test slices tile 2022-02 through 2026-02 with zero overlap and no window-specific tuning. The ensemble posts a positive Sharpe in every window.
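The windowing can be sketched as index arithmetic on a daily calendar. The sketch below assumes a fixed-length (rolling) train/validation span, an embargo gap of `embargo` days before each 126-day test slice, and back-to-back test slices ending at the last available day; `train_len`, `embargo`, and `purge` are hypothetical parameters whose actual values are not stated here. The purged K-fold step is simplified to dropping a fixed buffer of days on both sides of each validation fold.

```python
def walk_forward_windows(n_days, train_len, test_len=126, embargo=5, n_windows=8):
    """Yield (train_range, test_range) pairs whose test slices tile without overlap.

    The last test slice ends at n_days; each test slice is preceded by an
    embargo gap and a train/validation span of train_len days.
    """
    first_test_start = n_days - n_windows * test_len
    if first_test_start - embargo - train_len < 0:
        raise ValueError("not enough history for the requested windows")
    for k in range(n_windows):
        test_start = first_test_start + k * test_len
        train_end = test_start - embargo
        yield range(train_end - train_len, train_end), range(test_start, test_start + test_len)


def purged_kfold(train_idx, k=4, purge=5):
    """Split a contiguous span into k folds, dropping `purge` days on each side of the validation fold."""
    idx = list(train_idx)
    fold = len(idx) // k
    for i in range(k):
        lo = i * fold
        hi = (i + 1) * fold if i < k - 1 else len(idx)
        val = idx[lo:hi]
        train = idx[:max(lo - purge, 0)] + idx[hi + purge:]
        yield train, val


# Each of the k fold models would form one member of the window's ensemble.
for train, test in walk_forward_windows(n_days=2000, train_len=750):
    members = list(purged_kfold(train, k=4))
```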

Mean test Sharpe: 1.08 (median 1.07)

Median Calmar: 2.31 (clears the 2.0 bar)

Windows with positive Sharpe: 8/8

Chained OOS return: +93.2% (SPY +61.0%)
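Chaining the test slices means compounding their returns: the total is the product of (1 + r_k) over the windows, minus one. The minimal sketch below shows that arithmetic; the per-window returns in it are made-up placeholders, since this section does not list REGENT's individual window returns.

```python
from math import prod


def chain_returns(window_returns):
    """Compound per-window simple returns into one out-of-sample total return."""
    return prod(1.0 + r for r in window_returns) - 1.0


# Illustrative placeholders only: eight hypothetical per-window returns.
example = [0.10, 0.05, -0.02, 0.08, 0.12, 0.03, 0.07, 0.09]
print(f"chained return: {chain_returns(example):+.1%}")
```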

Walk-forward aggregate over the eight test windows (mean / median): Sharpe +1.08 / +1.07, Sortino +1.60 / +1.53, Calmar +2.45 / +2.31, annualised return +18.22% / +16.44%, maximum drawdown −9.47% / −8.28%, win rate 55.0% / 56.0%. A one-sided sign test on the eight window outcomes (8/8 positive) gives p = 0.5^8 ≈ 0.0039 under the no-skill null; a Newey–West HAC correction on the pooled daily series gives t = 2.40 (p ≈ 0.016, two-sided).
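Both significance figures can be reproduced with a few lines of standard-library Python. The sketch assumes a one-sided binomial sign test with success probability 1/2, a Bartlett-kernel Newey–West variance for the mean of the pooled daily returns, and a standard-normal approximation for the two-sided p-value; the default lag rule is a common rule of thumb, not REGENT's documented bandwidth.

```python
import math


def sign_test_p(n_pos, n):
    """One-sided binomial sign test: P(X >= n_pos) for X ~ Binomial(n, 1/2)."""
    return sum(math.comb(n, i) for i in range(n_pos, n + 1)) / 2 ** n


def newey_west_t(x, lags=None):
    """HAC t-statistic for the mean of x using the Bartlett-kernel Newey-West variance."""
    T = len(x)
    if lags is None:
        # Rule-of-thumb bandwidth; an assumption, not REGENT's documented choice.
        lags = int(4 * (T / 100) ** (2 / 9))
    m = sum(x) / T
    d = [xi - m for xi in x]

    def gamma(lag):
        # Sample autocovariance at the given lag (divided by T).
        return sum(d[t] * d[t - lag] for t in range(lag, T)) / T

    var = gamma(0) + 2 * sum((1 - lag / (lags + 1)) * gamma(lag) for lag in range(1, lags + 1))
    return m / math.sqrt(var / T)


def two_sided_p(t):
    """Two-sided p-value under a standard-normal approximation."""
    return math.erfc(abs(t) / math.sqrt(2))


print(sign_test_p(8, 8))            # 0.00390625 (= 1/256)
print(round(two_sided_p(2.40), 3))  # 0.016
```

The Bartlett weights (1 − l/(L+1)) keep the variance estimate non-negative, which is why they are the usual default for HAC t-statistics on daily return series.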

Diagnostics & Empirical Results

VQ-VAE regime-encoder diagnostics: all twelve codewords active (perplexity 10.6, entropy 2.36 nats), the causal post-processed regime timeline overlaid on SPY, the reconstruction-error tail concentrated in macro dislocations, and per-regime macro fingerprints (VIX/TNX/DXY/SPY/GLD).
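Codebook perplexity is the exponential of the entropy of the codeword-usage distribution, measured in nats (equivalently, 2 raised to the entropy in bits); the reported pair fits the natural-log convention, since e^2.36 ≈ 10.6 effective codewords out of twelve. The sketch below shows the two steps behind those numbers: nearest-codeword assignment (the vector-quantisation step) and usage perplexity. The toy codebook and latent vectors are hypothetical; in REGENT the latents would come from the temporal-transformer encoder.

```python
import math
from collections import Counter


def nearest_codeword(z, codebook):
    """Index of the codebook vector closest to z in squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda k: sum((zi - ci) ** 2 for zi, ci in zip(z, codebook[k])))


def usage_stats(codes):
    """Return (entropy in nats, perplexity, number of active codewords)."""
    counts = Counter(codes)
    total = len(codes)
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy, math.exp(entropy), len(counts)


# Hypothetical 3-entry codebook and four 2-D latents.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
latents = [(0.1, 0.0), (0.9, 0.1), (0.0, 0.8), (0.2, 0.1)]
codes = [nearest_codeword(z, codebook) for z in latents]
entropy, perplexity, active = usage_stats(codes)
print(codes)                 # [0, 1, 2, 0]
print(round(perplexity, 2))  # 2.83
```

Perplexity equals the codebook size only when usage is perfectly uniform, so a value of 10.6 with all twelve codewords active indicates mildly uneven, but not collapsed, regime usage.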

GRASP portfolio-agent diagnostics: realised equity versus the equal-weight benchmark, concentration and turnover traces, sector allocation over time, the learned GATv2 attention topology, and per-member ensemble health.
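GATv2 scores each directed edge by applying a shared weight matrix to the concatenated endpoint features, then a LeakyReLU, then an attention vector, and normalises the scores with a softmax over each node's neighbourhood; those normalised coefficients are what an attention-topology plot displays. The pure-Python sketch below computes them for a single head. The three-stock graph, feature sizes, and parameter values are hypothetical, since GRASP's graph construction and head count are not described in this section.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def gatv2_attention(h, neighbors, W, a):
    """alpha[i][j] = softmax over j in N(i) of a . LeakyReLU(W [h_i || h_j])."""
    alpha = {}
    for i, nbrs in neighbors.items():
        scores = []
        for j in nbrs:
            z = matvec(W, h[i] + h[j])  # list + list concatenates [h_i || h_j]
            scores.append(sum(ak * leaky_relu(zk) for ak, zk in zip(a, z)))
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alpha[i] = {j: e / total for j, e in zip(nbrs, exps)}
    return alpha


# Hypothetical graph: 3 stocks, 2 features each, self-loops included.
h = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
nbrs = {0: [0, 1, 2], 1: [0, 1], 2: [1, 2]}
W = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]]  # shape (d', 2d) = (2, 4)
a = [1.0, -1.0]
print(gatv2_attention(h, nbrs, W, a))
```

Applying the nonlinearity before the attention vector is the change that distinguishes GATv2 from the original GAT, whose ranking of neighbours cannot depend on the query node.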

Single-window out-of-sample equity curve (2026-01-02 to 2026-06-10) versus SPY buy-and-hold and the EW-120 equal-weight book. Green shading marks REGENT's lead over SPY; red shading SPY's lead over REGENT.
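The shading reduces to a sign comparison between two equity curves started at the same capital. The minimal sketch below assumes both curves start at 1.0 and that the EW-120 book is rebalanced to equal weights every day, so its daily return is the cross-sectional mean of constituent returns; the rebalancing frequency is an assumption, not stated in the source.

```python
def equity_curve(returns, start=1.0):
    """Cumulative equity from daily simple returns."""
    eq, out = start, []
    for r in returns:
        eq *= 1.0 + r
        out.append(eq)
    return out


def equal_weight_returns(stock_returns):
    """Daily return of a daily-rebalanced equal-weight book: the cross-sectional mean."""
    return [sum(day) / len(day) for day in stock_returns]


def lead_mask(strategy_eq, benchmark_eq):
    """+1 where the strategy leads (green), -1 where it trails (red), 0 when tied."""
    return [(s > b) - (s < b) for s, b in zip(strategy_eq, benchmark_eq)]


# Hypothetical daily returns for a strategy and a benchmark.
print(lead_mask(equity_curve([0.01, -0.02]), equity_curve([0.0, 0.0])))
```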

Rolling 63-day Sharpe over the out-of-sample window. The metric stays in the strategy-acceptable band (> +1.0) for the majority of the period, softening into June as the SPY rebound narrows the excess.
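A rolling Sharpe of this kind is the annualised mean-over-volatility of each trailing 63-day window. The sketch assumes 252 trading days per year, a sample standard deviation, and a zero daily risk-free rate by default; REGENT's exact annualisation and risk-free conventions are not stated here.

```python
import math
import statistics


def rolling_sharpe(returns, window=63, periods_per_year=252, rf_daily=0.0):
    """Annualised Sharpe of each trailing window; None until a full window exists."""
    out = []
    for t in range(len(returns)):
        if t + 1 < window:
            out.append(None)
            continue
        xs = [r - rf_daily for r in returns[t + 1 - window: t + 1]]
        sd = statistics.stdev(xs)
        out.append(statistics.fmean(xs) / sd * math.sqrt(periods_per_year) if sd > 0 else None)
    return out
```

The first 62 entries are None, so a 63-day Sharpe trace only starts roughly three months into any test window.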

Out-of-sample drawdown profile over the test window.
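A drawdown profile is the equity curve's percentage distance below its running peak; maximum drawdown is its minimum, and the Calmar ratio divides annualised return by the magnitude of that maximum drawdown. The sketch assumes geometric annualisation over 252 trading days and counts the starting capital as the first peak; the document does not spell out its Calmar convention.

```python
def drawdowns(equity):
    """Drawdown series: equity / running peak - 1 (zero at new highs, negative below)."""
    peak, out = float("-inf"), []
    for e in equity:
        peak = max(peak, e)
        out.append(e / peak - 1.0)
    return out


def calmar(daily_returns, periods_per_year=252):
    """Annualised geometric return divided by the absolute maximum drawdown."""
    equity, eq = [], 1.0
    for r in daily_returns:
        eq *= 1.0 + r
        equity.append(eq)
    ann = eq ** (periods_per_year / len(daily_returns)) - 1.0
    mdd = min(drawdowns([1.0] + equity))  # include starting capital as the first peak
    return ann / abs(mdd) if mdd < 0 else float("inf")
```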

System Overview

REGENT abandons the classic three-stage decomposition (forecasting → ranking → sizing). The full pipeline, from per-stock indicators and macro state to constrained portfolio weights, is differentiable end-to-end, and the loss is a direct convex combination of Sharpe, Sortino, CVaR, drawdown, turnover, diversification, and graph-entropy terms computed on the realised return path.
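A composite objective of this kind can be sketched as a weighted sum over one realised return path, with the reward-like terms (Sharpe, Sortino) negated so that a lower loss is better. Everything specific here is an assumption: the weights in `LOSS_WEIGHTS`, the 8% CVaR tail (taken from the abstract), turnover as the mean L1 change in weights, diversification as a Herfindahl concentration penalty, per-period (unannualised) ratios, and the omission of the graph-entropy term, whose definition is not given here. It is written in plain Python for readability; in training these quantities would be computed on tensors so gradients flow back through the QP layer and the GNN.

```python
import math
import statistics

# Hypothetical non-negative weights summing to one (a convex combination).
LOSS_WEIGHTS = {"sharpe": 0.3, "sortino": 0.2, "cvar": 0.2,
                "drawdown": 0.1, "turnover": 0.1, "concentration": 0.1}


def cvar(returns, alpha=0.08):
    """Mean loss over the worst alpha fraction of periods (positive number = loss)."""
    k = max(1, math.ceil(alpha * len(returns)))
    return -sum(sorted(returns)[:k]) / k


def composite_loss(weights_path, asset_returns, coef=LOSS_WEIGHTS):
    """weights_path[t][i]: weights held over period t; asset_returns[t][i]: asset returns in t."""
    port = [sum(wi * ri for wi, ri in zip(wt, rt))
            for wt, rt in zip(weights_path, asset_returns)]
    mean, sd = statistics.fmean(port), statistics.pstdev(port)
    downside = math.sqrt(statistics.fmean([min(r, 0.0) ** 2 for r in port]))
    sharpe = mean / sd if sd > 0 else 0.0
    sortino = mean / downside if downside > 0 else 0.0
    # Maximum drawdown of the compounded path, as a positive magnitude.
    eq, peak, mdd = 1.0, 1.0, 0.0
    for r in port:
        eq *= 1.0 + r
        peak = max(peak, eq)
        mdd = max(mdd, 1.0 - eq / peak)
    # Mean L1 change in weights between consecutive periods.
    turnover = 0.0
    if len(weights_path) > 1:
        turnover = statistics.fmean(
            [sum(abs(a - b) for a, b in zip(w1, w0))
             for w0, w1 in zip(weights_path, weights_path[1:])])
    hhi = statistics.fmean([sum(x * x for x in wt) for wt in weights_path])
    return (-coef["sharpe"] * sharpe - coef["sortino"] * sortino
            + coef["cvar"] * cvar(port) + coef["drawdown"] * mdd
            + coef["turnover"] * turnover + coef["concentration"] * hhi)
```

Computing every term on the same realised path is what lets a single gradient trade return against tail risk and trading cost, instead of tuning each stage against its own proxy.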

Macro-state encoder. A temporal-transformer VQ-VAE encodes five macro series (VIX, TNX, DXY,...
