Boundaries of Stationary Feature Learning: A Minimax Barrier for Scaling Laws and the Necessity of Compositional Structure
Published June 2, 2026
| Version v1
Preprint
Open
Authors/Creators
Drozdov, Ivan
Description
I do not derive the Chinchilla scaling law; I map the boundaries of the regime in which such a derivation could even be attempted. Working in the μP feature-learning setting on a Sobolev-on-manifold data model, I establish what the stationary limit of feature learning can and cannot do.
(i) A barrier. The classical Sobolev minimax lower bound makes β₀ = 2s/(2s+d*) an unconditional ceiling for any estimator from D samples; feature learning is a special case, so no stationary first-order method can exceed it.
(ii) Self-organised criticality. Treating the target's intrinsic smoothness as a free parameter t, the variational attractor realises source exponent r(ν) = t(ν+1)/(1+2t) relative to its own kernel; this is monotone in the capacity exponent ν, equals exactly r = 1/2 at ν = 1/(2t), and the barrier forbids the corresponding β > β₀ for over-aligned ν.
(iii) H1 and H2 as objects, not assumptions. I derive the capacity penalty Σₖ λₖ^ν as the rich-regime implicit bias of a depth-L diagonal/deep-linear network, with ν = 1/L, and identify the load-bearing value ν = d*/(2s) as the depth–smoothness matching at which the attractor saturates the barrier. The correct approximation exponent is α = 2s/d*, and α > β₀ holds unconditionally, since 2s/d* > 2s/(2s+d*) whenever s > 0 and d* > 0.
(iv) Where deviations come from. Any empirical β > β₀ must originate outside stationary Sobolev learning: from reduced effective dimension d_loc ≪ d (compositional/Besov data) or from transient non-stationary kernel alignment. This complements the dynamical feature-learning models of Bordelon, Atanasov and Pehlevan.
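The exponents quoted in (i) to (iii) can be checked with a few lines of exact arithmetic. The following is a minimal sketch, not code from the preprint: the function names (`beta0`, `alpha`, `source_exponent`, `critical_nu`, `matched_depth`) and the sample values s = 2, d* = 4 are illustrative assumptions, and `matched_depth` only rearranges the two stated relations ν = 1/L and ν = d*/(2s).

```python
from fractions import Fraction


def beta0(s, d_star):
    """Minimax barrier exponent from (i): beta0 = 2s / (2s + d*)."""
    s, d_star = Fraction(s), Fraction(d_star)
    return 2 * s / (2 * s + d_star)


def alpha(s, d_star):
    """Approximation exponent from (iii): alpha = 2s / d*."""
    s, d_star = Fraction(s), Fraction(d_star)
    return 2 * s / d_star


def source_exponent(t, nu):
    """Source exponent from (ii): r(nu) = t (nu + 1) / (1 + 2t)."""
    t, nu = Fraction(t), Fraction(nu)
    return t * (nu + 1) / (1 + 2 * t)


def critical_nu(t):
    """Capacity exponent at which r = 1/2, per (ii): nu = 1 / (2t)."""
    return 1 / (2 * Fraction(t))


def matched_depth(s, d_star):
    """Depth implied by combining nu = 1/L with nu = d*/(2s): L = 2s / d*.

    This is a rearrangement of the two stated relations, not a formula
    quoted from the description.
    """
    return 2 * Fraction(s) / Fraction(d_star)


if __name__ == "__main__":
    s, d_star = 2, 4                # illustrative values only
    t = Fraction(s, d_star)         # assumption: t = s/d*, which makes 1/(2t) equal d*/(2s)

    print("beta0 =", beta0(s, d_star))
    print("alpha =", alpha(s, d_star))
    print("r at critical nu =", source_exponent(t, critical_nu(t)))
    print("matched depth L =", matched_depth(s, d_star))
```

Exact rationals are used so that the identity r(1/(2t)) = 1/2 holds without floating-point tolerance. The identity follows from 2t(ν + 1) = 1 + 2t, which gives ν = 1/(2t).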
Files
chinchilla.pdf (650.2 kB)
md5:b09e3350c7c31690a3a92ad8b05b3220
Indexed in
OpenAIRE
Details
DOI
10.5281/zenodo.20516952