Boundaries of Stationary Feature Learning: A Minimax Barrier for Scaling Laws

Published June 2, 2026 | Version v1 | Preprint | Open

Boundaries of Stationary Feature Learning: A Minimax Barrier for Scaling Laws and the Necessity of Compositional Structure

Authors/Creators

Drozdov, Ivan

Description

I do not derive the Chinchilla scaling law; I map the boundaries of the regime in which such a derivation could even be attempted. Working in the μP feature-learning setting on a Sobolev-on-manifold data model, I establish what the stationary limit of feature learning can and cannot do.

(i) A barrier. The classical Sobolev minimax lower bound makes β₀ = 2s/(2s+d*) an unconditional ceiling for any estimator from D samples; feature learning is a special case, so no stationary first-order method can exceed it.

(ii) Self-organised criticality. Treating the target's intrinsic smoothness as a free parameter t, the variational attractor realises the source exponent r(ν) = t(ν+1)/(1+2t) relative to its own kernel. This exponent increases monotonically in the capacity exponent ν and equals exactly r = 1/2 at ν = 1/(2t); for over-aligned ν, the barrier forbids the corresponding β > β₀.

(iii) H1 and H2 as objects, not assumptions. I derive the capacity penalty Σλₖᵛ as the rich-regime implicit bias of a depth-L diagonal/deep-linear network, with ν = 1/L, and identify the load-bearing value ν = d*/(2s) as the depth–smoothness matching at which the attractor saturates the barrier. The correct approximation exponent is α = 2s/d* and α > β₀ holds unconditionally.

(iv) Where deviations come from. Any empirical β > β₀ must originate outside stationary Sobolev learning: from reduced effective dimension d_loc ≪ d (compositional/Besov data) or from transient non-stationary kernel alignment. This complements the dynamical feature-learning models of Bordelon, Atanasov and Pehlevan.
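The exponents quoted in (i) to (iii) can be checked numerically. The following is a minimal sketch, not code from the preprint: the function names are hypothetical, and the identification t = s/d* is an assumption inferred from the fact that it makes ν = 1/(2t) in (ii) coincide with ν = d*/(2s) in (iii). Exact rational arithmetic is used so the equalities hold without rounding.

```python
from fractions import Fraction


def beta0(s, d_star):
    """Minimax ceiling from (i): beta_0 = 2s / (2s + d*)."""
    return Fraction(2 * s) / (2 * s + d_star)


def alpha(s, d_star):
    """Approximation exponent from (iii): alpha = 2s / d*."""
    return Fraction(2 * s) / d_star


def source_exponent(t, nu):
    """Source exponent from (ii): r(nu) = t (nu + 1) / (1 + 2t)."""
    return t * (nu + 1) / (1 + 2 * t)


def critical_nu(s, d_star):
    """Load-bearing capacity exponent from (iii): nu = d* / (2s)."""
    return Fraction(d_star) / (2 * s)


if __name__ == "__main__":
    # Illustrative (s, d*) pairs; these values are not taken from the preprint.
    for s, d_star in [(2, 4), (3, 2), (1, 10)]:
        t = Fraction(s, d_star)        # assumed identification t = s / d*
        nu = critical_nu(s, d_star)
        depth = 1 / nu                 # nu = 1/L, so the matched depth is L = 2s / d*
        print(
            f"s={s}, d*={d_star}: beta0={beta0(s, d_star)}, "
            f"alpha={alpha(s, d_star)}, nu={nu}, L={depth}, "
            f"r(nu)={source_exponent(t, nu)}"
        )
```

Under the assumed identification, r evaluates to 1/2 at the critical ν for every pair, and α exceeds β₀ whenever s and d* are positive, since 2s/d* > 2s/(2s + d*).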

Files

chinchilla.pdf (650.2 kB)

md5:b09e3350c7c31690a3a92ad8b05b3220

Indexed in

OpenAIRE

DOI

10.5281/zenodo.20516952
