Why Good Engineers Become Worse With AI

The 10x engineer is regressing towards the mean.

Francis Galton named this effect in 1886 [1], when he noticed that exceptionally tall parents had children closer to average. LLMs are regression machines by construction. The decoding step favors the most probable continuation of your prompt. That's the mean of the training distribution, conditioned on what you typed.
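To make the decoding step concrete, here is a minimal sketch over a toy next-token distribution. The candidate names and logits are made up for illustration and do not come from any real model; the point is only that both greedy decoding and sampling concentrate on the high-probability candidates.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities (numerically stable softmax)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical candidates and logits: three common patterns, one rare one.
tokens = ["pattern_a", "pattern_b", "pattern_c", "rare_pattern"]
logits = [3.0, 2.0, 1.5, -2.0]

probs = softmax(logits)

# Greedy decoding: always take the single most probable candidate.
greedy = tokens[probs.index(max(probs))]

# Sampling at temperature 1: draw candidates in proportion to probability.
rng = random.Random(0)
draws = rng.choices(tokens, weights=probs, k=1000)
rare_share = draws.count("rare_pattern") / len(draws)

print("greedy pick:", greedy)
print("share of samples that hit the rare candidate:", rare_share)
```

With these made-up logits, greedy decoding returns the highest-probability candidate every time, and sampling reaches the rare candidate only occasionally.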
Figure: Regression to the mean. Standard patterns get lifted up to it; novel algorithms get dragged down to it. Same mechanism, opposite outcomes.

The effect is asymmetric. On common work, the 10x engineer becomes 100x. On novel work, the same engineer gets dragged to the mean and ships code that looks right and isn't. The model doesn't know which one you are.

What the Failure Looks Like

A docstring describes behavior. You can only specify what you already know.

I used a paper from ICML 2026 [2] whose contribution is one attention kernel formula. I stripped the implementation and sent DeepSeek V4 Pro the signature and docstring, then captured the logprobs on the completion.

import torch
import torch.nn.functional as F

def spherical_attention(Q, K, V):
    """
    Attention with spherical-constrained Q, K and positive scoring kernel.

    Queries and keys are normalized to the unit sphere. A positive kernel
    function maps the cosine similarity between query and key directions
    to an attention score. Scores are normalized per query and used to
    weight V.

    Args:
        Q, K, V: (batch, heads, seq, head_dim) tensors.

    Returns:
        Attention output of shape (batch, heads, seq, head_dim).
    """
    Q = F.normalize(Q, dim=-1)
    K = F.normalize(K, dim=-1)
    S = torch.einsum('bhqd,bhkd->bhqk', Q, K)
    C = 2.0 + 1e-6
    S = S**2 / (C - 2*S)  # Yat-kernel
    A = S / S.sum(dim=-1, keepdim=True)
    O = torch.einsum('bhqk,bhkd->bhqd', A, V)
    return O
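The docstring in the listing above constrains the kernel only to be positive and the scores to be normalized per query. As a minimal sketch using plain Python floats instead of tensors, the check below applies those two constraints to the listing's kernel and to a hypothetical stand-in (ReLU plus a small epsilon, one common positive function chosen for illustration). The helper names and the cosine values are assumptions made for this example.

```python
def yat_kernel(s, c=2.0 + 1e-6):
    """Scoring line from the listing: s**2 / (c - 2*s) for a cosine s in [-1, 1]."""
    return s * s / (c - 2.0 * s)


def relu_kernel(s, eps=1e-6):
    """Hypothetical stand-in: ReLU plus a small epsilon, also positive."""
    return max(s, 0.0) + eps


def normalize(scores):
    """Divide by the row sum, mirroring A = S / S.sum(dim=-1, keepdim=True)."""
    total = sum(scores)
    return [x / total for x in scores]


def satisfies_docstring(weights, tol=1e-9):
    """The only checks the docstring supports: non-negative weights summing to 1."""
    return all(w >= 0.0 for w in weights) and abs(sum(weights) - 1.0) < tol


# Cosine similarities between one query and four keys (illustrative values).
cosines = [0.9, 0.5, 0.0, -0.5]

yat_weights = normalize([yat_kernel(s) for s in cosines])
relu_weights = normalize([relu_kernel(s) for s in cosines])

print("yat passes: ", satisfies_docstring(yat_weights))
print("relu passes:", satisfies_docstring(relu_weights))
print("yat weights: ", [round(w, 3) for w in yat_weights])
print("relu weights:", [round(w, 3) for w in relu_weights])
```

Both kernels pass the behavioral check, yet the weights differ. The listing's kernel squares the similarity, so the key at cosine -0.5 still receives weight, while the stand-in drives that weight to nearly zero. A description of behavior alone cannot tell the two apart.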
Figure: The model's completion. Deeper red indicates lower confidence.

Seven identical lines. One different: where the paper writes S**2 / (C - 2*S) (the Yat-kernel, the paper's contribution), the model wrote torch.relu(S) + 1e-6. The model was sampling from the common positive functions: ReLU, softplus, exp. The Yat-kernel wasn't in the candidate set.

The model gets it right when given the formula. But if you already know the formula, you don't need the model. Without it, you get structurally correct code with the wrong formula on the load-bearing line.

Where It Doesn't Fail

In May 2026, OpenAI's reasoning model disproved the Erdős unit distance conjecture [4], a combinatorics problem open since 1946. DeepMind's AlphaProof Nexus solved nine of the 353 open Erdős problems the same week [5].

Both used the same structure: the model generates candidate constructions; Lean, a formal proof checker, verifies each one. A proof compiles or it doesn't. What looks like AI solving novel mathematics is search over a space with a ground-truth oracle.

The kernel experiment has no oracle. The model generated one completion, nothing verified it, and the most probable token was ReLU. The logprobs show uncertainty on that line; the model knew it was in the tail. But uncertainty with no verifier downstream collapses into the modal token.

What's Permanent

You might expect this to fix itself: publish the paper, the next model trains on it, the gap closes. Some of it does. But the frontier always sits past the cutoff, and the highest-value work is never published at all. HFT pricing logic, FAANG infrastructure, and bank risk systems stay behind corporate firewalls [6]. There's always a tail, and the best engineers work in it.

Rarity is the diagnostic. Standard application code sits near the center of the distribution, and the model lifts it.
Rare patterns sit in the tail, where models underlearn them [3] and produce something with the same shape that is confidently wrong.

Engineers who stay sharp know which lines carry the contribution. The model doesn't know. If you've been delegating the judgment of which lines matter, you're the one regressing.

References

[1] Wikipedia. Regression toward the mean, "Discovery" section.
[2] Luna, Bouhsine, and Choromanski. SLAY: Geometry-Aware Spherical Linearized Attention with Yat-Kernel. arXiv:2602.04915, 2026. ICML 2026.
[3] Kandpal et al. Large Language Models Struggle to Learn Long-Tail Knowledge. arXiv:2211.08411, 2023. ICML 2023.
[4] OpenAI. Remarks on the Disproof of the Unit Distance Conjecture. arXiv:2605.20695, 2026.
[5] Google DeepMind. AlphaProof Nexus. arXiv:2605.22763, 2026.
[6] Ahmed et al. Studying LLM Performance on Closed- and Open-source Data. arXiv:2402.15100, 2024.
Nidhish Shah / © 2026 / carpe diem