Show HN: A Transformer Is All You Need

A Transformer Is All You Need | Zenodo

Published June 26, 2026

Version v1

Preprint

Open

A Transformer Is All You Need

Authors/Creators

Lamoureux, Marc

Description

The unanswered question in mechanistic interpretability of pretrained transformers is plain: for any prompt and any decoder-only transformer, which weights at which layers along which residual-stream dimensions produced the decision the model emitted? Activation probing reports a per-depth accuracy curve. Sparse dictionaries decompose activations into monosemantic features. Logit and tuned lenses trace the trajectory of a prediction through the residual stream. None of these names the weight that did the work. The weights are the artifact training produced, the substrate every activation must traverse, the only object in the system that persists across forward passes; interpretability that treats them as a fixed backdrop describes what the model is doing right now, never why this particular model with these particular weights had to do it.

We close that gap with one primitive and one probe. The primitive is the alignment of a residual-stream activation with the top singular directions of a weight matrix, scaled by the singular values. The probe is a small cross-layer transformer (the hybrid weight–activation probe) that consumes the joint (activation, alignment) sequence and predicts the host model's next-token decision. As a byproduct of training, the probe exposes per-layer importance (the depth at which the host's decision crystallized) and per-layer alignment importance over the three weight families: Q/K/V, attention output, and MLP up/gate (which family at each layer carried the decisional signal and, via the SVD, along which singular directions). A separate gradient-attribution pass through the host model closes the causal loop, confirming that the weights the probe identifies are the same weights whose perturbation moves the host's logit on that decision. The pipeline answers, for any prompt on any frozen pretrained decoder-only transformer, the question every prior interpretability tool has had to leave open: which weight, at which layer, along which dimensions, produced this token.
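The alignment primitive described above can be sketched in a few lines of NumPy. This is an illustrative reading, not the paper's implementation: the abstract does not give the exact definition, so the sketch assumes the convention y = W x, takes the right singular vectors of W as the "singular directions" in residual-stream space, and uses a hypothetical function name `singular_alignment`.

```python
import numpy as np

def singular_alignment(W, x, k):
    """Project activation x onto the top-k right singular directions of W,
    scaling each coordinate by the matching singular value.

    Assumption (not from the paper): W has shape (d_out, d_in) and acts as W @ x.
    """
    # Reduced SVD: W = U @ diag(S) @ Vh, with S sorted in descending order.
    U, S, Vh = np.linalg.svd(W, full_matrices=False)
    return S[:k] * (Vh[:k] @ x)

# Toy stand-ins for a weight matrix and a residual-stream activation.
rng = np.random.default_rng(0)
W = rng.standard_normal((6, 4))
x = rng.standard_normal(4)

a_top2 = singular_alignment(W, x, k=2)   # top-2 alignment coordinates
a_full = singular_alignment(W, x, k=4)   # all singular directions

# With every singular direction kept, the alignment carries exactly the
# information W extracts from x: U @ a_full reproduces W @ x.
U, _, _ = np.linalg.svd(W, full_matrices=False)
reconstructed = U @ a_full
```

Under this reading, the alignment vector is the layer's output expressed in the weight's own singular basis, and truncating to the top k keeps the directions the weight amplifies most.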

We demonstrate the pipeline on four structurally distinct decoder-only transformers spanning five years of architectural and training evolution: GPT-2 medium (2019, 355M, WebText), Pythia 2.8B (2023, 2.8B, the Pile), Mistral 7B v0.1 (late 2023, 7.3B, SwiGLU/RMSNorm/GQA/sliding-window), and LLaMA 3 8B base (2024, 8B, SwiGLU/RMSNorm/GQA, 128K-token tiktoken vocabulary, 15T training tokens). On all four the probe converges well above the random baseline of 1/1024 ≈ 0.001 over a compact 1024-token target vocabulary and produces a coherent per-prompt attribution report; absolute accuracy serves only as a chance-baseline sanity check, and the attribution result does not depend on the probe's accuracy, provided it is above chance. As an unplanned byproduct of running the same pipeline on this panel, the per-weight-family attribution proportions on all four hosts lie within ℓ₁ distance 0.019 of the uniform point [1/3, 1/3, 1/3] at the centroid of the 2-simplex, with a maximum pairwise ℓ₁ separation of 0.034. We did not engineer this observation and did not select hosts to produce it; we report it as a downstream finding, not as the central claim.
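The two numerical checks in the paragraph above, the chance baseline and the ℓ₁ distance from the uniform attribution, are simple to reproduce. The sketch below is a minimal illustration; the proportions in `example` are hypothetical and are not taken from the paper.

```python
def l1_distance(p, q):
    """Sum of absolute coordinate differences between two vectors."""
    return sum(abs(a - b) for a, b in zip(p, q))

# Chance accuracy for a uniform guess over a 1024-token target vocabulary.
chance = 1 / 1024

# Uniform attribution over the three weight families
# (Q/K/V, attention output, MLP up/gate).
uniform = [1 / 3, 1 / 3, 1 / 3]

# Hypothetical per-family proportions, for illustration only.
example = [0.34, 0.33, 0.33]
d_example = l1_distance(example, uniform)

# For scale: an attribution placed entirely on one family.
one_hot = [1.0, 0.0, 0.0]
d_one_hot = l1_distance(one_hot, uniform)
```

Here `d_example` is about 0.013 and `d_one_hot` is 4/3, which gives a sense of how close to uniform a distance of 0.019 is. By the triangle inequality, four points each within 0.019 of the uniform point can be at most 0.038 apart, so the reported maximum pairwise separation of 0.034 is consistent with the reported radius.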

From the single primitive of weight-level causal decision attribution follow nine capability families: per-prompt visibility into the decision pathway at every layer; causal diagnostics with no behavioral inference; weight-level surgical intervention on specific model behaviors with no retraining, fine-tuning, or RLHF; capability operations (localization, extraction, transplantation, removal); security and forensics including backdoor, sleeper-agent, distillation-source, and post-training tampering detection; safety-specific detection of deceptive alignment, sandbagging, hidden goals, evaluation awareness, sycophancy, pressure deception, reward hacking circuits, and specification gaming at the structural substrate; training economics through capability-preserving compression and targeted fine-tuning; cross-lab audit capability over any transformer family with no method rebuild; and comparative analysis across architectures, training methods, checkpoints, fine-tunes, and merges. The instrument is the result. The reproducibility observation is one of its dividends, not its claim.

Files

TransformerIsAllYouNeed.pdf

md5:dd8ece9df8359368c6cb16a3b492b299

148.6 kB

Indexed in

OpenAIRE

Keywords and subjects

Keywords

mechanistic interpretability

transformer...
