GitHub - pyxll/excel-gpt: Minimal GPT model implemented in Excel · GitHub
Excel-GPT
A minimal GPT model implemented in Excel, inspired by https://karpathy.github.io/2026/02/12/microgpt/.
Minimal Implementation: Focuses on the core GPT architecture with zero dependencies.
Excel Only: Uses only Excel formulas for computation, with no VBA.
The included Excel file generates plausible-sounding names.
The workbook is explained in this video:
It is well worth reading https://karpathy.github.io/2026/02/12/microgpt/ as you follow along with the spreadsheet. Everything is explained very well there, and I have not repeated it all here.
Motivation
Using tools like PyXLL (https://www.pyxll.com) we can integrate Python code into Excel. That way, we can wrap the Python GPT model and call it directly from Excel to generate text.
As a learning exercise, I wanted to do the opposite here and implement the micro GPT model entirely in Excel formulas without any Python code. This, of course, results in a more complex spreadsheet than simply calling a single function, but it allows us to peek inside the model in a way that is much harder with a plain Python script.
In real-world scenarios I would never expect to build a spreadsheet with this much complexity baked into it. It would be far better to move the complexity into Python, where it can be properly tested and debugged, and then call that Python code from Excel using the PyXLL add-in.
Architecture
The model is implemented in the Model sheet.
The model is implemented as an unrolled loop in Excel, with one block for each output token. Each block takes the previous token, a position, the parameters, and the keys and values from the previous positions. The output of each block is the logits (scores) for each candidate next token, together with the predicted output token.
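The data flow of the unrolled loop can be sketched in Python. This is only an illustration of the inputs and outputs described above: `block`, `PARAMS`, `VOCAB`, and `MAX_BLOCKS` are hypothetical names, the "model" is a toy lookup table rather than a transformer, and picking the highest-scoring token is an assumption (the workbook may choose tokens differently).

```python
VOCAB = ["?", "a", "b", "c"]  # toy vocabulary; '?' marks start and end
MAX_BLOCKS = 8                # number of unrolled blocks (assumed)

# Toy "parameters": one row of logits per previous token.
PARAMS = [
    [0.0, 2.0, 1.0, 0.5],  # after '?'
    [0.1, 0.0, 3.0, 0.2],  # after 'a'
    [0.3, 0.1, 0.0, 2.5],  # after 'b'
    [4.0, 0.2, 0.1, 0.0],  # after 'c'
]


def block(prev_token, position, params, keys, values):
    """Stand-in for one block: returns (logits, predicted token)."""
    # A real block would compute K and V from the embedding; here we only
    # record placeholders so later blocks can "see" earlier positions.
    keys.append([float(prev_token), float(position)])
    values.append([float(position), float(prev_token)])
    logits = params[prev_token]
    next_token = max(range(len(logits)), key=lambda i: logits[i])
    return logits, next_token


def generate(params):
    keys, values = [], []      # shared with every later block
    token = VOCAB.index("?")   # start token
    name = []
    for position in range(MAX_BLOCKS):  # each iteration is one block
        _, token = block(token, position, params, keys, values)
        if VOCAB[token] == "?":         # end token
            break
        name.append(VOCAB[token])
    return "".join(name)


print(generate(PARAMS))
```

In a spreadsheet there is no `for` loop, so each iteration becomes its own group of rows, and the output token of one group feeds the input of the next.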
We follow Andrej Karpathy's microGPT and use the same simplifications: RMSNorm instead of LayerNorm, no biases, and ReLU instead of GELU.
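As a rough sketch of what RMSNorm does, the function below divides a vector by its root mean square. The epsilon value and the absence of a learned gain are assumptions made for illustration; check the microGPT source and the workbook for the exact form used.

```python
import math


def rmsnorm(x, eps=1e-5):
    """Scale x so that its root mean square is approximately 1."""
    mean_square = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_square + eps)  # eps avoids division by zero
    return [v * scale for v in x]


normed = rmsnorm([3.0, 4.0])
print(normed)
```

Unlike LayerNorm, this does not subtract the mean first, which keeps the formula short enough to express comfortably as a single Excel formula per cell.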
Each block starts with the current position id, the previous token, and the current training target token. The target isn't used when running the model; it is only used in training, which is not part of this sheet. A special token '?' marks the start and end of the name.
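To make the three inputs concrete, the sketch below lists the (position, previous token, target) triple each block would see for one training name. The helper `rows_for_name` is hypothetical, and the example name is arbitrary.

```python
def rows_for_name(name):
    """Return (position, previous token, target token) for each block."""
    chars = ["?"] + list(name) + ["?"]  # '?' delimits the name
    return [(pos, chars[pos], chars[pos + 1]) for pos in range(len(chars) - 1)]


for row in rows_for_name("emma"):
    print(row)
```

When generating rather than training, the target column is simply ignored and the previous token of each block comes from the prediction of the block before it.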
Next are the embedding vectors. These follow the original microGPT code and are the learned vectors for the position and token, looked up from the weights tables. The position and token embeddings are summed to give a joint embedding.
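The lookup and sum can be sketched as follows. The table names `wte` and `wpe` and the tiny 2-dimensional weights are assumptions for illustration, not values taken from the workbook.

```python
def embed(token_id, pos_id, wte, wpe):
    """Look up the token and position rows and add them element-wise."""
    tok_emb = wte[token_id]  # one row per vocabulary entry
    pos_emb = wpe[pos_id]    # one row per position
    return [t + p for t, p in zip(tok_emb, pos_emb)]


wte = [[0.1, 0.2], [0.3, 0.4]]  # toy token embedding table
wpe = [[1.0, 0.0], [0.0, 1.0]]  # toy position embedding table
x = embed(1, 0, wte, wpe)       # token 1 at position 0
print(x)
```

In Excel terms this corresponds to two row lookups into the weights tables followed by a cell-by-cell addition.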
The attention block is the same as in the original microGPT project, but with the loop unrolled and each step repeated for each iteration of the loop. The model uses 4 attention heads, so there are 4 sets of rows for this. We compute the query (Q), key (K) and value (V) for the current token, and make the keys and values from previous positions available to the current position. Each attention head computes the dot product between the query and each key (the current key and the previous keys), and uses the resulting scores to take a weighted sum of the values. The head outputs are recombined and projected to the attention output through the trained projection matrix attn_wo.
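The per-head computation can be sketched in plain Python. This assumes standard scaled dot-product attention, with scores divided by the square root of the head dimension and normalised by softmax. The function names are hypothetical, and the final projection through attn_wo is left out for brevity.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_head(q_h, k_h, v_h):
    """q_h: query slice; k_h, v_h: key/value slices for all positions so far."""
    head_dim = len(q_h)
    scores = [
        sum(q * k for q, k in zip(q_h, key)) / math.sqrt(head_dim)
        for key in k_h
    ]
    weights = softmax(scores)
    # Weighted sum of the value vectors, one output element at a time.
    return [
        sum(w * v[j] for w, v in zip(weights, v_h))
        for j in range(len(v_h[0]))
    ]


def multi_head_attention(q, keys, values, n_head):
    """Split q, keys and values into n_head slices and concatenate the outputs."""
    head_dim = len(q) // n_head
    out = []
    for h in range(n_head):
        lo, hi = h * head_dim, (h + 1) * head_dim
        out.extend(
            attention_head(
                q[lo:hi],
                [k[lo:hi] for k in keys],
                [v[lo:hi] for v in values],
            )
        )
    return out
```

Because `keys` and `values` only ever contain the current and earlier positions, no explicit causal mask is needed. In the spreadsheet, each later block simply references more rows of keys and values than the one before it.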
The MLP (multilayer perceptron) block projects the attention output through the MLP projection matrix mlp_fc1, applies ReLU to clamp values to >= 0, and then projects the result back down to the embedding dimension through mlp_fc2.
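A minimal sketch of this up-project, ReLU, down-project pattern is shown below. The `linear` helper, the row-per-output weight layout, and the toy weights are assumptions for illustration. Any normalisation applied before the MLP is omitted here.

```python
def linear(x, w):
    """Multiply vector x by matrix w, stored as one row per output element."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def mlp(x, mlp_fc1, mlp_fc2):
    hidden = linear(x, mlp_fc1)             # project up to the hidden size
    hidden = [max(0.0, h) for h in hidden]  # ReLU: clamp negatives to 0
    return linear(hidden, mlp_fc2)          # project back to embedding size


fc1 = [[1, 0], [0, 1], [1, 1], [-1, -1]]  # toy 2 -> 4 projection
fc2 = [[1, 1, 1, 1], [0, 0, 0, 1]]        # toy 4 -> 2 projection
print(mlp([1.0, -2.0], fc1, fc2))
```

With no biases, each projection is a plain matrix-vector product, which maps naturally onto Excel functions such as MMULT or SUMPRODUCT.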
Both the MLP and attention blocks output residuals, which are added back to the inputs to produce the output of the block. This lets gradients flow directly through the network and makes deeper models trainable.
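The residual pattern itself is small enough to sketch in a few lines. Here `add_residual` and `transformer_block` are hypothetical helpers, and the two sub-layers are passed in as plain functions so that only the wiring is shown.

```python
def add_residual(x, sublayer):
    """Return x + sublayer(x), computed element-wise."""
    return [xi + yi for xi, yi in zip(x, sublayer(x))]


def transformer_block(x, attention, mlp):
    x = add_residual(x, attention)  # attention output added to its input
    x = add_residual(x, mlp)        # MLP output added to its input
    return x


# Toy sub-layers standing in for the real attention and MLP computations.
double = lambda v: [2.0 * a for a in v]
ones = lambda v: [1.0 for _ in v]
print(transformer_block([1.0, 2.0], double, ones))
```

Because the input is carried through unchanged by the addition, a sub-layer that outputs zeros leaves the vector untouched, which is part of why stacking many such blocks remains trainable.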
Finally, the MLP output is projected back to the vocabulary dimension through the...