Tiny GPT in Go. Optimised for Understanding. Trained on Jules Verne Books

zakirullin/gpt-go: Tiny GPT implemented from scratch in pure Go. Trained on Jules Verne books. Explained.

gpt-go

Simple GPT implementation in pure Go. Trained on favourite Jules Verne books.

The kind of response you can expect from the model:

Mysterious Island. Well. My days must follow

Or this:

Captain Nemo, in two hundred thousand feet weary in the existence of the world.

How to run

$ go run .

Training takes about 40 minutes on a MacBook Air M3. The trained weights are saved to the model-1.234M file. If you rerun the program, it picks up the saved weights and continues training. The loss should decrease with each run, indicating that the model is learning something useful.

You can train on your own dataset by pointing the data.dataset variable to your text corpus.

To run in chat-only mode once the training is done:

$ go run . -chat

How to understand

You can use this repository as a companion to the Neural Networks: Zero to Hero course. Use git checkout to see how the model has evolved over time: naive, bigram, multihead, block, residual, full.

In main_test.go you will find explanations, starting from a basic neuron example:

    // Our neuron has 2 inputs and 1 output (number of columns in weight matrix).
    // Its goal is to predict next number in the sequence.
    input := V{1, 2} // {x1, x2}
    weight := M{
        {2}, // how much x1 contributes to the output
        {3}, // how much x2 contributes to the output
    }
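The neuron above boils down to a vector-matrix product. Here is a minimal, self-contained sketch of that computation using plain slices instead of the repository's V and M types; the function name forward is an assumption made for illustration and is not taken from the repository.

```go
package main

import "fmt"

// forward multiplies a row vector by a weight matrix:
// out[j] = sum over i of input[i] * weight[i][j].
// Each column of the weight matrix produces one output.
// NOTE: forward is a hypothetical helper, not the repository's API.
func forward(input []float64, weight [][]float64) []float64 {
	if len(weight) == 0 || len(weight) != len(input) {
		panic("weight must have exactly one row per input")
	}
	out := make([]float64, len(weight[0]))
	for i, x := range input {
		for j, w := range weight[i] {
			out[j] += x * w
		}
	}
	return out
}

func main() {
	input := []float64{1, 2} // {x1, x2}
	weight := [][]float64{
		{2}, // how much x1 contributes to the output
		{3}, // how much x2 contributes to the output
	}
	// 1*2 + 2*3 = 8
	fmt.Println(forward(input, weight)) // prints [8]
}
```

With one column in the weight matrix there is a single output; adding columns would add more neurons that read the same two inputs.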

All the way to the self-attention mechanism:

    // To calculate the sum of all previous tokens, we can multiply by this triangular matrix:
    tril := M{
        {1, 0, 0, 0}, // first token attends only at itself ("cat"), it can't look into the future
        {1, 1, 0, 0}, // second token attends at itself and the previous token ( "cat" + ", ")
        {1, 1, 1, 0}, // third token attends at itself and the two previous tokens ("cat" + ", " + "dog")
        {1, 1, 1, 1}, // fourth token attends at itself and all the previous tokens ("cat" + ", " + "dog" + " and")
    }.Var()
    // So, at this point each embedding is enriched with the information from all the previous tokens.
    // That's the crux of self-attention.
    enrichedEmbeds := MatMul(tril, inputEmbeds)
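To see the triangular trick with concrete numbers, here is a small runnable sketch. It uses plain slices rather than the repository's M type and autograd variables, and the two-dimensional embedding values are made up for illustration. The helper names matMul and tril are assumptions, not the repository's functions.

```go
package main

import "fmt"

// matMul multiplies an n×k matrix by a k×m matrix.
// Hypothetical helper; the repository has its own MatMul.
func matMul(a, b [][]float64) [][]float64 {
	out := make([][]float64, len(a))
	for i := range a {
		out[i] = make([]float64, len(b[0]))
		for k := range b {
			for j := range b[k] {
				out[i][j] += a[i][k] * b[k][j]
			}
		}
	}
	return out
}

// tril builds an n×n lower-triangular matrix of ones.
func tril(n int) [][]float64 {
	m := make([][]float64, n)
	for i := range m {
		m[i] = make([]float64, n)
		for j := 0; j <= i; j++ {
			m[i][j] = 1
		}
	}
	return m
}

func main() {
	// Made-up embeddings for four tokens, one row per token.
	inputEmbeds := [][]float64{
		{1, 0},
		{0, 1},
		{2, 2},
		{3, 1},
	}
	// Row i of the result is the sum of embeddings 0..i,
	// so no token sees anything that comes after it.
	enriched := matMul(tril(4), inputEmbeds)
	for _, row := range enriched {
		fmt.Println(row)
	}
	// prints:
	// [1 0]
	// [1 1]
	// [3 3]
	// [6 4]
}
```

Each output row is a running sum of the rows above it. Replacing the ones with learned, normalised weights is what turns this plain sum into a weighted one.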

Design choices

No batches.

I've dropped the batch dimension for the sake of better understanding. It's far easier to build intuition with 2D matrices than with 3D tensors. Besides, batches aren't inherent to the transformer architecture. Gradient accumulation was tried as a way to smooth the gradients, but the effect was negligible, so it was removed as well.
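For readers unfamiliar with gradient accumulation, here is a rough sketch of the idea on a toy one-parameter model y = w*x with a squared-error loss. This is not code from the repository; the function names, data and learning rate are all assumptions chosen for illustration.

```go
package main

import "fmt"

// grad returns dLoss/dw for the squared error (w*x - y)^2.
func grad(w, x, y float64) float64 {
	return 2 * (w*x - y) * x
}

// accumulatedStep processes examples one at a time (no batch dimension),
// sums their gradients, and applies a single averaged update at the end.
// Hypothetical helper written for this sketch.
func accumulatedStep(w float64, xs, ys []float64, lr float64) float64 {
	var sum float64
	for i := range xs {
		sum += grad(w, xs[i], ys[i])
	}
	return w - lr*sum/float64(len(xs))
}

func main() {
	xs := []float64{1, 2, 3}
	ys := []float64{2, 4, 6} // generated by y = 2x
	w := 0.0
	for step := 0; step < 50; step++ {
		w = accumulatedStep(w, xs, ys, 0.05)
	}
	fmt.Printf("%.3f\n", w) // w approaches 2
}
```

The update is the same one a batch of three examples would produce, but each example is still processed on its own, so the model code never needs a third tensor dimension.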

Removed gonum.

The gonum matmul gave us a ~30% performance boost, but it brought in an additional dependency. We're not striving for maximum efficiency here, but rather for radical simplicity. The current matmul implementation is quite effective, and it's only 40 lines of plain, readable code.
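To give a sense of how little code a dependency-free matmul needs, here is a sketch over a row-major matrix. It is not the repository's implementation: the Matrix type, the Mul function and the i-k-j loop order are assumptions picked for this example.

```go
package main

import "fmt"

// Matrix is a hypothetical dense matrix stored row by row in one slice.
type Matrix struct {
	Rows, Cols int
	Data       []float64
}

// Mul returns a*b. The i-k-j loop order lets the inner loop walk
// a row of b and a row of the output contiguously in memory.
func Mul(a, b Matrix) Matrix {
	if a.Cols != b.Rows {
		panic("inner dimensions do not match")
	}
	out := Matrix{Rows: a.Rows, Cols: b.Cols, Data: make([]float64, a.Rows*b.Cols)}
	for i := 0; i < a.Rows; i++ {
		outRow := out.Data[i*b.Cols : (i+1)*b.Cols]
		for k := 0; k < a.Cols; k++ {
			aik := a.Data[i*a.Cols+k]
			bRow := b.Data[k*b.Cols : (k+1)*b.Cols]
			for j, bkj := range bRow {
				outRow[j] += aik * bkj
			}
		}
	}
	return out
}

func main() {
	a := Matrix{Rows: 2, Cols: 3, Data: []float64{1, 2, 3, 4, 5, 6}}
	b := Matrix{Rows: 3, Cols: 2, Data: []float64{7, 8, 9, 10, 11, 12}}
	fmt.Println(Mul(a, b).Data) // prints [58 64 139 154]
}
```

A tuned library can be faster through blocking and vectorised kernels, which is the trade-off the paragraph above describes.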

Papers

You don't need to read them to understand the code :)

Attention Is All You Need

Deep Residual Learning

DeepMind WaveNet

Batch Normalization

Deep NN + huge data = breakthrough performance

OpenAI GPT-3 paper

Analyzing the Structure of Attention

Credits

Many thanks to Andrej Karpathy for his brilliant Neural Networks: Zero to Hero course.

Thanks to @itsubaki for his elegant autograd package.


License

MIT license
