GitHub - timothygao8710/minWhisper · GitHub
minWhisper
This repo implements the entire forward pass of OpenAI's Whisper in under 150 lines of NumPy, using einsum and einops.
compressed_demo.mov
The KV cache adds 7 lines on top of main.py and reduces decoding cost from O(seq_len^3) to O(seq_len^2)
Supports any model size in the Whisper family, batched inference, and different audio formats
Details such as layernorm and approximate GELU differ slightly from Hugging Face's implementation in favor of conciseness
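The two details named in the last point can be sketched in a few lines of NumPy. This is a minimal illustration of the standard formulas, not the repo's code: the function names `layer_norm` and `gelu_tanh`, the `eps` default, and the tensor shapes are assumptions made for the example.

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    # Normalize over the last (channel) axis, then apply the learned scale and shift.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps) * gamma + beta

def gelu_tanh(x):
    # Tanh approximation of GELU; the exact form uses the Gaussian CDF (erf).
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

# Hypothetical input: (batch, seq, channels)
x = np.random.default_rng(0).normal(size=(2, 4, 8))
y = layer_norm(x, gamma=np.ones(8), beta=np.zeros(8))
```

With unit scale and zero shift, every channel vector in `y` has mean close to 0 and standard deviation close to 1. The tanh form of GELU tracks the exact form closely, which is why small numerical differences against a reference implementation are expected.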
Compared to main.py, the key changes in main_kv.py are:
+ kv_cache = {}
+ if name not in kv_cache:
      kv_cache[name] = np.array([kv_x @ W_k.T, kv_x @ W_v.T + B_v]) # prefill
  elif is_casual:
      kv_cache[name], _ = pack([kv_cache[name], np.array([kv_x @ W_k.T, kv_x @ W_v.T + B_v])], 'm b * c') # decode
      is_casual = False # causal attention reduces to cross attention
This caches keys and values for cross attention once (prefill only), and handles decode in masked self-attention by treating it as cross attention with a single query.
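The claim that a decode step in masked attention is just cross attention with one query can be checked numerically. The sketch below is a single-head illustration under assumed shapes, not the code in main_kv.py; `attention` and its arguments are hypothetical names.

```python
import numpy as np

def attention(q, k, v, causal=False):
    # q: (batch, q_len, dim); k, v: (batch, kv_len, dim). Single head for brevity.
    scores = np.einsum('bqc,bkc->bqk', q, k) / np.sqrt(q.shape[-1])
    if causal:
        # Query i may only attend to keys 0..i.
        q_len, kv_len = scores.shape[-2:]
        mask = np.triu(np.ones((q_len, kv_len), dtype=bool), k=1)
        scores = np.where(mask, -np.inf, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return np.einsum('bqk,bkc->bqc', weights, v)

rng = np.random.default_rng(0)
q, k, v = rng.normal(size=(3, 2, 5, 8))  # three arrays of shape (batch=2, seq=5, dim=8)

full = attention(q, k, v, causal=True)            # recompute every position with a mask
last = attention(q[:, -1:], k, v, causal=False)   # newest query only, no mask needed
same = np.allclose(full[:, -1:], last)
```

The newest token is allowed to see every earlier key, so its row of the causal mask hides nothing. That is why the mask can be dropped once only the last query is computed against cached keys and values.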
And of course, the change
tokens_input, _ = pack([tokens_input, x[:, -1:]], 'b *') --> tokens_input = x[:, -1:]
is what actually buys the reduction in complexity: each step does only the "new" work incurred by the newest token.
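To see where the cubic and quadratic totals come from, it helps to count query-key score computations over a whole decode. The sketch below is a simplified cost model, not a measurement of the repo; `attention_score_count` is a hypothetical helper that ignores the encoder, the MLP layers, and constant factors.

```python
def attention_score_count(seq_len, use_kv_cache):
    """Count query-key dot products in self-attention over a full decode."""
    total = 0
    for t in range(1, seq_len + 1):
        # Without a cache the whole prefix is re-fed, so all t positions act as queries.
        # With a cache only the newest token is a query.
        queries = 1 if use_kv_cache else t
        total += queries * t  # each query scores against t keys
    return total

print(attention_score_count(100, use_kv_cache=False))  # 338350
print(attention_score_count(100, use_kv_cache=True))   # 5050
```

Without the cache the total is the sum of t^2, which grows as seq_len^3. With the cache it is the sum of t, which grows as seq_len^2.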
Quickstart
Download a model checkpoint of your choice:
curl -L -o tiny.pt https://openaipublic.azureedge.net/main/whisper/models/d3dd57d32accea0b295c96e26691aa14d8822fac7d9d27d5dc00b4ca2826dd03/tiny.en.pt
curl -L -o small.pt https://openaipublic.azureedge.net/main/whisper/models/f953ad0fd29cacd07d5a9eda5624af0f6bcf2258be67c92b79389873d91e0872/small.en.pt
curl -L -o med.pt https://openaipublic.azureedge.net/main/whisper/models/d7440d1dc186f76616474e0ff0b3b6b879abc9d1a4926b7adfa41db2d497ab4f/medium.en.pt
More are available at: https://github.com/openai/whisper/blob/main/whisper/__init__.py. Note that multilingual versions require different tokenization.
Run preprocess.py. This converts the audio to a mel spectrogram and tokenizes the prompt (both usually done on-host), producing the input format the model expects. It also generates a NumPy file containing the template text-token scaffolding.
Run main_kv.py or main.py.
Run postprocess.py to detokenize the model's output tokens into human-readable text (usually done on-host).
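For readers unfamiliar with the preprocessing step, the first half of a mel spectrogram is a windowed short-time Fourier transform. The sketch below is an illustration only and is not taken from preprocess.py: it assumes the front-end settings used by OpenAI's reference Whisper code (16 kHz audio, 400-sample window, 160-sample hop), and it omits the mel filterbank and log scaling. The name `power_spectrogram` is hypothetical.

```python
import numpy as np

# Assumed settings, matching OpenAI's reference Whisper front end.
SAMPLE_RATE, N_FFT, HOP = 16_000, 400, 160

def power_spectrogram(audio):
    # Reflect-pad so that frames are centred on multiples of HOP.
    padded = np.pad(audio, N_FFT // 2, mode='reflect')
    n_frames = 1 + (len(padded) - N_FFT) // HOP
    # Row i of idx holds the sample indices of frame i.
    idx = np.arange(N_FFT)[None, :] + HOP * np.arange(n_frames)[:, None]
    window = np.hanning(N_FFT + 1)[:-1]  # periodic Hann window
    frames = padded[idx] * window
    return np.abs(np.fft.rfft(frames, axis=-1)) ** 2  # (n_frames, N_FFT // 2 + 1)

t = np.arange(SAMPLE_RATE) / SAMPLE_RATE              # one second of audio
spec = power_spectrogram(np.sin(2 * np.pi * 440 * t)) # 440 Hz test tone
```

Each frequency bin is SAMPLE_RATE / N_FFT = 40 Hz wide, so the 440 Hz tone peaks in bin 11. A full mel spectrogram would then project the 201 frequency bins onto mel bands and take a log.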
KV Cache Benchmarks
Run on a 2023 MacBook Pro (M2 Pro)
example.wav output with tiny.pt: The little tales they tell are false. The door was barred, locked and bolted as well. Right pears are fit for a queen's table. A big wet stain was on the round carpet. The kite dipped and swayed but stayed aloft. The pleasant hours fly by much too soon. The room was crowded with a
Model Architecture
From https://cdn.openai.com/papers/whisper.pdf
References
https://github.com/huggingface/transformers/blob/main/src/transformers/models/whisper/modeling_whisper.py
https://cdn.openai.com/papers/whisper.pdf
https://jessicastringham.net/2018/01/01/einsum/
https://einops.rocks/1-einops-basics/