OpenAI Whisper in 150 lines of NumPy

timothygao1 pts0 comments

GitHub - timothygao8710/minWhisper · GitHub




minWhisper

This repo implements the full forward pass of OpenAI's Whisper in under 150 lines of NumPy, using einsum and einops.
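To give a feel for the einsum style, here is a minimal sketch of multi-head attention in plain NumPy. It is not taken from main.py: the function name `multi_head_attention`, its arguments, and the masking convention are assumptions chosen for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(q, k, v, n_heads, causal=False):
    # q: (batch, t_q, channels); k, v: (batch, t_k, channels).
    # Hypothetical helper for illustration, not the repo's function.
    b, tq, c = q.shape
    tk = k.shape[1]
    d = c // n_heads
    # Split the channel axis into (heads, head_dim).
    qh = q.reshape(b, tq, n_heads, d)
    kh = k.reshape(b, tk, n_heads, d)
    vh = v.reshape(b, tk, n_heads, d)
    # Scaled dot-product scores for every (query, key) pair in every head.
    scores = np.einsum('bqhd,bkhd->bhqk', qh, kh) / np.sqrt(d)
    if causal:
        # Query i sits at absolute position tk - tq + i and may not see later keys.
        future = np.triu(np.ones((tq, tk), dtype=bool), k=1 + tk - tq)
        scores = np.where(future, -np.inf, scores)
    # Weighted sum of values, then merge heads back into the channel axis.
    out = np.einsum('bhqk,bkhd->bqhd', softmax(scores), vh)
    return out.reshape(b, tq, c)

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 5, 8))          # (batch, tokens, channels)
self_attn = multi_head_attention(x, x, x, n_heads=2, causal=True)
```

Under the causal mask the first token can only attend to itself, so `self_attn[:, 0]` equals `x[:, 0]`.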

compressed_demo.mov

The KV cache adds 7 lines on top of main.py and reduces the cost of decoding from O(seq_len ^ 3) to O(seq_len ^ 2)

Supports any model size in the Whisper family, batched inference, and different audio formats

Details such as layernorm and the approximate GELU differ slightly from Hugging Face's implementation, in favor of conciseness
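For reference, here is a minimal sketch of the two pieces mentioned above: a layer norm over the channel axis and the common tanh approximation of GELU, compared against the exact erf-based form. The function names and the `eps` default are assumptions, and the repo's own formulation may differ.

```python
import math
import numpy as np

def layer_norm(x, weight, bias, eps=1e-5):
    # Normalize over the last (channel) axis, then apply a learned scale and shift.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps) * weight + bias

def gelu_tanh(x):
    # Tanh approximation of GELU.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def gelu_exact(x):
    # Exact GELU via the error function (NumPy has no erf, so vectorize math.erf).
    erf = np.vectorize(math.erf)
    return 0.5 * x * (1.0 + erf(x / math.sqrt(2.0)))

x = np.linspace(-4.0, 4.0, 81)
max_gap = np.abs(gelu_tanh(x) - gelu_exact(x)).max()

rng = np.random.default_rng(0)
h = rng.normal(loc=3.0, scale=2.0, size=(2, 5, 16))
normed = layer_norm(h, weight=np.ones(16), bias=np.zeros(16))
```

`max_gap` stays small over this range, which is why the approximation is a reasonable trade for shorter code.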

Compared to main.py, the key changes in main_kv.py are:

+ kv_cache = {}

+ if name not in kv_cache:
+     kv_cache[name] = np.array([kv_x @ W_k.T, kv_x @ W_v.T + B_v]) # prefill
+ elif is_casual:
+     kv_cache[name], _ = pack([kv_cache[name], np.array([kv_x @ W_k.T, kv_x @ W_v.T + B_v])], 'm b * c') # decode
+     is_casual = False # casual attention reduces to cross attention

This implements the KV cache for cross attention (filled once, at prefill) and for decoding in masked self-attention, which is treated as cross attention with a single query
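The claim that masked attention reduces to cross attention with one query can be checked numerically. The sketch below uses a single head and a hypothetical `attention` helper (not the repo's code): the last row of causal attention over a full sequence matches unmasked attention that uses only the last query.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v, causal):
    # Single-head attention; q: (b, t_q, c), k and v: (b, t_k, c).
    scores = np.einsum('bqc,bkc->bqk', q, k) / np.sqrt(q.shape[-1])
    if causal:
        tq, tk = scores.shape[-2:]
        # Hide keys that lie after each query's position.
        future = np.triu(np.ones((tq, tk), dtype=bool), k=1 + tk - tq)
        scores = np.where(future, -np.inf, scores)
    return np.einsum('bqk,bkc->bqc', softmax(scores), v)

rng = np.random.default_rng(0)
q, k, v = rng.normal(size=(3, 2, 6, 8))   # three arrays of shape (2, 6, 8)

full = attention(q, k, v, causal=True)              # all 6 queries, masked
last = attention(q[:, -1:], k, v, causal=False)     # newest query only, no mask
matches = np.allclose(full[:, -1:], last)
```

The newest token is allowed to see every earlier key, so its mask row is empty and the mask can be dropped once only that token is queried.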

And of course, the change

tokens_input, _ = pack([tokens_input, x[:, -1:]], 'b *') --> tokens_input = x[:, -1:]

is what actually buys the reduction in complexity: each step does only the "new" work incurred by the newest token
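To illustrate where the savings come from, here is a toy single-layer, single-head sketch that decodes a sequence with and without a cache and counts how many token rows get projected to keys and values. The weights, names, and the use of `np.concatenate` in place of einops `pack` are all assumptions. The toy counts only projections, so it shows the trend rather than the exact exponents quoted above.

```python
import numpy as np

rng = np.random.default_rng(0)
C = 8                                               # channel width (arbitrary)
W_q, W_k, W_v = rng.normal(size=(3, C, C)) / np.sqrt(C)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend_last(q_last, k, v):
    # One query attending to every key so far: no causal mask is needed.
    scores = np.einsum('bqc,bkc->bqk', q_last, k) / np.sqrt(C)
    return np.einsum('bqk,bkc->bqc', softmax(scores), v)

def decode_no_cache(x):
    # Re-project the whole prefix at every step.
    rows, outs = 0, []
    for t in range(1, x.shape[1] + 1):
        prefix = x[:, :t]
        k, v = prefix @ W_k.T, prefix @ W_v.T
        rows += t
        outs.append(attend_last(prefix[:, -1:] @ W_q.T, k, v))
    return np.concatenate(outs, axis=1), rows

def decode_with_cache(x):
    # Project only the newest token and append it to the cached keys/values.
    rows, outs = 0, []
    k_cache = np.zeros((x.shape[0], 0, C))
    v_cache = np.zeros((x.shape[0], 0, C))
    for t in range(x.shape[1]):
        new = x[:, t:t + 1]
        k_cache = np.concatenate([k_cache, new @ W_k.T], axis=1)
        v_cache = np.concatenate([v_cache, new @ W_v.T], axis=1)
        rows += 1
        outs.append(attend_last(new @ W_q.T, k_cache, v_cache))
    return np.concatenate(outs, axis=1), rows

x = rng.normal(size=(2, 6, C))                      # (batch, tokens, channels)
out_slow, rows_slow = decode_no_cache(x)
out_fast, rows_fast = decode_with_cache(x)
```

Both paths produce the same outputs, but for 6 tokens the uncached loop projects 1 + 2 + ... + 6 = 21 rows while the cached loop projects 6.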

Quickstart

Download a model checkpoint of your choice:

curl -L -o tiny.pt https://openaipublic.azureedge.net/main/whisper/models/d3dd57d32accea0b295c96e26691aa14d8822fac7d9d27d5dc00b4ca2826dd03/tiny.en.pt

curl -L -o small.pt https://openaipublic.azureedge.net/main/whisper/models/f953ad0fd29cacd07d5a9eda5624af0f6bcf2258be67c92b79389873d91e0872/small.en.pt

curl -L -o med.pt https://openaipublic.azureedge.net/main/whisper/models/d7440d1dc186f76616474e0ff0b3b6b879abc9d1a4926b7adfa41db2d497ab4f/medium.en.pt

More are available at: https://github.com/openai/whisper/blob/main/whisper/__init__.py. Note that multilingual versions require different tokenization.

Run preprocess.py. This converts the audio to a mel-spectrogram and handles tokenization (usually done on-host), producing the input format the model expects. It also generates a NumPy file containing the template text token scaffolding.

Run main_kv.py or main.py

Run postprocess.py to detokenize the model's output tokens into human-readable text (usually done on-host)
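As a rough picture of what the mel-spectrogram step involves, here is a simplified NumPy sketch. It is not preprocess.py. The constants (16 kHz audio, 400-sample window, 160-sample hop, 80 mel bins) and the log scaling follow Whisper's published setup, while the HTK-style triangular filterbank and the lack of edge padding are simplifying assumptions, so values will not match the reference implementation exactly.

```python
import numpy as np

SAMPLE_RATE, N_FFT, HOP, N_MELS = 16000, 400, 160, 80

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels=N_MELS, n_fft=N_FFT, sr=SAMPLE_RATE):
    # Triangular filters whose edges are evenly spaced on the mel scale.
    fft_freqs = np.linspace(0.0, sr / 2, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2), n_mels + 2))
    lower, center, upper = edges[:-2, None], edges[1:-1, None], edges[2:, None]
    rising = (fft_freqs - lower) / (center - lower)
    falling = (upper - fft_freqs) / (upper - center)
    return np.maximum(0.0, np.minimum(rising, falling))    # (n_mels, n_freq)

def log_mel_spectrogram(audio):
    window = np.hanning(N_FFT + 1)[:-1]                     # periodic Hann window
    n_frames = 1 + (len(audio) - N_FFT) // HOP
    # Index matrix that slices the signal into overlapping frames.
    idx = np.arange(N_FFT)[None, :] + HOP * np.arange(n_frames)[:, None]
    frames = audio[idx] * window                            # (n_frames, n_fft)
    power = np.abs(np.fft.rfft(frames, axis=-1)) ** 2       # (n_frames, n_freq)
    mel = np.einsum('mf,tf->mt', mel_filterbank(), power)   # (n_mels, n_frames)
    log_mel = np.log10(np.maximum(mel, 1e-10))
    log_mel = np.maximum(log_mel, log_mel.max() - 8.0)      # limit dynamic range
    return (log_mel + 4.0) / 4.0

t = np.arange(SAMPLE_RATE) / SAMPLE_RATE                    # one second of samples
tone = np.sin(2 * np.pi * 440.0 * t)
spec = log_mel_spectrogram(tone)                            # shape (80, 98)
```

The output has one row per mel bin and one column per 10 ms hop, which is the layout the encoder's first convolution would consume.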

KV Cache Benchmarks

Run on a 2023 MacBook Pro (M2 Pro)

example.wav output with tiny.pt: The little tales they tell are false. The door was barred, locked and bolted as well. Right pears are fit for a queen's table. A big wet stain was on the round carpet. The kite dipped and swayed but stayed aloft. The pleasant hours fly by much too soon. The room was crowded with a

Model Architecture

From https://cdn.openai.com/papers/whisper.pdf

References

https://github.com/huggingface/transformers/blob/main/src/transformers/models/whisper/modeling_whisper.py

https://cdn.openai.com/papers/whisper.pdf

https://jessicastringham.net/2018/01/01/einsum/

https://einops.rocks/1-einops-basics/
