GitHub - AnthonyBeeblebrox/pybench: Discover benchmark functions, run them across many seeds, and statistically detect regressions against a saved baseline. · GitHub
pybench
Discover benchmark functions, run them across many seeds, and statistically detect regressions against a saved baseline.
pybench reruns each benchmark on the same stored seeds as its baseline, so the comparison is paired (far more sensitive than a two-sample test), and judges the whole benchmark with a within-seed sign-flip permutation test that respects correlation across metrics and steps.
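To make the idea of a paired, within-seed sign-flip test concrete, here is a minimal sketch. It is not pybench's implementation: the function name `sign_flip_pvalue`, the choice of test statistic (mean paired difference summed over metrics), and the permutation count are all assumptions made for illustration. The key point it shows is that one random sign is drawn per seed and applied to every metric of that seed, which keeps the correlation between metrics intact.

```python
import random

def sign_flip_pvalue(baseline, current, n_perm=10_000, rng_seed=0):
    """One-sided p-value for "current is worse than baseline" (hypothetical helper).

    baseline, current: lists of per-seed rows, aligned by seed; each row is a
    list of higher-is-better metric values.
    """
    # Paired differences: same seed, same metric, current minus baseline.
    diffs = [[c - b for b, c in zip(brow, crow)]
             for brow, crow in zip(baseline, current)]
    n = len(diffs)

    def stat(rows):
        # Assumed statistic: mean paired difference, summed over all metrics.
        return sum(sum(row) for row in rows) / n

    observed = stat(diffs)
    rng = random.Random(rng_seed)
    hits = 0
    for _ in range(n_perm):
        flipped = []
        for row in diffs:
            # One sign per seed, shared by every metric in that seed's row.
            sign = rng.choice((-1.0, 1.0))
            flipped.append([sign * d for d in row])
        if stat(flipped) <= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)

# Usage: 12 seeds, one metric, a uniform drop from 1.0 to 0.5.
same = [[1.0] for _ in range(12)]
worse = [[0.5] for _ in range(12)]
p_regressed = sign_flip_pvalue(same, worse)
p_unchanged = sign_flip_pvalue(same, same)
```

With identical runs every difference is zero, so no sign flip can make the statistic smaller and the p-value is 1; with a consistent drop on every seed, very few sign patterns are as extreme as the observed one and the p-value is small.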
Docs: pybench.readthedocs.io
Install
```sh
uv add git+https://github.com/AnthonyBeeblebrox/pybench
# or: pip install git+https://github.com/AnthonyBeeblebrox/pybench
```
Quickstart
Write a bench_* function that takes a seed and returns a score (higher is better; prefix lower-is-better metrics with min:):
```python
# benchmarks/bench_model.py
def bench_accuracy(seed: int) -> float:
    return train_and_score(seed)  # a float, or a dict, or a list[dict] of steps
```
```sh
pybench               # 1st time: samples seeds, saves a baseline, marks NEW
pybench               # later: reruns on the same seeds, marks PASS / FAIL (exit 1 on fail)
pybench update --yes  # re-baseline after an intended change
pybench show          # print current baseline stats (--history for per-commit history)
```
pybench exits non-zero when any benchmark regresses, so it drops straight into CI like pytest.
Return formats
```python
def bench_a(seed): return 0.91                                  # scalar
def bench_b(seed): return {"accuracy": 0.91, "min:loss": 0.42}  # multiple metrics
def bench_c(seed):                                              # multi-step curve
    return [{"step": 1, "min:loss": 0.9}, {"step": 10, "min:loss": 0.3}]
```
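One way to think about the three return shapes is that they can all be coerced into a list of step dicts in which every value is higher-is-better. The sketch below illustrates that reading; it is not pybench's code. The helper name `normalize`, the default metric name `"score"` for scalars, the treatment of `"step"` as an index rather than a metric, and the choice to negate `min:` metrics are all assumptions.

```python
def normalize(result):
    """Coerce a benchmark return value into a list of step dicts (hypothetical)."""
    if isinstance(result, (int, float)):
        steps = [{"score": float(result)}]  # scalar: one step, one metric
    elif isinstance(result, dict):
        steps = [result]                    # dict: one step, several metrics
    else:
        steps = list(result)                # list[dict]: a multi-step curve

    out = []
    for step in steps:
        row = {}
        for key, value in step.items():
            if key == "step":
                row[key] = value            # assumed to be an index, not a metric
            elif key.startswith("min:"):
                # Lower-is-better: negate so that larger always means better.
                row[key[len("min:"):]] = -float(value)
            else:
                row[key] = float(value)
        out.append(row)
    return out

scalar = normalize(0.91)
multi = normalize({"accuracy": 0.91, "min:loss": 0.42})
curve = normalize([{"step": 1, "min:loss": 0.9}, {"step": 10, "min:loss": 0.3}])
```

After this step a drop in any value means "got worse", so a single one-sided test can cover every metric.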
Configuration
Per-benchmark settings are keyword-only defaults on the benchmark function itself; there is no config file:
```python
def bench_training(seed: int, *, n_seeds: int = 50, alpha: float = 0.01,
                   min_effect: float = 0.02, workers: int = 4) -> list[dict]:
    ...
```
| Parameter | Default | Meaning |
|---|---|---|
| n_seeds | 30 | Seeds sampled for the baseline |
| alpha | 0.05 | Significance threshold |
| min_effect | None | Minimum relative drop to flag (suppress trivia) |
| workers | | Parallel seed processes (keep 1 for GPU/serial) |
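Settings declared this way can be read back with the standard library alone, which is presumably why no config file is needed. The sketch below uses `inspect.signature` to collect keyword-only parameters that carry a default. The helper name `read_settings` is hypothetical, and this shows one plausible mechanism, not necessarily how pybench does it.

```python
import inspect

def read_settings(fn):
    """Return {name: default} for keyword-only parameters of fn (hypothetical)."""
    params = inspect.signature(fn).parameters
    return {
        name: p.default
        for name, p in params.items()
        if p.kind is inspect.Parameter.KEYWORD_ONLY
        and p.default is not inspect.Parameter.empty
    }

def bench_training(seed: int, *, n_seeds: int = 50, alpha: float = 0.01,
                   min_effect: float = 0.02, workers: int = 4) -> list[dict]:
    return [{"step": 1, "min:loss": 0.9}]  # placeholder body for the example

settings = read_settings(bench_training)
```

The positional `seed` parameter is skipped because it is not keyword-only, so only the four settings are returned.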
Commit your baseline
The baseline lives at .pybench/baselines.jsonl (one line per benchmark). Commit it to git; do not gitignore it. History is delegated to git: commit the file after each pybench update, and pybench show --history reconstructs the baseline at every commit that touched it.
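As a rough illustration of what "history is delegated to git" can mean, the sketch below lists the commits that touched the baseline file with `git log --format=%H -- <path>` and reads each version with `git show <commit>:<path>`. The helper names `git` and `baseline_history` are hypothetical, and this is an assumption about the general approach, not a description of what `pybench show --history` actually runs.

```python
import subprocess
from typing import Callable

BASELINE = ".pybench/baselines.jsonl"

def git(*args: str) -> str:
    """Run a git command in the current repository and return its stdout."""
    return subprocess.run(
        ["git", *args], check=True, capture_output=True, text=True
    ).stdout

def baseline_history(run: Callable[..., str] = git,
                     path: str = BASELINE) -> list[tuple[str, str]]:
    """Return (commit, file contents) pairs, newest commit first.

    `run` is injectable so the logic can be exercised without a repository.
    A commit that deleted the file would make `git show` fail; this sketch
    does not handle that case.
    """
    commits = run("log", "--format=%H", "--", path).split()
    return [(commit, run("show", f"{commit}:{path}")) for commit in commits]

# Usage with a stand-in runner instead of a real repository.
def fake_run(*args: str) -> str:
    if args[0] == "log":
        return "abc123\ndef456\n"
    return f"contents at {args[1]}\n"

history = baseline_history(run=fake_run)
```

Because every `pybench update` is followed by a commit of the file, each entry in this list corresponds to one accepted baseline.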