GitHub - AaravGaurdev/deltatensors · GitHub
deltatensors
Near-lossless delta compression for fine-tuned neural network models.
Instead of storing 50 fine-tunes of the same base model, store one base and 50 small .wdelta delta files. deltatensors compresses the delta between a base and fine-tuned model, and reconstructs with sub-1% perplexity difference.
Tested on Qwen2.5-0.5B fine-tuned on WikiText-2:
Perplexity: 19.11 (original) → 19.22 (reconstructed) — 0.58% perplexity difference
Less degradation than standard int4 quantization of the full model
294 MB delta vs 953 MB fine-tuned model (3.2x)
~2.8x total storage reduction across 10 fine-tunes
base_model.safetensors    1.0 GB
checkpoint_01.wdelta      294 MB
checkpoint_02.wdelta      294 MB
...
checkpoint_10.wdelta      294 MB
─────────────────────────────────
Total                     3.9 GB  vs 11 GB naive
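The figures above can be re-derived with a few lines of arithmetic. This is only a sanity check on the quoted numbers: it assumes decimal units (1 GB = 1000 MB) and takes the rounded 11 GB "naive" total as given, since the README does not spell out how that total was computed.

```python
# Numbers quoted in the README (Qwen2.5-0.5B fine-tuned on WikiText-2).
ppl_original = 19.11
ppl_reconstructed = 19.22
delta_mb = 294
finetuned_mb = 953
base_gb = 1.0
n_finetunes = 10
naive_total_gb = 11  # rounded figure from the README, taken as given

# Relative perplexity change, in percent.
ppl_change_pct = (ppl_reconstructed - ppl_original) / ppl_original * 100

# Size of one fine-tuned model relative to its delta file.
per_model_ratio = finetuned_mb / delta_mb

# One base plus ten deltas, assuming 1 GB = 1000 MB.
total_with_deltas_gb = base_gb + n_finetunes * delta_mb / 1000

# Overall storage reduction versus keeping every model in full.
overall_ratio = naive_total_gb / total_with_deltas_gb

print(f"perplexity change: {ppl_change_pct:.2f}%")
print(f"per-model ratio:   {per_model_ratio:.1f}x")
print(f"total with deltas: {total_with_deltas_gb:.1f} GB")
print(f"overall reduction: {overall_ratio:.1f}x")
```

The overall reduction is lower than the per-model ratio because the base model still has to be stored once; the gap narrows as more fine-tunes share the same base.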
Install
pip install deltatensors
pip install torch safetensors  # for loading from safetensors directories
Quick start
import deltatensors as dt

# save delta between a fine-tuned and base model (streaming, O(1) RAM)
dt.save_delta_from_paths("checkpoint.wdelta", "qwen-wiki/", "qwen-base/", strategy="int4")

# reconstruct without loading the full base into RAM
recon_sd = dt.load_delta_from_paths("checkpoint.wdelta", "qwen-base/")

# inspect a delta file without a base model
info = dt.inspect("checkpoint.wdelta")
print(info)
# {'path': 'checkpoint.wdelta', 'size_mb': 294.2, 'strategy': 'int4', 'n_tensors': 290, ...}
Compression strategies
| Strategy  | Quality                   | Compression |
|-----------|---------------------------|-------------|
| int4      | near-lossless (~0.5% PPL) | best        |
| sparse    | tunable via sparsity=     | good        |
| quantized | BitDelta-style 1-bit      | aggressive  |
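To give a feel for the two lossier strategies in the table, here is a minimal NumPy sketch of the general ideas, not the library's actual code. The function names `one_bit_delta` and `sparse_delta` are hypothetical, and the reading of `sparsity` as "fraction of delta entries set to zero" is an assumption. The 1-bit variant follows the BitDelta idea of keeping only the sign of each delta entry plus one scale per tensor (the mean absolute value).

```python
import numpy as np

def one_bit_delta(delta):
    """BitDelta-style approximation: one sign per entry, one scale per tensor."""
    scale = float(np.mean(np.abs(delta)))
    signs = np.where(delta >= 0, 1.0, -1.0).astype(np.float32)
    return scale * signs

def sparse_delta(delta, sparsity=0.9):
    """Keep only the largest-magnitude entries; zero out the rest.

    Assumption: `sparsity` is the fraction of entries that are dropped.
    """
    flat = delta.ravel()
    keep = max(1, int(round(flat.size * (1.0 - sparsity))))
    idx = np.argpartition(np.abs(flat), -keep)[-keep:]  # indices of kept entries
    out = np.zeros_like(flat)
    out[idx] = flat[idx]
    return out.reshape(delta.shape)

delta = np.array([[0.5, -0.1], [0.0, -0.4]], dtype=np.float32)
approx_1bit = one_bit_delta(delta)
approx_sparse = sparse_delta(delta, sparsity=0.5)
```

In both cases the reconstructed weights would be `base + approx`, so the approximation error applies only to the delta and never to the base weights themselves.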
int4 combines outlier extraction (the top k% of weights are stored in float16) with 4-bit quantization of the remainder. It is the strategy behind the Qwen2.5-0.5B results quoted at the top of this README.
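The outlier-plus-4-bit idea can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the deltatensors implementation: outliers are chosen by magnitude, the remainder uses a single symmetric scale per tensor, and the 4-bit codes are held in an `int8` array rather than packed two per byte. The names `compress_delta`, `reconstruct`, and `outlier_pct` are hypothetical.

```python
import numpy as np

def compress_delta(base, tuned, outlier_pct=1.0):
    """Split the delta into float16 outliers and 4-bit codes for the rest."""
    delta = (tuned - base).astype(np.float32).ravel()
    k = min(delta.size, max(1, int(delta.size * outlier_pct / 100)))
    # Assumption: outliers are the k largest-magnitude delta entries.
    idx = np.argpartition(np.abs(delta), -k)[-k:]
    outliers = delta[idx].astype(np.float16)
    rest = delta.copy()
    rest[idx] = 0.0
    # Assumption: one symmetric scale per tensor, codes in [-8, 7].
    max_abs = float(np.abs(rest).max())
    scale = max_abs / 7.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(rest / scale), -8, 7).astype(np.int8)
    return {"shape": base.shape, "idx": idx, "outliers": outliers,
            "scale": scale, "q": q}

def reconstruct(base, packed):
    """Dequantize the codes, restore the outliers, and add the delta to the base."""
    delta = packed["q"].astype(np.float32) * np.float32(packed["scale"])
    delta[packed["idx"]] = packed["outliers"].astype(np.float32)
    return base + delta.reshape(packed["shape"])

rng = np.random.default_rng(0)
base = rng.standard_normal((64, 64)).astype(np.float32)
tuned = base + 0.01 * rng.standard_normal((64, 64)).astype(np.float32)

packed = compress_delta(base, tuned, outlier_pct=1.0)
recon = reconstruct(base, packed)
max_err = float(np.max(np.abs(recon - tuned)))
```

Removing the largest entries first shrinks the range the 4-bit grid has to cover, so the quantization step for the remaining values is smaller than it would be if the outliers were quantized along with everything else.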
Why not LoRA?
LoRA constrains the delta to be low-rank during training, which limits expressiveness. deltatensors compresses arbitrary full fine-tune deltas after training, with no constraints on how you fine-tune.
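The rank constraint is easy to see numerically. In the sketch below, a LoRA-style update is the product of two thin matrices and so has rank at most `r`, while an unconstrained delta can use the full rank of the weight matrix. The random matrix is only a stand-in for a full fine-tune delta; real deltas may be closer to low-rank than this toy example suggests.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 16, 2  # weight matrix is d x d; LoRA rank is r

# LoRA-style update: product of a (d x r) and an (r x d) matrix.
B = rng.standard_normal((d, r))
A = rng.standard_normal((r, d))
lora_delta = B @ A

# Stand-in for an unconstrained full fine-tune delta.
full_delta = rng.standard_normal((d, d))

lora_rank = int(np.linalg.matrix_rank(lora_delta))
full_rank = int(np.linalg.matrix_rank(full_delta))
print(f"LoRA delta rank: {lora_rank}, full delta rank: {full_rank}")
```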
Roadmap
Lineage — chain multiple .wdelta files to track and reconstruct full fine-tuning histories
License
MIT
p.s. If you find deltatensors useful, please consider leaving a ⭐ star on the repository to help others find it!