GitHub - SakanaAI/DiffusionBlocks: DiffusionBlocks: Block-wise Neural Network Training via Diffusion Interpretation · GitHub
DiffusionBlocks (ICLR 2026)
We propose DiffusionBlocks, a principled framework that partitions transformers into independently trainable blocks, reducing memory requirements proportionally while maintaining competitive performance across diverse architectures and tasks.
This is the official implementation of DiffusionBlocks for image classification with Vision Transformers (ViT).
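To make the idea of block-wise training concrete, here is a minimal sketch of splitting a stack of layers into contiguous blocks and touching only one block per step. It is not the repository's code: `partition_layers`, the depth of 12 layers, and the random choice of block are all assumptions made for illustration. Only the count of 3 blocks comes from the warmup comment further down in this README.

```python
import random


def partition_layers(num_layers: int, num_blocks: int) -> list[list[int]]:
    """Split layer indices 0..num_layers-1 into contiguous, near-equal blocks.

    Hypothetical helper for illustration; the repository may partition differently.
    """
    base, extra = divmod(num_layers, num_blocks)
    blocks, start = [], 0
    for b in range(num_blocks):
        # The first `extra` blocks take one additional layer each.
        size = base + (1 if b < extra else 0)
        blocks.append(list(range(start, start + size)))
        start += size
    return blocks


# Assumed depth of 12 layers; 3 blocks as in the warmup comment of this README.
blocks = partition_layers(num_layers=12, num_blocks=3)

rng = random.Random(0)
for step in range(3):
    # Assumption: one block is picked per step. Only that block's layers would
    # need activations and gradients, which is where the memory saving comes from.
    active = rng.randrange(len(blocks))
    print(f"step {step}: train block {active}, layers {blocks[active]}")
```

With 3 equal blocks, each step involves a third of the layers, which is the sense in which memory scales down with the number of blocks.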
Installation
Please install uv. Then, run:
# Install dependencies
uv sync
# Make sure to log in to Hugging Face and wandb
uv run huggingface-cli login
uv run wandb login
We conducted our experiments in the following environment: Python 3.12 and CUDA 12.2 on an H100 GPU.
Training
Model checkpoints are saved in the logs folder.
Baseline (ViT):
uv run main.py train cifar100 --model_type vit
DiffusionBlocks:
uv run main.py train cifar100 --model_type dblock
NOTE: the total number of epochs in DiffusionBlocks is multiplied by the number of blocks. This aligns the number of iterations per block with the baseline, because one step in DiffusionBlocks trains only one block.
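The arithmetic behind this note can be sketched as follows. The baseline epoch count of 100 is a hypothetical value, the batch size of 128 is taken from the heavy-augmentation scripts in this README, and `steps_for` is an illustrative helper that assumes the last partial batch is kept. The CIFAR-100 training set has 50,000 images.

```python
import math


def steps_for(num_epochs: int, dataset_size: int, batch_size: int) -> int:
    """Total optimizer steps, assuming the last partial batch is kept."""
    return num_epochs * math.ceil(dataset_size / batch_size)


NUM_BLOCKS = 3          # number of blocks mentioned in this README
DATASET_SIZE = 50_000   # CIFAR-100 training images
BATCH_SIZE = 128        # value used in the heavy-augmentation scripts
BASELINE_EPOCHS = 100   # hypothetical value for illustration

baseline_steps = steps_for(BASELINE_EPOCHS, DATASET_SIZE, BATCH_SIZE)

# DiffusionBlocks runs NUM_BLOCKS times as many epochs...
dblock_steps = steps_for(BASELINE_EPOCHS * NUM_BLOCKS, DATASET_SIZE, BATCH_SIZE)

# ...but each step updates only one block, so on average every block
# receives as many updates as the whole baseline model does.
updates_per_block = dblock_steps // NUM_BLOCKS

print(baseline_steps, dblock_steps, updates_per_block)
```

The same reasoning explains why step-based settings such as the warmup length are scaled by the number of blocks in the DiffusionBlocks command.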
Details
In the base setting, we don't rely on techniques such as heavy data augmentation. To see the performance with heavy data augmentation and a learning rate scheduler, run the following:
Baseline (ViT):
BATCH_SIZE=128
EPOCHS=1000
POSTFIX="-rand-augment"
WARMUP_STEPS=3900
MODEL_TYPE="vit"
srun uv run main.py train cifar100 \
  --model_type $MODEL_TYPE \
  --batch_size $BATCH_SIZE --num_epochs $EPOCHS --postfix=$POSTFIX \
  --scheduler_type cosine_with_min_lr --num_warmup_steps $WARMUP_STEPS --lr 5e-4 \
  --scheduler_specific_kwargs '{"min_lr": 5e-5}' \
  --add_rand_aug
DiffusionBlocks:
BATCH_SIZE=128
EPOCHS=1000
POSTFIX="-rand-augment"
WARMUP_STEPS=$((3900 * 3)) # 3 indicates the number of blocks
MODEL_TYPE="dblock"
srun uv run main.py train cifar100 \
  --model_type $MODEL_TYPE \
  --batch_size $BATCH_SIZE --num_epochs $EPOCHS --postfix=$POSTFIX \
  --scheduler_type cosine_with_min_lr --num_warmup_steps $WARMUP_STEPS --lr 5e-4 \
  --scheduler_specific_kwargs '{"min_lr": 5e-5}' \
  --add_rand_aug
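For intuition about the scheduler flags, the following is a rough sketch of a learning rate curve with linear warmup followed by cosine decay to a floor. It assumes that `cosine_with_min_lr` behaves this way; it is not the scheduler code used by the repository. `lr_at` is an illustrative function, and the total step count is a hypothetical value, since the actual number depends on the training setup.

```python
import math


def lr_at(step: int, base_lr: float, min_lr: float,
          warmup_steps: int, total_steps: int) -> float:
    """Linear warmup to base_lr, then cosine decay down to min_lr (a sketch)."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)  # hold at min_lr after the last step
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (base_lr - min_lr) * cosine


BASE_LR = 5e-4            # --lr
MIN_LR = 5e-5             # "min_lr" in --scheduler_specific_kwargs
WARMUP_STEPS = 3900 * 3   # DiffusionBlocks setting with 3 blocks
TOTAL_STEPS = 1_173_000   # hypothetical total, for illustration only

for step in (0, WARMUP_STEPS // 2, WARMUP_STEPS, TOTAL_STEPS // 2, TOTAL_STEPS):
    lr = lr_at(step, BASE_LR, MIN_LR, WARMUP_STEPS, TOTAL_STEPS)
    print(f"step {step:>9}: lr = {lr:.6f}")
```

Under these assumptions the rate climbs from zero to 5e-4 over the warmup, then decays smoothly and never drops below 5e-5.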
Evaluation
Baseline (ViT):
CKPT_PATH="logs/path-to-last.ckpt"
uv run main.py test cifar100 --model_type vit --ckpt_path $CKPT_PATH
DiffusionBlocks:
CKPT_PATH="logs/path-to-last.ckpt"
uv run main.py test cifar100 --model_type dblock --ckpt_path $CKPT_PATH
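If you do not remember the exact checkpoint path, a small helper along these lines can locate the most recently written checkpoint. It is a sketch only: `newest_checkpoint` is a hypothetical function that is not part of the repository, and it assumes checkpoints are stored somewhere under the logs folder with a `.ckpt` extension.

```python
from pathlib import Path
from typing import Optional


def newest_checkpoint(log_dir: str = "logs") -> Optional[Path]:
    """Return the most recently modified *.ckpt file under log_dir, or None."""
    root = Path(log_dir)
    if not root.is_dir():
        return None
    ckpts = list(root.rglob("*.ckpt"))  # search subfolders as well
    if not ckpts:
        return None
    return max(ckpts, key=lambda p: p.stat().st_mtime)


ckpt = newest_checkpoint()
if ckpt is None:
    print("No checkpoint found under logs/")
else:
    # Print the evaluation command from this README with the path filled in.
    print(f"uv run main.py test cifar100 --model_type dblock --ckpt_path {ckpt}")
```

Picking by modification time is a convenience; check that the selected file belongs to the run and model type you intend to evaluate.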
Acknowledgement
The implementation of the Vision Transformer in vit.py is based on HuggingFace Transformers. The implementation of EDM is based on Stability-AI/generative-models.
We are grateful for their work.
Citation
To cite our work, please use the following BibTeX:
@inproceedings{shing2026diffusionblocks,
  title = {DiffusionBlocks: Block-wise Neural Network Training via Diffusion Interpretation},
  author = {Makoto Shing and Masanori Koyama and Takuya Akiba},
  booktitle = {The Fourteenth International Conference on Learning Representations},
  year = {2026},
  url = {https://openreview.net/forum?id=pwVSmK71cS}
}
Paper: arxiv.org/abs/2506.14202
License: Apache-2.0