DiffusionBlocks – Block-Wise NN Training via Diffusion Interpretation


GitHub repository: SakanaAI/DiffusionBlocks

Repository files: .gitignore, .python-version, LICENSE, README.md, data.py, dblock_modules.py, main.py, model.py, overview.jpg, pyproject.toml, uv.lock, vit.py


DiffusionBlocks (ICLR 2026)

We propose DiffusionBlocks, a principled framework that partitions transformers into independently trainable blocks, reducing memory requirements in proportion to the number of blocks while maintaining competitive performance across diverse architectures and tasks.

This is the official implementation of DiffusionBlocks for image classification with Vision Transformers (ViT).
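To make the idea of independently trainable blocks concrete, here is a minimal sketch of splitting a layer stack into contiguous blocks. It is an illustration only, not code from this repository: the function name `partition_layers` and the 12-layer depth are hypothetical, and the block count of 3 is borrowed from the training commands in this README. It does not model the diffusion interpretation itself, only the partitioning.

```python
def partition_layers(num_layers: int, num_blocks: int) -> list[list[int]]:
    """Split layer indices 0..num_layers-1 into contiguous, near-equal blocks."""
    if not 1 <= num_blocks <= num_layers:
        raise ValueError("num_blocks must be between 1 and num_layers")
    base, extra = divmod(num_layers, num_blocks)
    blocks, start = [], 0
    for i in range(num_blocks):
        # The first `extra` blocks take one additional layer.
        size = base + (1 if i < extra else 0)
        blocks.append(list(range(start, start + size)))
        start += size
    return blocks


# Hypothetical 12-layer stack split into 3 blocks.
blocks = partition_layers(num_layers=12, num_blocks=3)
for i, layers in enumerate(blocks):
    share = len(layers) / 12
    print(f"block {i}: layers {layers} ({share:.0%} of the stack)")
```

If only one block is trained per step, only that block's layers need gradients and optimizer state at a time, which is the intuition behind the memory saving scaling with the number of blocks.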

Installation

Please install uv. Then, run:

# Install dependencies
uv sync

# Make sure to log in to Hugging Face and wandb
uv run huggingface-cli login
uv run wandb login

We conducted our experiments in the following environment: Python 3.12 and CUDA 12.2 on an NVIDIA H100 GPU.

Training

Model checkpoints are saved in the logs folder.

Baseline (ViT):

uv run main.py train cifar100 --model_type vit

DiffusionBlocks:

uv run main.py train cifar100 --model_type dblock

NOTE: For DiffusionBlocks, the total number of epochs is multiplied by the number of blocks. One DiffusionBlocks step trains a single block, so this scaling aligns the total number of iterations with the baseline.
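The arithmetic behind this note can be sketched as follows. The function name `dblock_schedule`, the baseline of 100 epochs, and the 390 steps per epoch are hypothetical values chosen for illustration, and the per-block count assumes the blocks are visited equally often, which this README does not state.

```python
def dblock_schedule(baseline_epochs: int, steps_per_epoch: int, num_blocks: int):
    """Return (total_epochs, total_steps, updates_per_block) after scaling."""
    total_epochs = baseline_epochs * num_blocks     # epochs scaled by block count
    total_steps = total_epochs * steps_per_epoch    # each step trains one block
    updates_per_block = total_steps // num_blocks   # assumes equal visits per block
    return total_epochs, total_steps, updates_per_block


baseline_epochs, steps_per_epoch, num_blocks = 100, 390, 3  # hypothetical values
baseline_steps = baseline_epochs * steps_per_epoch
total_epochs, total_steps, per_block = dblock_schedule(
    baseline_epochs, steps_per_epoch, num_blocks
)
print(f"baseline: {baseline_steps} steps, every layer updated each step")
print(f"dblock:   {total_steps} steps over {total_epochs} epochs, "
      f"{per_block} updates per block")
```

Under these assumptions each block receives as many updates as the baseline performs iterations, which is the alignment the note describes.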

Details

In the base setting, we do not rely on techniques such as heavy data augmentation. To see the performance with heavy data augmentation and a learning rate scheduler, run the following:

Baseline (ViT):

BATCH_SIZE=128
EPOCHS=1000
POSTFIX="-rand-augment"
WARMUP_STEPS=3900
MODEL_TYPE="vit"
srun uv run main.py train cifar100 \
    --model_type $MODEL_TYPE \
    --batch_size $BATCH_SIZE --num_epochs $EPOCHS --postfix=$POSTFIX \
    --scheduler_type cosine_with_min_lr --num_warmup_steps $WARMUP_STEPS --lr 5e-4 \
    --scheduler_specific_kwargs '{"min_lr": 5e-5}' \
    --add_rand_aug

DiffusionBlocks:

BATCH_SIZE=128
EPOCHS=1000
POSTFIX="-rand-augment"
WARMUP_STEPS=$((3900 * 3)) # 3 is the number of blocks
MODEL_TYPE="dblock"
srun uv run main.py train cifar100 \
    --model_type $MODEL_TYPE \
    --batch_size $BATCH_SIZE --num_epochs $EPOCHS --postfix=$POSTFIX \
    --scheduler_type cosine_with_min_lr --num_warmup_steps $WARMUP_STEPS --lr 5e-4 \
    --scheduler_specific_kwargs '{"min_lr": 5e-5}' \
    --add_rand_aug
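The effect of scaling the warmup steps by the number of blocks can be sketched with a small learning-rate function. This is an illustration under stated assumptions, not the repository's scheduler: it assumes `cosine_with_min_lr` means a linear warmup to the peak rate followed by a cosine decay to the `min_lr` floor (the name matches a scheduler type in Hugging Face Transformers, but whether main.py uses that implementation is not confirmed here). The function name `lr_at_step` and the 390 steps per epoch are hypothetical.

```python
import math


def lr_at_step(step: int, total_steps: int, warmup_steps: int,
               peak_lr: float = 5e-4, min_lr: float = 5e-5) -> float:
    """Linear warmup to peak_lr, then cosine decay down to min_lr."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)  # hold at min_lr past the final step
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (peak_lr - min_lr) * cosine


NUM_BLOCKS = 3
STEPS_PER_EPOCH = 390                       # hypothetical, for illustration
EPOCHS = 1000
baseline_warmup = 3900
baseline_total = EPOCHS * STEPS_PER_EPOCH
dblock_warmup = baseline_warmup * NUM_BLOCKS  # same as $((3900 * 3)) above
dblock_total = baseline_total * NUM_BLOCKS    # epochs scaled by block count

# Halfway through warmup, both runs sit at the same learning rate.
print(lr_at_step(baseline_warmup // 2, baseline_total, baseline_warmup))
print(lr_at_step(dblock_warmup // 2, dblock_total, dblock_warmup))
```

Scaling the warmup along with the total step count keeps the shape of the schedule the same relative to training progress, so a block in the DiffusionBlocks run sees a warmup comparable to the baseline's.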

Evaluation

Baseline (ViT):

CKPT_PATH="logs/path-to-last.ckpt"
uv run main.py test cifar100 --model_type vit --ckpt_path "$CKPT_PATH"

DiffusionBlocks:

CKPT_PATH="logs/path-to-last.ckpt"
uv run main.py test cifar100 --model_type dblock --ckpt_path "$CKPT_PATH"
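The checkpoint path above is a placeholder. As a convenience, a helper along these lines could locate a checkpoint to pass as `--ckpt_path`. It is a hypothetical sketch, not part of the repository: the function name `latest_checkpoint` is invented here, and it assumes only that checkpoints are files ending in `.ckpt` somewhere under the logs folder.

```python
from pathlib import Path
from typing import Optional


def latest_checkpoint(log_dir: str = "logs") -> Optional[Path]:
    """Return the most recently modified .ckpt file under log_dir, or None."""
    root = Path(log_dir)
    if not root.is_dir():
        return None
    candidates = list(root.rglob("*.ckpt"))  # search subfolders too
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)


ckpt = latest_checkpoint("logs")
print(ckpt if ckpt is not None else "no checkpoint found under logs/")
```

Picking by modification time is a simple heuristic; if several runs share the logs folder, selecting the run directory explicitly is safer.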

Acknowledgement

The Vision Transformer implementation in vit.py is based on HuggingFace Transformers, and the EDM implementation is based on Stability-AI/generative-models.

We are grateful for their work.

Citation

To cite our work, please use the following BibTeX:

@inproceedings{shing2026diffusionblocks,
  title = {DiffusionBlocks: Block-wise Neural Network Training via Diffusion Interpretation},
  author = {Makoto Shing and Masanori Koyama and Takuya Akiba},
  booktitle = {The Fourteenth International Conference on Learning Representations},
  year = {2026},
  url = {https://openreview.net/forum?id=pwVSmK71cS}
}

Paper: arxiv.org/abs/2506.14202

License: Apache-2.0
