Fine-tune FLUX.2 [klein] with a LoRA under 60 minutes
Community Article, published June 4, 2026
Stephen Batifol (stephenbtl), black-forest-labs
FLUX.2 [klein] is small enough to fine-tune on a single consumer GPU. A LoRA run on the 4B model fits in 24 GB of VRAM, takes about an hour on an RTX 4090, and costs roughly $0.50 if you rent the GPU. This guide walks through the full loop: build a dataset, configure the trainer, run it, load the result in diffusers, and wrap it in a Gradio app you can ship as a Hugging Face Space.
By the end you will have a .safetensors LoRA that teaches klein a specific style, character, look, or edit behavior, and you will know the few details that decide whether the result is usable or mush.
Everything here uses open weights. FLUX.2-klein-base-4B is Apache 2.0, so you can ship what you train.
Building for the Build Small Hackathon
This guide is part of the Build Small Hackathon, hosted by Gradio and Hugging Face, with Black Forest Labs among the sponsors. The build window is June 5–15, 2026. Two rules shape what you make: the model you use must have 32B parameters or fewer, and your project ships as a Gradio app hosted on a Hugging Face Space.
FLUX.2 [klein] fits the brief directly. The 4B model is well under the 32B cap, it's Apache 2.0 so you can ship whatever you build on it, and it runs on the Space's own GPU. A LoRA is how you make it yours: a specific style or edit that fits your track, whether that's solving a real problem for someone you know (the Backyard AI track) or building something deliberately strange (An Adventure in Thousand Token Wood).
The rest of this guide trains that LoRA. The last section shows how to wrap it in the Gradio app you'll submit.
Why klein for fine-tuning
FLUX.2 [klein] ships in a 4B and a 9B size, each with a distilled (4-step) and a base (50-step) variant. For LoRA training, the relevant variant is base:
Take the 4B model as an example:
It fits. The weights take ~13 GB in bf16, and a LoRA run stays under 24 GB, so a 4090 or an L4 is enough.
It's the training target. Distilled models are step-compressed for fast inference, so you train against the base checkpoint; the adapter still loads on the distilled model afterward. Running it there is faster and, in our testing, usually gives even better results.
If you only want to run a LoRA, you do not need to train one: community klein LoRAs are already on the Hub. Train when you need a specific look the existing ones don't cover.
What you'll need
15–40 images that share one look (your art, licensed photos, or public-domain works from Wikimedia Commons).
A GPU for ~60 minutes. An RTX 4090 (24 GB) is the sweet spot.
A trainer. This guide uses ostris/ai-toolkit, a popular community trainer with a no-code web UI. It's one of several; any klein-compatible trainer works.
Pick your path
ai-toolkit has a web UI, so you don't have to edit YAML by hand unless you want to. There are two ways to run it:
| Path | Best for | Setup |
|---|---|---|
| RunPod template | most people, ~$0.50/run | one-click deploy, UI auto-launches |
| Local UI | you have a 24 GB+ NVIDIA GPU | git clone + npm run build_and_start, then open localhost:8675 |
The dataset and caption rules below are identical for both paths. Ostris has a 2-minute walkthrough video if you want to see the UI first.
Step 1: Build a dataset
Figure: What a style LoRA gives you: every prompt comes out in one consistent look. The examples are from Limbicnation/pixel-art-lora, a community klein-4B LoRA (Apache 2.0): prompt pixel art sprite, … and the style is baked in.
A style LoRA is the easiest win. Say you want to build your own sprite LoRA like the one above. Collect 15–40 images that share one look:
Diverse subjects, angles, and compositions. Don't repeat the same background.
At least 1024 px on the long edge.
One .txt caption per image, same filename (img (1).png → img (1).txt).
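Before training, it can help to check the folder against these rules automatically. The sketch below is a hypothetical helper, not part of ai-toolkit, and it assumes a flat folder of .png files: `check_dataset` verifies the image count, that each image has a same-named .txt caption, and that the long edge is at least 1024 px. To stay dependency-free it reads width and height straight from the PNG IHDR header; JPEGs or other formats would need a library such as Pillow instead.

```python
import struct
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(path: Path) -> tuple[int, int]:
    """Return (width, height) read from a PNG file's IHDR chunk."""
    with path.open("rb") as f:
        header = f.read(24)
    # Layout: 8-byte signature, 4-byte chunk length, b"IHDR", then width and height.
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError(f"{path.name} is not a valid PNG")
    width, height = struct.unpack(">II", header[16:24])  # big-endian uint32s
    return width, height


def check_dataset(folder: str, min_images: int = 15, max_images: int = 40,
                  min_long_edge: int = 1024) -> list[str]:
    """Return human-readable problems; an empty list means the folder passes."""
    images = sorted(Path(folder).glob("*.png"))
    problems = []
    if not min_images <= len(images) <= max_images:
        problems.append(f"found {len(images)} images, expected {min_images}-{max_images}")
    for img in images:
        # The caption shares the image's filename: "img (1).png" -> "img (1).txt".
        if not img.with_suffix(".txt").exists():
            problems.append(f"missing caption for {img.name}")
        try:
            width, height = png_size(img)
        except ValueError as err:
            problems.append(str(err))
            continue
        if max(width, height) < min_long_edge:
            problems.append(f"{img.name} is {width}x{height}, long edge under {min_long_edge}px")
    return problems
```

Run it as `for problem in check_dataset("my_dataset"): print(problem)`; the folder name is a placeholder. The thresholds default to the numbers in the list above, so adjust them if your trainer or dataset calls for something different.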
Caption the content, never the style
For a style LoRA, your captions describe what is in the image and say nothing about the style. The style is exactly what you want the model to infer on its own.
Each caption starts with a trigger word, then a description of the subject:
SPR1TE8. A knight in plate armor holding a sword, facing forward, plain background.
SPR1TE8. A fire-breathing dragon with spread wings, seen from the side.
Do not write "pixel art", "8-bit", "retro game", or "sprite style". If you name the style in the caption, the model learns to depend on that word instead of baking the style into the weights.
Pick a trigger word that is not a real word, so it can't collide with the model's vocabulary: SPR1TE8, RISO_PR1NT, ZK_TOON. Use it identically in every caption and in the config.
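These caption rules are mechanical enough to check in code. A minimal sketch, assuming the SPR1TE8 sprite example: `make_caption` builds a caption in the "TRIGGER. description" form, and `lint_caption` flags a missing trigger prefix or any of the style words listed above. Both functions and the `BANNED_STYLE_WORDS` list are hypothetical helpers, not part of ai-toolkit; swap in the trigger and style words for your own dataset.

```python
import re

TRIGGER = "SPR1TE8"
# Words that name the style itself; for a style LoRA they must stay out of captions.
BANNED_STYLE_WORDS = ["pixel art", "8-bit", "retro game", "sprite style"]


def make_caption(description: str, trigger: str = TRIGGER) -> str:
    """Build a caption of the form 'TRIGGER. description'."""
    return f"{trigger}. {description.strip()}"


def lint_caption(caption: str, trigger: str = TRIGGER) -> list[str]:
    """Return the rule violations found in one caption (empty list = clean)."""
    problems = []
    # The trigger must open every caption, spelled exactly the same way.
    if not caption.startswith(f"{trigger}. "):
        problems.append(f"caption must start with '{trigger}. '")
    lowered = caption.lower()
    for word in BANNED_STYLE_WORDS:
        # Whole-word match, so '8-bit' is caught but longer words containing it are not.
        if re.search(rf"\b{re.escape(word)}\b", lowered):
            problems.append(f"names the style: '{word}'")
    return problems
```

For example, `lint_caption("SPR1TE8. An 8-bit castle.")` reports the banned "8-bit", while a caption built with `make_caption` from a plain content description passes. If your dataset has sub-styles you deliberately want to control at inference, keep those names out of `BANNED_STYLE_WORDS`.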
One deliberate exception: variations you want to control later. Don't caption the one look you always want; let that bake into the trigger. But if your dataset has clear sub-styles you'd like to switch between at inference, name those. The pixel-art LoRA...