Fine-tune FLUX.2 [klein] with a LoRA under 60 minutes

Community Article Published June 4, 2026

Stephen Batifol (stephenbtl), black-forest-labs

FLUX.2 [klein] is small enough to fine-tune on a single consumer GPU. A LoRA run on the 4B model fits in 24 GB of VRAM, takes about an hour on an RTX 4090, and costs roughly $0.50 if you rent the GPU. This guide walks through the full loop: build a dataset, configure the trainer, run it, load the result in diffusers, and wrap it in a Gradio app you can ship as a Hugging Face Space.

By the end you will have a .safetensors LoRA that teaches klein a specific style, character, look, or edit behavior, plus the few details that decide whether the result is usable or mush.

Everything here uses open weights. FLUX.2-klein-base-4B is Apache 2.0, so you can ship what you train.

Building for the Build Small Hackathon

This guide is part of the Build Small Hackathon, hosted by Gradio and Hugging Face, with Black Forest Labs among the sponsors. The build window is June 5–15, 2026. Two rules shape what you make: the model you use must be 32B parameters or fewer, and your project ships as a Gradio app hosted on a Hugging Face Space.

FLUX.2 [klein] fits the brief directly. The 4B model is well under the 32B cap, it's Apache 2.0 so you can ship whatever you build on it, and it runs on the Space's own GPU. A LoRA is how you make it yours: a specific style or edit that fits your track, whether that's solving a real problem for someone you know (the Backyard AI track) or building something deliberately strange (An Adventure in Thousand Token Wood).

The rest of this guide trains that LoRA. The last section shows how to wrap it in the Gradio app you'll submit.

Why klein for fine-tuning

FLUX.2 [klein] ships in a 4B and a 9B size, each with a distilled (4-step) and a base (50-step) variant. For LoRA training, the variant you want is base:

Take the 4B model as an example:

It fits. ~13 GB of weights in bf16; a LoRA run lands under 24 GB, so a 4090 or an L4 is enough.

It's the training target. Distilled models are step-compressed for fast inference, so you train against the base checkpoint. The adapter still loads on the distilled model afterward, which is faster at inference and, in our testing, usually gives even better results.
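To see why one adapter file can serve both checkpoints, here is a minimal sketch assuming the standard LoRA formulation W' = W + (alpha / r) · B · A. This is not ai-toolkit's or diffusers' actual code, and the sizes are toy numbers, not klein's. A LoRA stores only the small matrices B and A for each adapted layer; at load time their scaled product is added to whatever base weight has the matching shape. The sketch assumes base and distilled klein share the same layer shapes, which is what lets the same delta fit either one.

```python
"""Minimal sketch of how a LoRA adapter modifies one linear layer.

Assumes the standard LoRA update W' = W + (alpha / r) * B @ A.
Matrices are plain lists of rows; sizes are toy values, not klein's.
"""


def matmul(x, y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def lora_delta(b, a, alpha):
    """Return the low-rank update (alpha / r) * B @ A, where r = number of rows in A."""
    rank = len(a)
    scale = alpha / rank
    return [[scale * v for v in row] for row in matmul(b, a)]


def apply_lora(w, delta):
    """Add the adapter delta to any base weight with the same shape."""
    if len(w) != len(delta) or len(w[0]) != len(delta[0]):
        raise ValueError("adapter shape does not match base weight")
    return [[wv + dv for wv, dv in zip(w_row, d_row)] for w_row, d_row in zip(w, delta)]


def lora_param_count(d_out, d_in, rank):
    """Trainable values in B (d_out x rank) plus A (rank x d_in)."""
    return rank * (d_out + d_in)
```

The same `delta` can be added to two different 3 x 3 base matrices, which stands in for the base and distilled checkpoints; the sketch only shows that the shapes line up, while the quality claim above comes from the author's testing. For a hypothetical 3072 x 3072 layer at rank 16, `lora_param_count(3072, 3072, 16)` gives 98,304 trainable values versus about 9.4 million in the full matrix, which is why the resulting adapter file is small.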

If you only want to run a LoRA, you do not need to train one: there are already community klein LoRAs on the Hub. Train when you need a specific look the existing ones don't cover.

What you'll need

15–40 images that share one look (your art, licensed photos, or public-domain works from Wikimedia Commons).

A GPU for ~60 minutes. An RTX 4090 (24 GB) is the sweet spot.

A trainer. This guide uses ostris/ai-toolkit, a popular community trainer with a no-code web UI. It's one of several; any klein-compatible trainer works.

Pick your path

ai-toolkit has a web UI, so you don't have to edit YAML by hand unless you want to. Two ways to run it:

Path | Best for | Setup

RunPod template | Most people (~$0.50/run) | One-click deploy; the UI auto-launches

Local UI | You have a 24 GB+ NVIDIA GPU | git clone, then npm run build_and_start; open localhost:8675

The dataset and caption rules below are identical across both. Ostris has a 2-minute walkthrough video if you want to see the UI first.

Step 1: Build a dataset

A style LoRA is the easiest win. Say you want to build your own sprite LoRA, like the community pixel-art LoRA shown here:

What a style LoRA gives you: every prompt comes out in one consistent look. These are from Limbicnation/pixel-art-lora, a community klein-4B LoRA (Apache 2.0). Prompt "pixel art sprite, …" and the style is baked in.

To train one, collect 15–40 images that share one look:

Diverse subjects, angles, and compositions. Don't repeat the same background.

At least 1024 px on the long edge.

One .txt caption per image, same filename (img (1).png → img (1).txt).
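A quick pre-flight check can catch missing or empty captions before you spend GPU time. The helper below is hypothetical, not part of ai-toolkit; it assumes a flat folder, the same-filename convention above, and an assumed list of image extensions. It does not check the 1024 px rule, because reading image sizes needs a library such as Pillow (`Image.open(path).size`), and the sketch stays dependency-free.

```python
"""Pre-flight check for a LoRA dataset folder (a hypothetical helper).

Assumes a flat folder of images, each with a same-named .txt caption,
e.g. "img (1).png" -> "img (1).txt". The 15-40 range follows this guide;
the extension set is an assumption, so adjust it to your data.
"""
from pathlib import Path

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}  # assumed image types


def check_dataset(folder, min_images=15, max_images=40):
    """Return a list of human-readable problems; an empty list means OK."""
    root = Path(folder)
    images = sorted(
        p for p in root.iterdir() if p.is_file() and p.suffix.lower() in IMAGE_EXTS
    )
    problems = []
    if not min_images <= len(images) <= max_images:
        problems.append(f"found {len(images)} images, expected {min_images}-{max_images}")
    for img in images:
        # Same filename, .txt extension: "img (1).png" -> "img (1).txt".
        caption = img.with_suffix(".txt")
        if not caption.is_file():
            problems.append(f"missing caption: {caption.name}")
        elif not caption.read_text(encoding="utf-8").strip():
            problems.append(f"empty caption: {caption.name}")
    return problems
```

Call `check_dataset("path/to/dataset")` and fix whatever it reports before uploading the folder to the trainer.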

Caption the content, never the style

For a style LoRA, your captions describe what is in the image and say nothing about the style. The style is exactly what you want the model to infer on its own.

Each caption starts with a trigger word, then a description of the subject:

SPR1TE8. A knight in plate armor holding a sword, facing forward, plain background.

SPR1TE8. A fire-breathing dragon with spread wings, seen from the side.

Do not write "pixel art", "8-bit", "retro game", or "sprite style". If you name the style in the caption, the model learns to depend on that word instead of baking the style into the weights.

Pick a trigger word that is not a real word, so it can't collide with the model's vocabulary: SPR1TE8, RISO_PR1NT, ZK_TOON. Use it identically in every caption and in the config.
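The two caption rules (start with the exact trigger, never name the style) are easy to check mechanically. This is a hypothetical linter, not a feature of ai-toolkit; the banned-term list simply mirrors the examples above and is an assumption you should extend with words describing your own style. Leave out any sub-style names you intend to caption on purpose. Matching is a plain case-insensitive substring test, so expect the occasional false positive.

```python
"""Lint style-LoRA captions against this guide's caption rules (hypothetical helper).

Checks that a caption starts with the exact trigger word and never names the style.
The banned-term list is an assumption based on the guide's examples.
"""
import re

TRIGGER = "SPR1TE8"  # example trigger from the guide; use your own
STYLE_TERMS = ["pixel art", "8-bit", "retro game", "sprite style"]


def lint_caption(caption, trigger=TRIGGER, style_terms=STYLE_TERMS):
    """Return a list of rule violations for one caption string."""
    problems = []
    text = caption.strip()
    # The trigger must come first, spelled identically (case-sensitive),
    # and must not run straight into more word characters (e.g. "SPR1TE8X").
    if not re.match(rf"{re.escape(trigger)}\b", text):
        problems.append(f"does not start with trigger {trigger!r}")
    lowered = text.lower()
    for term in style_terms:
        # Naive substring check: any mention of the style is a violation.
        if term.lower() in lowered:
            problems.append(f"names the style: {term!r}")
    return problems
```

Both example captions above pass with no violations; a caption such as "A knight in pixel art style." fails twice, once for the missing trigger and once for naming the style.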

One deliberate exception: variations you want to control later. Don't caption the one look you always want; let that bake into the trigger. But if your dataset has clear sub-styles you'd like to switch between at inference, name those. The pixel-art LoRA...
