The LLM Fine-Tuning Guide

The Ultimate LLM Fine-Tuning Guide

From dataset to GGUF - every parameter explained, every step runnable

PromptInjection May 03, 2026

Fine-tuning is a direct intervention into how a language model behaves. Not prompting, not system instructions, not RAG - actual weight modification. The model after training is a different model than before. The use cases span an unusually wide range. Teaching a model a specific writing style or persona. Injecting domain knowledge it wasn’t trained on. Making it respond consistently in a particular language or format. Eliminating behaviors you don’t want. Building a character for a game that stays in character under pressure. Aligning a general-purpose model to a narrow, specialized task where generic responses are worse than useless. All of these are fine-tuning problems, and all of them work through the same mechanism: you show the model enough examples of what you want until the weights move.

This guide walks through the complete pipeline - environment setup, dataset format, training configuration, and export to a GGUF file you can run locally. The example model is Qwen3-0.6B, small enough to train on modest hardware. But the principles scale. The same levers that move a 0.6B model move a 70B model. The numbers change. The logic doesn’t.

What Fine-Tuning Actually Does

A language model is a probability distribution over tokens. Given a sequence of text, it assigns probabilities to what comes next. Training adjusts the weights - billions of floating-point numbers in a large model - so that the distribution shifts. The model that previously said “Paris” when asked for the capital of France still says “Paris”, but the model that previously rambled when asked to write product copy now writes clean, structured product copy. Fine-tuning doesn’t erase what the model knows. It reshapes how that knowledge surfaces. Think of it less as reprogramming and more as extended, very intensive behavioral conditioning.
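To make the "distribution shifts" idea concrete, here is a toy sketch in plain Python. It is not how a real trainer is written: the three-token vocabulary and the starting scores are invented for illustration, and the raw scores (logits) are updated directly instead of the network weights that would produce them. The update rule itself, the gradient of cross-entropy loss through a softmax, is the standard one.

```python
import math

def softmax(logits):
    """Turn raw scores into a probability distribution that sums to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sgd_step(logits, target_index, lr=1.0):
    """One gradient-descent step on cross-entropy loss w.r.t. the logits.

    For softmax + cross-entropy, the gradient on each logit is
    (probability - 1) for the target token and (probability) for the others.
    """
    probs = softmax(logits)
    grads = [p - (1.0 if i == target_index else 0.0) for i, p in enumerate(probs)]
    return [x - lr * g for x, g in zip(logits, grads)]

# Hypothetical vocabulary and starting scores, chosen only for this demo.
vocab = ["clean", "rambling", "other"]
logits = [0.0, 2.0, 0.5]

before = softmax(logits)
for _ in range(20):  # 20 "training examples" that all prefer token 0
    logits = sgd_step(logits, target_index=0)
after = softmax(logits)

for token, p0, p1 in zip(vocab, before, after):
    print(f"{token:>8}: {p0:.3f} -> {p1:.3f}")
```

Each step nudges probability mass toward the token the examples prefer and away from the others. A real fine-tuning run does the same thing across many tokens and many examples, with the gradient flowing back into the weights.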

The Stack

ms-swift — the training framework. Wraps HuggingFace Transformers with a clean CLI and sane defaults.

llama.cpp — for converting the trained model to GGUF format, which is what local inference tools like LM Studio, Ollama, and llama-server consume.

Miniconda — environment management. Keeps the CUDA dependencies isolated.

Prerequisites

GPU: An NVIDIA GPU with Turing architecture or newer - that’s the RTX 20 series / GTX 1660 Ti and up. The CUDA 12.8 stack used in this guide expects at minimum Compute Capability 7.5, which corresponds to Turing. Pascal (GTX 10 series) is not supported. Realistically, for anything beyond a 0.6B toy model you want at least 8–12 GB VRAM - an RTX 3080, RTX 4070, or equivalent. The more VRAM, the larger the model and sequence length you can handle.

Driver: Linux driver ≥ 570.26, Windows driver ≥ 570.65. Check your current version with:

    nvidia-smi

If the driver is outdated, update it before proceeding - mismatched driver/CUDA versions are the most common source of silent failures in this stack.

OS: Native Linux or Windows with WSL2. The setup below assumes Ubuntu. On WSL2: install the NVIDIA driver on the Windows host only - never inside WSL2. The driver is automatically exposed inside WSL2 as libcuda.so. Do not run apt install nvidia-driver-* inside WSL2.

CUDA Toolkit: Recommended on both native Linux and WSL2. The toolkit (nvcc, libraries) is separate from the driver.

Ubuntu 22.04:

    wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
    sudo dpkg -i cuda-keyring_1.1-1_all.deb
    sudo apt update && sudo apt install cuda-toolkit-12-8 -y

Ubuntu 24.04:

    wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb
    sudo dpkg -i cuda-keyring_1.1-1_all.deb
    sudo apt update && sudo apt install cuda-toolkit-12-8 -y

After installation, add the toolkit to your PATH:

    echo 'export PATH=/usr/local/cuda/bin:$PATH' >> ~/.bashrc
    echo 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
    source ~/.bashrc

Verify with nvidia-smi (driver) and nvcc --version (toolkit).
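If you would rather script the driver check than read the nvidia-smi banner by eye, something like the following could work. It is a sketch, not part of the original setup: the helper names are made up, the minimum versions are the ones quoted above, and it assumes nvidia-smi is on your PATH and accepts the --query-gpu=driver_version option. Inside WSL2, nvidia-smi reports the Windows host driver, so pass "windows" as the platform there.

```python
import shutil
import subprocess

# Minimum driver versions quoted in the prerequisites above.
MIN_DRIVER = {"linux": (570, 26), "windows": (570, 65)}

def parse_version(text):
    """Convert a string like '570.86.15' into (570, 86, 15)."""
    return tuple(int(part) for part in text.strip().split("."))

def driver_ok(version_text, platform="linux"):
    """True if the driver version meets the minimum for the given platform."""
    return parse_version(version_text) >= MIN_DRIVER[platform]

def query_driver_version():
    """Return the driver version reported by nvidia-smi, or None if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True, text=True, check=False, timeout=30,
        )
    except (OSError, subprocess.SubprocessError):
        return None
    lines = result.stdout.strip().splitlines()
    if result.returncode != 0 or not lines:
        return None
    return lines[0].strip()  # first GPU is enough; the driver is system-wide

version = query_driver_version()
if version is None:
    print("nvidia-smi not found or not working - install or repair the driver first")
else:
    status = "OK" if driver_ok(version, "linux") else "too old for CUDA 12.8"
    print(f"driver {version}: {status}")
```

Versions are compared as tuples of integers, so 570.86.15 sorts correctly above 570.26, which a plain string comparison would not guarantee.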

Environment Setup

    mkdir -p ~/miniconda3
    wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
    bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
    rm ~/miniconda3/miniconda.sh

    eval "$(~/miniconda3/bin/conda shell.bash hook)"
    conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/main
    conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/r

    conda create -n finetune python=3.11 -y
    source ~/miniconda3/bin/activate
    conda init --all
    conda activate finetune

Then install PyTorch with CUDA 12.8 support, a prebuilt Flash Attention wheel, and ms-swift:

    pip install torch==2.9.1...
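Once the packages are installed, a quick sanity check from inside the finetune environment can save a failed training run later. The snippet below is a sketch and not part of the original instructions: environment_report is a made-up helper name, and it only relies on standard PyTorch attributes (torch.version.cuda, torch.cuda.is_available, torch.cuda.get_device_capability). It degrades gracefully if torch is not installed yet.

```python
import importlib.util
import sys

def environment_report():
    """Collect basic facts about the active Python environment."""
    report = {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        "torch_installed": importlib.util.find_spec("torch") is not None,
    }
    if report["torch_installed"]:
        import torch  # imported lazily so the script also runs before installation
        report["torch"] = torch.__version__
        report["torch_cuda_build"] = torch.version.cuda  # None on CPU-only builds
        report["cuda_available"] = torch.cuda.is_available()
        if report["cuda_available"]:
            report["gpu"] = torch.cuda.get_device_name(0)
            # Tuple such as (7, 5) for Turing; compare against the GPU prerequisite.
            report["compute_capability"] = torch.cuda.get_device_capability(0)
    return report

for key, value in environment_report().items():
    print(f"{key}: {value}")
```

If cuda_available comes back False while nvidia-smi works, the usual suspects are a CPU-only torch build or a driver older than the minimum listed in the prerequisites.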
