Fine Tuning a Tiny Local LLM to Categorize Questions

As a fun personal project, I have been working on a chatbot for answering general questions about my household on anything from maintenance questions to doctor’s appointments.

The general idea is that the chatbot will get its household knowledge through RAG from querying a vector database, but for better results I have made the vector searches metadata aware.

Basically, I am running questions through a pre-processing step to categorize questions into known metadata categories (e.g. pool, car, hvac, cooking). The main goal of this is to narrow down the search space for vector ranking to only indexed entries that match the category of the question. As an example, the question “When did we replace our pool pump?” will be mapped to a category called “pool” before querying the Index database. ">

As a fun personal project, I have been working on a chatbot for answering general questions about my household on anything from maintenance questions to doctor’s appointments.

The general idea is that the chatbot will get its household knowledge through RAG from querying a vector database, but for better results I have made the vector searches metadata aware.

Fine Tuning a Local LLM to Categorize Questions

Teach Me Cool Stuff

Fine Tuning a Local LLM to Categorize Questions

Published: 16 Jun, 2026

Author Torgeir Helgevold

As a fun personal project, I have been working on a chatbot for answering general questions about my household on anything from maintenance questions to doctor’s appointments.

The general idea is that the chatbot will get its household knowledge through RAG from querying a vector database, but for better results I have made the vector searches metadata aware.

The hypothesis I want to test in this experiment is whether a very small local LLM can be fine-tuned to perform reliable question categorization when trained on a dataset of household-related questions

LLMs

In this project I am using two different local llms – Qwen 3:4B and Qwen 3:0.6B. The 4B parameter version is used for general question answering, while the super tiny 0.6B version is used to categorize questions. The whole premise of this experiment is to see if a tiny llm with only 600M parameters can be finetuned into a reliable classifier of household questions.

Finetuning

For finetuning I am using a popular open-source framework called Unsloth, which seems well suited for tuning local models like Qwen and Llama.

For training purposes my initial dataset consists of about ~850 data entries where I do a 70/15/15 percentage-based split into training data, eval data and test data respectively. Training data and eval data are used during training, while the test dataset is withheld and used to run a test post training. See section below for sample data:

"question": "Who cleans our gutters at the house?", "category": "gutters" }, "question": "Who serviced the hot water heater for the home?", "category": "water heater" }, "question": "Who fixed the sprinkler system in the yard?", "category": "irrigation" }, "question": "Which store do we usually buy pinnekjott from?", "category": "cooking" }, "question": "What dimensions are the air filters for the home AC?", "category": "hvac" }, "question": "What year did we replace the downstairs AC unit?", "category": "hvac"

The basic idea is to train the llm on a sufficient set of household questions to teach it to become a reliable question classifier.

Baseline

Before doing any finetuning, it’s important to establish a baseline to measure against. In this experiment the baseline is to try to use the original Qwen 0.6B model “as is” through prompting alone. A sample prompt used for the baseline can be found below:

Classify the homeowner question into exactly one category from the list below. Return only the category name from the list. Never return a code, a number, a synonym, an explanation, or any other text. The answer must be exactly one category name from the list. Choose the best category based on the meaning of the question.

Valid categories: - appliances - brick work - car - cooking - doorbell - electric - fence -...

Fine Tuning a Tiny Local LLM to Categorize Questions

Related Articles

Apple WWDC 2026 Livestream

Claude Fable 5

US Government directive to suspend access to Fable 5 and Mythos 5

German ruling declares Google liable for false answers in AI Overviews

Britain Became as Poor as Mississippi