Local LLMs perform so much better when you teach them to ask before they answer
By
Korbin Brown
Published May 23, 2026, 4:30 PM EDT
Korbin is a Linux system administrator who spends most of his time in a terminal figuring out how things actually work. Over the last decade he's written hundreds of articles about Linux configuration, troubleshooting weird problems, and using open-source tools in the real world.
He also works a lot with Windows systems and networking, especially in mixed environments where things don't always behave the way the documentation says they should. Writing things down is how he makes sense of it all and hopefully saves someone else a few hours.
Prompting an LLM requires a different approach than typing a search query into Google, yet it's common for people to treat them the same. The process often goes something like this: type a sentence, get a response, then ask refining questions or supply follow-up context to guide the AI toward the information you're after. If you're only asking for a simple answer, then simple questions usually work fine. But for tasks involving deeper context or multiple steps, you're normally two or three exchanges deep before the model begins to understand what you actually wanted.
On local LLMs specifically, which already lag behind cloud models, starting an interaction with an ambiguous prompt puts you in a hole that's hard to dig your way out of. You can try writing clearer prompts (always a good practice), but there's still a chance that the LLM won't guess exactly what you mean. What fixed this for me was instructing local models to ask clarifying questions before attempting any non-trivial task. Now, instead of me doing the guesswork, the model lets me know if any part of my prompt needs additional clarification. Tasks that used to take several extra exchanges now take only one or two.
Ambiguous prompts are a local LLM's kryptonite
Especially on tasks that have multiple steps
Cloud models like Claude and ChatGPT are great at reading between the lines and inferring the underlying intent from a user's prompts. Even vague questions receive a serviceable answer surprisingly often. But that only works because cloud models have the advantage of being trained on enormous datasets, and the millions of questions they're asked every day also contribute to the training data. Local models don't have that luxury.
In my experience with Llama and Qwen models through Ollama, any ambiguity in my prompt leads to inconsistent interpretations. A simple prompt like "write a summary of this document" doesn't give the model any information about the tone or length you're expecting, what kind of audience the summary is for, or what format it should be in. The model has to make all of those assumptions before proceeding, so the chances of the result coming back exactly the way you expect are slim. If it gives you a paragraph when you wanted bullet points, that means more back and forth to get things right, which quickly gets annoying.
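To make those hidden assumptions concrete, here is a minimal Python sketch that spells out the four details the bare summary prompt leaves open: tone, length, audience, and format. The function name `build_summary_prompt` and its parameters are hypothetical, chosen purely for illustration; they are not part of Ollama or any library.

```python
def build_summary_prompt(document: str, tone: str, length: str,
                         audience: str, output_format: str) -> str:
    """Return a summary prompt that states the details a bare request omits."""
    return (
        "Write a summary of the document below.\n"
        f"Tone: {tone}\n"
        f"Length: {length}\n"
        f"Audience: {audience}\n"
        f"Format: {output_format}\n\n"
        f"Document:\n{document}"
    )


if __name__ == "__main__":
    # Example values only; swap in whatever the task actually calls for.
    print(build_summary_prompt(
        document="(paste the document text here)",
        tone="neutral",
        length="about 100 words",
        audience="non-technical readers",
        output_format="bullet points",
    ))
```

Every field here is something the model would otherwise have to guess, which is exactly the kind of gap a clarifying question can close.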
Instructions that give you better answers
These few lines tell the model to ask instead of assume
The custom instructions are best placed inside a Modelfile so that they persist across sessions. Otherwise, you're stuck copying and pasting the instructions into every new chat. Here's what my Modelfile looks like:
FROM llama4
SYSTEM """
When tasked with coding, writing, editing, or summarizing, ask the user up to three targeted clarifying questions. Proceed with the task once you've received answers and understand the prompt fully. If the task is a simple factual question or conversational message, respond directly.
"""
Simple enough, but it actually took me a few iterations to arrive at the current instruction set. Bits like "up to three" are important, because earlier versions of my custom instructions worked against me, with some models asking too many follow-up questions. "Targeted" is also essential. Before I added it, the model asked vague questions instead of asking for the specific things it actually needed to complete its task.
To integrate those custom instructions into Ollama, paste them into a new file named "Modelfile" (no extension). Once that file is saved, run ollama create my-assistant -f Modelfile. After that, your new model with the custom instructions is ready to go with ollama run my-assistant, or from the model dropdown selector if you're using the GUI.
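If you'd rather script those steps, the following is a rough Python sketch, assuming Python 3 and that the ollama CLI is installed and on your PATH. It writes the Modelfile from above and then runs the same ollama create command. The helper names `write_modelfile` and `create_model` are hypothetical; only the command itself comes from the steps described here.

```python
import shutil
import subprocess
from pathlib import Path

# Modelfile contents as shown in the article.
MODELFILE = '''FROM llama4
SYSTEM """
When tasked with coding, writing, editing, or summarizing, ask the user up to three targeted clarifying questions. Proceed with the task once you've received answers and understand the prompt fully. If the task is a simple factual question or conversational message, respond directly.
"""
'''


def write_modelfile(directory: Path) -> Path:
    """Write the Modelfile (no extension) into `directory` and return its path."""
    path = directory / "Modelfile"
    path.write_text(MODELFILE, encoding="utf-8")
    return path


def create_model(modelfile: Path, name: str = "my-assistant") -> None:
    """Run `ollama create <name> -f <modelfile>`; raises if the CLI is missing or fails."""
    if shutil.which("ollama") is None:
        raise RuntimeError("ollama CLI not found on PATH")
    subprocess.run(["ollama", "create", name, "-f", str(modelfile)], check=True)
```

Calling `create_model(write_modelfile(Path.cwd()))` should leave you with the same my-assistant model as the manual steps, which you can then start with ollama run my-assistant.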
More questions, but quicker interactions
The extra change...