Local LLMs perform better when you teach them to ask before they answer

Local LLMs perform so much better when you teach them to ask before they answer

Korbin Brown

Published May 23, 2026, 4:30 PM EDT

Korbin is a Linux system administrator who spends most of his time in a terminal figuring out how things actually work. Over the last decade he's written hundreds of articles about Linux configuration, troubleshooting weird problems, and using open-source tools in the real world.

He also works a lot with Windows systems and networking, especially in mixed environments where things don't always behave the way the documentation says they should. Writing things down is how he makes sense of it all and hopefully saves someone else a few hours.

Add Us On

Summary

Generate a summary of this story

followed

Followed

Thread

Here is a fact-based summary of the story contents:

Try something different:

Show me the facts

Explain it like I’m 5

Give me a lighthearted recap

Prompting an LLM requires a different approach than typing a search query into Google, yet it's common for people to treat them the same. The process often goes something like this: type a sentence, get a response, then ask refining questions or supply follow-up context to guide AI toward the information you're after. If you're only asking for a simple answer, then simple questions usually work fine. But for tasks involving deeper context or multiple steps, you're normally two or three exchanges deep before the model begins to understand what you actually wanted. On local LLMs specifically, which already lag behind cloud models, starting an interaction off with an ambiguous prompt puts you in a hole that's hard to dig your way out of. You can try writing clearer prompts (always a good practice), but there's still a chance that the LLM can't guess exactly what you mean. What fixed this for me was instructing local models to ask clarifying questions before attempting any non-trivial task. Now, instead of me doing any guesswork, AI lets me know if any part of my prompt needs additional clarification. Tasks that used to take several extra exchanges only take one or two now.

I started self-hosting LLMs and absolutely loved it

Who needs OpenAI when your home lab can do the thinking for you?

Posts

By Raghav Sethi

Ambiguous prompts are a local LLM's kryptonite

Especially on tasks that have multiple steps

Cloud models like Claude and ChatGPT are great at reading between the lines and inferring the underlying intent from a user's prompts. Even vague questions receive a serviceable answer surprisingly often. But that only works because cloud models have the unique advantage of being trained on enormous datasets, and the millions of questions they're asked every day also contribute to the training data. Local models don't have that luxury. In my experience with Llama and Qwen models through Ollama, any ambiguity in my prompt leads to inconsistent interpretations. A simple prompt like "write a summary of this document" doesn't give the model any information about the tone or length you're expecting, or what kind of audience the summary is for, or what format it should be in. The model needs to make all those assumptions before proceeding. The chances of the result coming back exactly the way you expect it are slim. If it gives you back a paragraph when you wanted bullet points, that's more back and forth to get things right, which grows annoying.

Instructions that give you better answers

These few lines tell the model to ask instead of assume

The custom instructions are best placed inside of a Modelfile, so that it can persist across different sessions. Otherwise, you're stuck copying and pasting the instructions into every new chat. Here's what my Modelfile looks like:

FROM llama4

SYSTEM """ When tasked with coding, writing, editing, or summarizing, ask the user up to three targeted clarifying questions. Proceed with the task once you've received answers and understand the prompt fully. If the task is a simple factual question or conversational message, respond directly. """

Simple enough, but it actually took me a few iterations to arrive at the current instruction set. Bits like "up to three" are important, because I've had earlier versions of my custom instructions work against me in the past, with some models asking too many follow-up questions. "Targeted" is also an essential part. Before adding it, the model was asking vague questions instead of specific things it actually needs for completing its task. To integrate those custom instructions into Ollama, you can paste them into a new file named "Modelfile" (no extension). Once that file is saved, run ollama create my-assistant -f Modelfile. After that, your new model with the custom instructions is ready to go with ollama run my-assistant, or from the model dropdown selector if you're using the GUI.

More questions, but quicker interactions

The extra change...

Local LLMs perform better when you teach them to ask before they answer

Related Articles

Amazon, Facebook, FBI have access to a private intelligence-sharing network

SpaceX not the behemoth everyone thought

The Mirror Is Part of the Machine

Elevated error rates on requests to multiple models

Donald Trump and sons to be 'forever' exempt from tax audits