A non-coding coding agent<br>We love coding agents. They can build a full-featured SaaS and potentially make you a millionaire if you leave them running overnight with the right prompt. They will also burn your GPU or your budget, slip in a few unprompted vulnerabilities, bloat your code until debugging it costs you your sanity, put emojis in your comments, and ultimately make you question your life choices.<br>So I thought: if they are so cool, it must be interesting to build one myself and steal some of Anthropic’s fame.<br>But as you know, this blog often sits on the edge of absurd programming, so the agent we’re building today is probably the first non-coding coding agent.<br>The agent is called Socreates (yes, with a typo), and it is a Socratic agent. It will catch your mistakes, challenge your decisions, and act as a rubber duck with brutal opinions, but it will never touch your code. You have to type it all yourself. And I’ve heard many developers actually enjoyed writing code in the good old days. Some even called themselves “coders”.<br>A gent<br>Before diving into code, let’s clarify what a “coding agent” actually is.<br>We know that an LLM is just a next-token predictor. A reasoning model is the same LLM trained to spend more time on intermediate steps. 
An agent is a control loop that uses an LLM to decide what to inspect, which tools to call, and when to stop.<br>This agentic loop is why Claude Code feels way more capable than the same model in a chat window.<br>A coding agent in its simplest form has only a few core jobs:<br>Gathering facts about the workspace to help the model start doing things for you (file tree, git repo state, etc.)<br>Executing tools (structured actions such as reading files or running commands on your machine, much like the remote procedure calls of the past)<br>Controlling context size: clipping long outputs, avoiding redundant tool calls, and compacting context in every possible way, because LLMs have limits<br>Providing memory to persist conversation state if you restart an agent the next day<br>Some agents do more, like delegating certain tasks to bounded sub-agents, orchestrating them and doing things in parallel, but we keep it simple. One loop, four tools, no dependencies.<br>The loop<br>The loop itself is almost trivial:<br>1. User types a message<br>2. Agent sends system prompt + conversation history to an LLM<br>3. The LLM responds with text and/or requests some tool calls; we stop the loop if it’s a final answer<br>4. If not, the agent runs the tools and feeds the results back to the LLM on the next iteration<br>5. Go back to step 2, forcing a final response if the loop takes too many iterations<br>This is the entire “agent” part. Everything else is plumbing and parsing to make various components work together (but isn’t that the essence of modern programming?)<br>LLM<br>Models are getting better and better. Some people can afford to run them locally, some can afford to run them in the cloud, and others can afford to have a job that pays for Claude API keys. To make swapping an LLM easier we define an interface for all of them, and it’s a rather simple one:<br>type LLM interface {<br>Chat(ctx context.Context, req ChatRequest) (*ChatResponse, error)<br>}
type ChatRequest struct {<br>Messages []Message<br>Tools []Tool<br>}
type ChatResponse struct {<br>Content string<br>ToolCalls []ToolCall<br>Usage Usage<br>}
ChatRequest carries the conversation history and the schema of the available tools. ChatResponse carries text content and/or structured tool calls (if the model wants to do something). Usage tracks token consumption so we can print stats after each turn and decide whether it’s worth it. The agent doesn’t know whether it’s talking to a silly 7B model or DeepSeek in the cloud.<br>Both Ollama and OpenAI-compatible APIs support “native tool calling”. You send a tools array describing the available functions with JSON Schema, and the model responds with structured tool_calls. The conversation looks approximately like this:<br>> system: "You are a coding companion..."<br>> user: "review my code"<br>assistant: {tool_calls: [{function: {name: "list_files", arguments: "{}"}}]}<br>> tool: "[F] main.go\n[D] pkg/" (tool_call_id: "call_1")<br>assistant: {tool_calls: [{function: {name: "read_file", arguments: "{\"path\":\"main.go\"}"}}]}<br>> tool: "[main.go: lines 1-1000 of 1000]\n 1: package main\n..." (tool_call_id: "call_2")<br>assistant: "Why did you put all 1000 lines in one file? Where are all the tests?"
Each provider needs its own HTTP client because the wire formats differ slightly:<br>Ollama (/api/chat) returns arguments as a JSON object already, so you can capture it as-is with json.RawMessage. Token usage comes in every response as prompt_eval_count and eval_count.<br>OpenAI/DeepSeek (/v1/chat/completions) expects content to be serialized as JSON null (not omitted) for tool-only messages, and rejects tool results that lack a call ID. Token usage is provided too, but in a different format: a standard usage object.<br>These are boring details...