A non-coding coding agent


We love coding agents. They can build a full-featured SaaS and potentially make you a millionaire if you leave them running overnight with the right prompt. They will also burn your GPU or your budget, slip in a few unprompted vulnerabilities, bloat your code until debugging it costs you your sanity, put emojis in your comments, and ultimately make you question your life choices.

So I thought, if they are so cool, it must be interesting to build one myself and steal the fame of Anthropic.

But as you know, this blog is often on the edge of absurd programming, so the agent we're building today is probably the first non-coding coding agent.

The agent is called Socreates (yes, with a typo), and it is a Socratic agent. It will catch your mistakes, challenge your decisions, and act as a rubber duck with brutal opinions, but it will never touch your code. You have to type it all yourself. And I've heard many developers actually enjoyed writing code in the good old days. Some even called themselves "coders".

A gent

Before diving into code, let's clarify what a "coding agent" actually is.

We know that an LLM is just a next-token predictor. A reasoning model is the same LLM trained to spend more time on intermediate steps.
An agent is a control loop that uses an LLM to decide what to inspect, which tools to call, and when to stop.

This agentic loop is why Claude Code feels way more capable than the same model in a chat window.

A coding agent in its simplest form has only a few core jobs:

- Gathering facts about the workspace to help the model start doing things for you (file tree, git repo state, etc.)
- Executing tools (structured actions such as reading files or running commands on your machine, much like the remote procedure calls of the past)
- Controlling context size: clipping long outputs, avoiding redundant tool calls, and compacting context in every possible way, because LLMs have limits
- Providing memory to persist conversation state if you restart the agent the next day

Some agents do more, like delegating certain tasks to bounded sub-agents, orchestrating them and doing things in parallel, but we keep it simple. One loop, four tools, no dependencies.

The loop

The loop itself is almost trivial:

1. User types a message
2. Agent sends system prompt + conversation history to an LLM
3. The LLM responds with text and/or requests some tool calls; we stop the loop if it's a final answer
4. If not, the agent runs the tools, feeding the results back to the LLM on the next iteration
5. Go to step 3, forcing the response if it takes too long to iterate

This is the entire "agent" part. Everything else is plumbing and parsing to make various components work together (but isn't that the essence of modern programming?)

LLM

Models are getting better and better. Some people can afford running them locally, some can afford running them in the cloud, others can afford having a job that pays for Claude API keys. To make swapping an LLM easier we define an interface for all of them, and it's a rather simple one:

    type LLM interface {
        Chat(ctx context.Context, req ChatRequest) (*ChatResponse, error)
    }

    type ChatRequest struct {
        Messages []Message
        Tools    []Tool
    }

    type ChatResponse struct {
        Content   string
        ToolCalls []ToolCall
        Usage     Usage
    }

ChatRequest carries the conversation history and the schema of the available tools. ChatResponse carries text content and/or structured tool calls (if the model wants to do something). Usage tracks token consumption so we can print stats after each turn and decide whether it's worth it. The agent wouldn't know if it's talking to a silly 7B model or DeepSeek in the cloud.

Both Ollama and OpenAI-compatible APIs support "native tool calling" via API. You send a tools array describing the available functions as a JSON Schema, and the model responds with structured tool_calls. The conversation may look roughly like this:

    > system: "You are a coding companion..."
    > user: "review my code"
    assistant: {tool_calls: [{function: {name: "list_files", arguments: "{}"}}]}
    > tool: "[F] main.go\n[D] pkg/" (tool_call_id: "call_1")
    assistant: {tool_calls: [{function: {name: "read_file", arguments: "{\"path\":\"main.go\"}"}}]}
    > tool: "[main.go: lines 1-1000 of 1000]\n 1: package main\n..." (tool_call_id: "call_2")
    assistant: "Why did you put all 1000 lines in one file? Where are all the tests?"

Each provider needs its own HTTP client because the wire formats differ slightly:

- Ollama (/api/chat) returns arguments as a JSON object already, so you can keep it as a json.RawMessage. Token usage comes with every response as prompt_eval_count and eval_count.
- OpenAI/DeepSeek (/v1/chat/completions) expects content to be serialized as JSON null (not omitted) for tool-only messages; tool results without a call ID get rejected. Token usage is provided too, but in a different format: a standard usage object.

These are boring details...
