AI agent harnesses like OpenClaw are changing LLMs, inference, and CPUs

How AI agent harnesses like OpenClaw are changing LLMs, inference, and CPUs

Jump to main content

REG AD

AI + ML

Agent harnesses, like OpenClaw, are changing how we build and run AI models

Ride your bots further by putting them in a harness

Tobias Mann

Tobias Mann

Systems Editor

Published sun 17 May 2026 // 16:30 UTC

After nearly four years and hundreds of billions burned building smarter and more capable models, folks understandably would like to see them do something more than run a chatbot. In this respect, OpenClaw served like blood in the water, demonstrating that, in spite of its seemingly endless supply of security flaws, LLMs really can be used to automate complex tasks. Since then, you've probably noticed the term "harness" coming up more frequently to describe agentic AI frameworks, and for good reason. You don't need a harness to interact with a chatbot – local tools like Ollama send API calls directly to the LLMs – but to do today's advanced work, they are essential.

REG AD

On their face, AI harnesses are just a bit of code that wraps around an LLM's API endpoint, orchestrates tool calls, and manages context. OpenClaw, Claude Code, Codex, and Pi Coding Agent are all examples of code-focused harnesses you may already be familiar with.

REG AD

As simple as all this sounds, harnesses are changing the way we think about everything from training new models to how we build and run them at scale. LLM inference on its own is pretty dumb – not the models so much as the way we interact with them. The OpenAI-compatible API calls that have become the de facto standard are transactional. With most early chatbots, you made a request and the API would supply a response. A harness, by comparison, orchestrates those API calls, breaking down one request into multiple. If you were to ask a code agent to build an app that parses logs, the harness might make one request to plan things out, another to review the log directory, a third to generate and execute that code in an interpreter, and a fourth to debug and fix any errors. This multi-step loop would continue until the work is done or the harness cuts it short to ask for user input. At least for coding, these harnesses are getting good enough to be useful. In fact, a harness may have a bigger impact on whether the code assistant will be successful than the model itself. Even Qwen3.6-27B, a small-to-medium-sized LLM, proved to be a surprisingly effective alternative to larger paid models when paired with harnesses like Anthropic’s Claude Code or Cline. And yes, if you didn’t know, Claude Code works with any model you like. In fact, the realization that small models with well-designed harnesses can now automate complex tasks has contributed to a shortage of Mac Minis, as AI enthusiasts race to self-host OpenClaw and LLMs on them. Changing the way we build models Training dominated the first two years of the AI boom. OpenAI, Google, Microsoft and others raced to build smarter models using as much data as they could harvest.

REG AD

But by the end of 2024, the payoff of building ever larger models started to taper off, as the extra parameters only engendered small gains in intelligence. DeepSeek R1 brought “reasoning” models and test-time scaling to the mainstream. To be clear, these models don’t actually reason, but instead trade time and tokens for higher quality answers and a lower propensity to make stuff up (aka "hallucinate," although we at El Reg try to avoid anthropomorphizing AI). It wasn’t the first. OpenAI’s o1 beat them to it, but R1 was the first widely adopted open weights model that used reinforcement learning (RL) to teach the model new skills, like chain-of-thought reasoning. Over the past year, agentic code assistants have steadily gained traction. Consequently, people are increasingly using RL to teach models to use the tools and resources that agent harnesses expose to them. If you look at many of the recent model releases on Hugging Face, you’ll notice a strong emphasis on agentic tool calling and long-context reasoning. If you want a model to work effectively with an agent harness, it needs to execute tool calls reliably. And since those tool calls can return large quantities of information, you also need the model not to lose track of that information. While these qualities make for better agentic models, they also require a very different set of hardware. CPUs take center stage Compute to run these agent harnesses is in high demand. After living in the shadow of high-end GPUs and AI accelerators for the past few years, CPUs are back in the limelight.

REG AD

Intel Xeon processors are selling faster than Intel can make them. Meta is buying up every chip it can get from Arm and Nvidia, and renting boatloads of Amazon’s Graviton CPUs while it awaits delivery. This is happening because agent harnesses don’t run on GPUs. Even with enough CPU cores to execute these tasks at scale, the number of requests is also reshaping the...

AI agent harnesses like OpenClaw are changing LLMs, inference, and CPUs

Related Articles

Elevated error rates on requests to multiple models

Donald Trump and sons to be 'forever' exempt from tax audits

PopuLoRA: Co-Evolving LLM Populations for Reasoning Self- Play

Old Reddit Is Down

The ultimate female fantasy – A feminist critique of Beauty and the Beast