Build real agentic apps using CUGA: two dozen working examples on a lightweight harness
Enterprise Article. Published June 23, 2026
Anupama Murthi (anupamamurthi), ibm-research
Hamid Adebayo (harmedox), ibm-research
Sami Marreed (samimarreed), ibm-research
Praveen Venkateswaran (praveenv), ibm-research
Asaf Adi (AsafAdi), ibm-research
Ayhan Sebin (ayhansebin), ibm-research
TL;DR: Building an agent is mostly plumbing: tools, state, guardrails, scaling from one agent to many. CUGA (pip install cuga), short for Configurable Generalist Agent, is the Agent Harness for the Enterprise from IBM. It handles that plumbing, so you write just a tool list and a prompt. We built two dozen single-file apps to prove it. Read one end to end here, then see how the same agent runs sovereign and governed in production without a rewrite.
Most agentic apps start with a week of plumbing before the agent does anything useful. You pick a framework, wire up a model client, write tool adapters, build some way to stream state to a UI, and somewhere in there you also decide what the agent is actually for. The interesting part arrives last.
CUGA inverts that. It's the open-source agent harness from IBM that handles the planning, the execution loop, the tool calls, and the state plumbing for you. What's left is the part that's actually yours: which tools the agent can reach, and what you tell it to do. To show what that feels like in practice, we built cuga-apps: two dozen small, working apps, each a single FastAPI file wrapping one CugaAgent, from a movie recommender to an IBM Cloud architecture advisor. They exist to be read and copied. You can click through the live gallery.
This article walks through one of them, names what the harness takes off your plate, and shows where the same code goes when you need it governed for production. No new framework to learn first. If you've written a FastAPI route, you can read every line.
Why a harness, not a framework
The fair question to ask of anything in this space is what it saves you from writing. CUGA's answer: the orchestration around a model that you'd otherwise rebuild every time.
It plans before it acts, then executes with a mix of tool calls and generated code (CodeAct). On a long task that runs twenty steps, the thing that breaks most agents is losing track of intermediate results and re-deriving them (often wrongly) on the next turn; CUGA holds that state and runs a reflection step that can catch a bad call and re-plan instead of barreling ahead. That machinery, rather than anything you tune by hand, is why it has topped agent benchmarks like AppWorld and WebArena.
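To make the plan, execute, reflect loop concrete, here is a minimal sketch in plain Python. It is not CUGA's implementation: `Step`, `Run`, `run_plan`, and the `reflect` and `replan` callbacks are hypothetical names invented for illustration. It only shows the general idea of keeping intermediate results in a variable store and re-planning when a reflection check rejects a result.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Step:
    tool: str              # name of the tool to call
    args: dict[str, Any]   # literal values, or "$name" references into the variable store
    save_as: str           # variable name under which the result is kept


@dataclass
class Run:
    variables: dict[str, Any] = field(default_factory=dict)  # intermediate results
    log: list[str] = field(default_factory=list)             # what happened, in order


def resolve(args: dict[str, Any], variables: dict[str, Any]) -> dict[str, Any]:
    """Replace "$name" arguments with stored values instead of re-deriving them."""
    return {
        key: variables[value[1:]] if isinstance(value, str) and value.startswith("$") else value
        for key, value in args.items()
    }


def run_plan(
    plan: list[Step],
    tools: dict[str, Callable[..., Any]],
    reflect: Callable[[Step, Any], bool],
    replan: Callable[[Step, dict[str, Any]], list[Step]],
    max_replans: int = 2,
) -> Run:
    run = Run()
    queue = list(plan)
    replans = 0
    while queue:
        step = queue.pop(0)
        result = tools[step.tool](**resolve(step.args, run.variables))
        if reflect(step, result):
            # Accepted: store the result so later steps can reference it.
            run.variables[step.save_as] = result
            run.log.append(f"ok:{step.save_as}")
        elif replans < max_replans:
            # Rejected: swap in replacement steps rather than pushing on.
            replans += 1
            run.log.append(f"replan:{step.save_as}")
            queue = replan(step, run.variables) + queue
        else:
            raise RuntimeError(f"step {step.save_as!r} failed after {max_replans} replans")
    return run


def demo() -> Run:
    # Hypothetical tools: a price lookup that returns None for unknown items, and a multiply.
    tools = {
        "lookup_price": lambda item: {"apple": 3}.get(item),
        "multiply": lambda a, b: a * b,
    }
    plan = [
        Step("lookup_price", {"item": "aple"}, "price"),    # deliberate typo: a bad call
        Step("multiply", {"a": "$price", "b": 4}, "total"),
    ]
    reflect = lambda step, result: result is not None
    replan = lambda step, variables: [Step("lookup_price", {"item": "apple"}, step.save_as)]
    return run_plan(plan, tools, reflect, replan)
```

Calling `demo()` rejects the misspelled lookup, re-plans once, and then computes the total from the stored price rather than looking it up again. In a real harness the `reflect` and `replan` roles would be played by the model rather than by fixed lambdas.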
You also set the cost/latency tradeoff from config rather than code: Fast, Balanced, and Accurate reasoning modes, with code execution in whatever sandbox you trust (local, Docker/Podman, or E2B cloud). Same agent definition, different dial. That dial matters more than it sounds. Most harnesses assume a frontier model sits underneath and lean on it to recover when a plan goes sideways; CUGA does that work itself. The planning, the reflection step, and the variable tracking that keeps a long run on course are the harness carrying load the model would otherwise have to carry, which is what lets a smaller open-weight model hold up where it normally wouldn't. It's why the hosted apps run on gpt-oss-120b rather than a frontier API. Running the biggest model you can call is the usual bet; CUGA's is that a smaller open one is enough.
None of the individual pieces is unique to CUGA. What's different is that they come pre-assembled, so you configure them instead of wiring them together. The API you touch is small — build a CugaAgent with a tool list and a prompt, then await agent.invoke(...). Everything below that line is the harness.
Concretely, that's interchangeable tools (OpenAPI, MCP, and LangChain functions all bind the same way), long-horizon planning with variable management and self-correction (the machinery behind the #1 spot on AppWorld from 07/25 to 02/26 and on WebArena from 02/25 to 09/25), declarative guardrails, multi-agent delegation over A2A, Docling-powered RAG, and one-env-var provider switching (pip install cuga, then OpenAI, watsonx, Ollama, and more), each something you'd otherwise build yourself. The first word of the name does the work: Configurable. The hard parts are handled, so your job is just the task.
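As an illustration of what one-env-var provider switching can look like, here is a small sketch in plain Python. Everything in it is an assumption made for the example: the variable name `AGENT_LLM_PROVIDER`, the `ProviderConfig` type, the `select_provider` function, and the default of `openai` are hypothetical and are not CUGA's actual configuration keys.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ProviderConfig:
    name: str
    needs_api_key: bool  # a local runtime such as Ollama typically needs no key


# Hypothetical registry; the provider names come from the article, the fields do not.
PROVIDERS = {
    "openai": ProviderConfig("openai", needs_api_key=True),
    "watsonx": ProviderConfig("watsonx", needs_api_key=True),
    "ollama": ProviderConfig("ollama", needs_api_key=False),
}


def select_provider(
    env: Optional[Mapping[str, str]] = None,
    default: str = "openai",  # assumed default, chosen only for this sketch
) -> ProviderConfig:
    """Pick a provider from a single environment variable (hypothetical name)."""
    env = os.environ if env is None else env
    key = env.get("AGENT_LLM_PROVIDER", default).strip().lower()
    try:
        return PROVIDERS[key]
    except KeyError:
        known = ", ".join(sorted(PROVIDERS))
        raise ValueError(f"unknown provider {key!r}; expected one of: {known}") from None
```

The point of the pattern is that the agent definition never names a provider. Application code calls something like `select_provider()` once at startup, and switching from a hosted API to a local model is a change to the environment, not to the code.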
One app, start to finish
Here's the IBM Cloud advisor — an agent that recommends real IBM Cloud services for an architecture. The whole thing fits in one file: a main.py with the agent factory, the tools, and the prompt, plus a small UI.
The whole agent is this:
def make_agent():
    from cuga import CugaAgent
    from _llm import create_llm
    return...