How AI Agents Work: An Architectural Deep Dive


How AI Agents Actually Work: An Architectural Deep Dive | DeepResearch Ninja

An analysis of the patterns, infrastructure, and trade-offs behind the systems that have redefined what large language models can do

Tags: Research, Technology, AI Agents, LLM, ReAct, Tool Use, Multi-Agent Systems, Observability, Software Engineering, Claude Code

Executive Summary

The term “AI agent” has become one of the most overloaded in modern tech, but at its core it refers to a simple pattern: a large language model (LLM) connected to external tools and operating in a loop where it reasons about what to do, calls a tool, observes the result, and repeats until the task is complete. This pattern, known as ReAct after the 2022 paper “Synergizing Reasoning and Acting in Language Models,” has become the foundation of every production AI agent today.

What makes agents work well is not the model itself but the surrounding infrastructure: how context windows are managed across thousands of tool calls, how tools are designed for non-deterministic consumers, and how safety boundaries are enforced. A widely circulated claim has become the defining statistic in this space: Claude Code’s leaked source code revealed that only about 1.6% of its codebase constitutes AI decision logic, with the remaining 98.4% being operational infrastructure [3]. This figure is disputed: critics argue that it misinterprets how the Liu et al. paper categorizes different kinds of code, and that the distinction between “AI logic” and “infrastructure” is itself an interpretive choice rather than a fact about the code.
Regardless of the exact percentage, the underlying intuition holds: production agent systems are dominated by operational engineering.

The architecture has evolved through several identifiable layers:

- The ReAct loop (Thought → Action → Observation) interleaves reasoning traces with external actions so the model can induce, track, and update plans while interacting with real data sources.
- Tool use connects the model to APIs, files, databases, and other systems. The key insight is that tools must be designed specifically for agents, i.e., non-deterministic consumers, not just wrapped as API endpoints.
- Memory comes in two forms: short-term (in-context learning bounded by the context window) and long-term (external vector stores via Retrieval-Augmented Generation).
- Planning and composition patterns (orchestrator-workers, evaluator-optimizer, parallelization) allow agents to handle complex multi-step tasks.
- Multi-agent systems delegate subtasks to specialized workers, trading exponential token costs for dramatic gains in capability on open-ended problems.
- Observability (distributed tracing via OpenTelemetry GenAI semantic conventions, infinite loop detection, cost attribution, and session replay) has emerged as a critical operational layer. Without it, debugging non-deterministic agent behavior is nearly impossible.

The most important finding from this research is that agent architecture has converged around a small set of well-understood patterns. The competition between framework vendors (LangChain, CrewAI, OpenAI’s SDKs, Anthropic’s Agent SDK) is largely about ergonomics.
Real engineering effort goes into context management, tool design, and reliability, areas where the best practitioners have accumulated significant domain knowledge.

A second important finding is that the gap between agent benchmarks and real-world performance is much wider than commonly assumed: 95% of enterprise AI pilots deliver zero measurable ROI [25], and roughly half of SWE-bench-passing PRs would not be merged by real maintainers [17]. The field’s primary bottleneck is now evaluation methodology, not model capability [21].

A third finding: the “agent winter” critique has empirical backing. Enterprise adoption has been slower and more cautious than early hype suggested, with Gartner predicting 40% of agentic AI projects will be scrapped by 2027, citing “rising costs, unclear business value, and integration complexity,” and PwC identifying integration complexity (67%), lack of monitoring (58%), and unclear escalation paths (52%) as the top causes of pilot failure.

1. Definitions: What Is an “Agent” and How Does It Differ from Other AI Systems?

The word “agent” has a long history in computer science. The classic definition from Russell and Norvig’s Artificial Intelligence: A Modern Approach describes an agent as anything that perceives its environment through sensors and acts upon that environment through actuators. This is a broad definition; a thermostat is technically an agent.

In the modern AI literature, the term has narrowed. Anthropic defines agents as “systems where LLMs dynamically direct their own processes and tool usage,” distinguishing them from workflows :...
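
As a rough illustration of the reason, act, observe loop described in the Executive Summary, the sketch below runs a Thought → Action → Observation cycle in plain Python. It is a minimal sketch under stated assumptions, not how any particular framework implements it: `scripted_model`, `TOOLS`, and `run_agent` are hypothetical names introduced here, the "model" is a hard-coded stub so the example runs offline, and the repeated-action check is only a crude stand-in for the kind of loop detection mentioned under observability.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tool registry: tool name -> function taking one string argument.
TOOLS: Dict[str, Callable[[str], str]] = {
    "add": lambda arg: str(sum(int(x) for x in arg.split(","))),
}

PREFIX = "Observation: "


def scripted_model(history: List[str]) -> Tuple[str, str, str]:
    """Stand-in for an LLM call; returns (thought, action, argument).

    A real agent would send `history` to a model. This stub asks for one
    tool call, then finishes once an observation is in the history.
    """
    observations = [h for h in history if h.startswith(PREFIX)]
    if observations:
        return ("I have the result.", "finish", observations[-1][len(PREFIX):])
    return ("I need to add the numbers.", "add", "2,3")


def run_agent(task: str, model=scripted_model, max_steps: int = 5) -> str:
    history = [f"Task: {task}"]
    seen_actions = set()
    for _ in range(max_steps):
        # Reason: the model decides what to do next from the history so far.
        thought, action, arg = model(history)
        history.append(f"Thought: {thought}")
        if action == "finish":
            return arg
        # Crude loop guard (an assumption of this sketch): stop if the exact
        # same action and argument are requested twice.
        if (action, arg) in seen_actions:
            raise RuntimeError(f"repeated action {action}({arg})")
        seen_actions.add((action, arg))
        # Act, then observe: run the tool and feed the result back in.
        tool = TOOLS.get(action)
        observation = tool(arg) if tool else f"unknown tool: {action}"
        history.append(f"Action: {action}({arg})")
        history.append(f"{PREFIX}{observation}")
    raise RuntimeError("step budget exhausted")


print(run_agent("What is 2 + 3?"))
```

The step budget and the repeated-action check are the simplest possible safety boundaries. A production system would likely need something more forgiving, since an agent can legitimately repeat a call (for example, polling a job status).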
