What We Learned from One Year of Building Production Agents | Strands Agents SDK
We turned 1 year old!
Strands Agents launched over a year ago thanks to an internal effort by AWS engineers building a network troubleshooting agent. They didn’t use a heavy-duty framework or write enormous amounts of workflow boilerplate. They essentially wired up a system prompt, a Claude 3 model, and tools, and the resulting agent resolved 80% of network root causes. Today that setup is branded as an “agent harness”.
Our philosophy of keeping the architecture minimal has helped teams across AWS and beyond ship production agents that handle customer traffic at scale. Our engineers learned a lot of lessons after open sourcing this framework and hitting 25 million downloads. Here are the key ones:
Workflow boilerplate in agents can easily become outdated
When Strands launched in May 2025, the most advanced models had a 200k-token context window. The agents we saw developers build were usually chatbots connected to a knowledge base or workflows that classified data in batches. Agent frameworks provided tons of scaffolding to optimize these tasks. But what if a model got better? You potentially wound up with a lot of technical debt and had to refactor the agent.
For example, take this Sonnet 3.7 agent that reviews CloudWatch alarms across AWS accounts. We saw a lot of customers build something like this with another agent framework:
from agentframework import Agent, tool, Graph, GraphNode, GraphEdge, SlidingWindowConversationManager

@tool
def list_cloudwatch_alarms(state_filter: str = "ALARM") -> dict:
    """List CloudWatch alarms filtered by state."""
    ...

@tool
def get_alarm_logs(alarm_name: str) -> dict:
    """Get CloudWatch Logs related to an alarm from the past 7 days."""
    ...

@tool
def format_alarm_report(raw_data: str) -> dict:
    """Format alarm data into a human-readable report."""
    ...

# One agent per step, each with a single tool and its own conversation window.
fetch_agent = Agent(
    model_id="us.anthropic.claude-3-7-sonnet-20250219-v1:0",
    system_prompt="List all alarms currently in ALARM state.",
    tools=[list_cloudwatch_alarms],
    conversation_manager=SlidingWindowConversationManager(window_size=10, per_turn=True),
)

logs_agent = Agent(
    model_id="us.anthropic.claude-3-7-sonnet-20250219-v1:0",
    system_prompt="For each alarm provided, get its logs from the past 7 days to determine what's been happening.",
    tools=[get_alarm_logs],
    conversation_manager=SlidingWindowConversationManager(window_size=10, per_turn=True),
)

formatter_agent = Agent(
    model_id="us.anthropic.claude-3-7-sonnet-20250219-v1:0",
    system_prompt="Format the alarm data and logs into a clean report grouped by severity. Include relevant log patterns for each alarm.",
    tools=[format_alarm_report],
    conversation_manager=SlidingWindowConversationManager(window_size=20, per_turn=True),
)

# Chain the three agents: fetch -> logs -> format.
graph = Graph(
    nodes={
        "fetch": GraphNode(agent=fetch_agent),
        "logs": GraphNode(agent=logs_agent),
        "format": GraphNode(agent=formatter_agent),
    },
    edges=[
        GraphEdge(source="fetch", target="logs"),
        GraphEdge(source="logs", target="format"),
    ],
)

result = graph.invoke({"input": "What alarms are firing right now?"})
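To make the shape of that graph concrete, here is a minimal sketch in plain Python of what running a linear three-node chain amounts to. Everything in it is an assumption for illustration: `run_linear_graph` and the three node functions are hypothetical stand-ins, not the API of any real framework, and the nodes append strings where real agents would call a model and tools.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-ins for the three agents; each only transforms a string.
def fetch(text: str) -> str:
    return text + " -> alarms listed"

def logs(text: str) -> str:
    return text + " -> logs fetched"

def format_report(text: str) -> str:
    return text + " -> report formatted"


def run_linear_graph(
    nodes: Dict[str, Callable[[str], str]],
    edges: List[Tuple[str, str]],
    user_input: str,
) -> str:
    """Run a graph in which every node has at most one successor."""
    # Assumes a linear chain: with several edges from one source,
    # dict() would silently keep only the last one.
    successor = dict(edges)
    targets = {target for _, target in edges}
    # The entry node is the one that no edge points to.
    entry_nodes = [name for name in nodes if name not in targets]
    if len(entry_nodes) != 1:
        raise ValueError("expected exactly one entry node")

    current = entry_nodes[0]
    payload = user_input
    visited = set()
    while current is not None:
        if current in visited:
            raise ValueError("cycle detected")
        visited.add(current)
        # Each node receives the previous node's output.
        payload = nodes[current](payload)
        current = successor.get(current)
    return payload


nodes = {"fetch": fetch, "logs": logs, "format": format_report}
edges = [("fetch", "logs"), ("logs", "format")]
result = run_linear_graph(nodes, edges, "What alarms are firing right now?")
print(result)
```

For a chain like this, the graph reduces to passing one step's output into the next, which is worth keeping in mind when weighing how much structure a simple use case needs.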
A lot of developers thought best practices meant using a graph workflow that scaffolded each step into a separate agent. For simpler use cases like this, that seemed over-engineered to us. This type of scaffolding...