I Taught an AI to Be Our On-Call Engineer


Pipedrive R&D Blog

Stories from Pipedrive’s product, engineering, design, marketing, localization, and many other teams

Scooby: How I Taught an AI to Be Our On-Call Engineer

Aleksandr Smirnov

15 min read· Just now


There’s a specific kind of frustration that every on-call engineer knows. It’s 11 pm, an alert fires, and you open five dashboards in a panic. You’re pasting log queries from memory, squinting at a Grafana panel that may or may not be the right one, or trying to find the relevant playbook somewhere in Confluence. You know the answer is somewhere in there. You just have to go find it across a dozen different tools, while also trying to figure out if this is even a real issue or just noise.

That was the itch I wanted to scratch.

A weekend in Claude Code

A few months ago, a colleague, Stephane Moser, basically encouraged me to try a CLI tool built for exactly this problem: an AI-powered root cause analysis tool called HolmesGPT. Point it at an alert, let the AI do the dashboard-hopping, and get an explanation back.

I really wanted to love it. But the developer experience just wasn’t there: manual token creation, YAML config changes, environment-specific tweaks, etc. And after going through all the trouble of getting it ready to run, its results were not that impressive. It had no idea how our infrastructure was set up. No domain knowledge, no institutional context.

Technically, the tool did what it promised. Practically, it lacked the context that would make its answers useful to us. And if a tool was going to understand our specific architecture, we were going to have to build it.

I was spending most of my day inside Claude Code anyway. One weekend, I opened it and asked: “Can we turn this into a Claude Code skill?” The result was a plugin I called Scooby (Warner Bros., please don’t sue me).

My first move was connecting our Internal MCP server to it, which gave access to Grafana, Loki, Tempo, company metadata, deployment history, Slack, Confluence, and more. Luckily, all our services send logs to Loki, including Kubernetes events, so Scooby can map the entire lifecycle of a pod just from logs without ever needing kubectl. That’s what made it zero friction: no tokens, no YAML, just the tools already there.

I tested it on boring questions like “Why is this pod crashlooping in the test region?” and it got there. The basics worked. But it was a smart stranger: capable in the abstract, completely ignorant of Pipedrive. I had a tool that could technically do the job. I needed one I could trust in production.

Teaching, not writing playbooks

Every company has its own dialect, its own tribal knowledge. Ours has grown over years of infrastructure decisions, naming conventions, routing layers, and logging standards that made sense when someone made them, and are now just how things work.

A generic AI investigator doesn’t know any of that. It doesn’t know that different environments have different Loki data sources. It doesn’t know our service naming patterns or which metadata APIs to call for company-level context. It doesn’t know that what shows up in one dashboard is actually fed by a completely different system than it appears.

Scooby could query things, but access to tools isn’t the same as knowing how to use them. It still had no instincts about where to look, in what order, or what the results actually meant in our context. It was the same gap I’d hit with the off-the-shelf tool, just one layer deeper: the friction was gone, but the ignorance wasn’t.

The obvious move is to start writing rules: if alert type X, query datasource Y. But that path leads to an ever-growing playbook that covers every incident you’ve already seen and is silent on anything new. You still need to tell the system where your data sources live, which labels matter, and how to route to the right domain. But there’s a difference between telling it where the data lives and telling it what every answer should be.

I didn’t want to write playbooks. I wanted to teach.

Our alert channels already contained months of history: alerts, discussion threads, and engineer explanations of what went wrong and how it was fixed. That became the training material.

The methodology I landed on works like this. Take a solved alert, one where we already know the root cause. Feed Scooby just the alert description, nothing else. Watch what it does. When it fails, don’t edit the skill file yourself. Instead, give Scooby the actual resolution. Then ask Claude a specific question: “Looking at what you did, how would you rewrite your own instructions so you’d solve this properly next time?”

After enough rounds of this, I stopped feeding alerts one by one. I pointed Scooby at the channel and told it to find those resolved threads itself. Same...
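
To make the teaching loop described above concrete, here is a minimal Python sketch of its shape. Everything in it is a hypothetical illustration rather than Scooby's actual implementation: `SolvedAlert`, `teach`, and the three callables are invented names, and the `fake_*` functions are stand-ins for what would really be model calls, included only so the sketch runs.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class SolvedAlert:
    description: str  # the only thing the investigator gets to see
    resolution: str   # known root cause, withheld until after the attempt


@dataclass
class TeachingResult:
    instructions: str  # the instructions after all rounds
    revisions: int     # how many times they were rewritten


def teach(
    alerts: Iterable[SolvedAlert],
    instructions: str,
    investigate: Callable[[str, str], str],
    is_correct: Callable[[str, str], bool],
    rewrite: Callable[[str, str, str, str], str],
) -> TeachingResult:
    """Replay solved alerts; on each miss, let the model revise its own instructions."""
    revisions = 0
    for alert in alerts:
        # Step 1: feed only the alert description and see what comes back.
        diagnosis = investigate(instructions, alert.description)
        if is_correct(diagnosis, alert.resolution):
            continue
        # Step 2: on failure, reveal the real resolution and ask for a rewrite.
        # The human does not edit the instructions by hand.
        instructions = rewrite(
            instructions, alert.description, diagnosis, alert.resolution
        )
        revisions += 1
    return TeachingResult(instructions, revisions)


# --- Stand-ins below are assumptions so the sketch is runnable. ---

def fake_investigate(instructions: str, description: str) -> str:
    # Placeholder for running the investigator on the alert text alone.
    return "bad deploy" if "deployment history" in instructions else "unknown"


def fake_rewrite(instructions: str, description: str,
                 diagnosis: str, resolution: str) -> str:
    # Placeholder for asking the model how it would rewrite its instructions.
    return instructions + "\nCheck deployment history early in the investigation."


alerts = [
    SolvedAlert("5xx spike on service A", "bad deploy"),
    SolvedAlert("latency spike on service B", "bad deploy"),
]
result = teach(alerts, "Investigate the alert.", fake_investigate,
               lambda diagnosis, resolution: diagnosis == resolution,
               fake_rewrite)
print(result.revisions)
```

The point of the structure is that the revision step takes the failed attempt and the real resolution as input and produces new instructions, rather than a new rule keyed to one alert type. In this toy run the lesson learned from the first alert carries over to the second.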
