DCaulfield
Save Your Team 100s of Hours
How is time wasted in your team?
Teams require constant maintenance, updates and improvements.<br>Otherwise, they become stagnant and inefficient.<br>To become hyper-efficient, each team must analyse how they work.<br>One key metric is to understand how much time your team wastes on repetitive tasks.<br>While software developers are excellent at automating away their own problems, they often forget to focus on the team's problems.
If you analyse your team's time and value output, you will see many areas to improve.<br>Obvious areas are poorly written bug reports, too many meetings, poor communication with colleagues and so forth.<br>But there is a category of work that is often overlooked - repetitive team tasks.<br>A repetitive team task is any task that the team needs to perform on a regular basis.
Ask your team:
What tasks do we do on a regular basis? e.g. Gather logs from a customer system.
What topics do we constantly explain? e.g. How to install MyComponent on a Linux server.
What issues regularly crop up? e.g. A bug with MyComponent which needs a workaround.
What support do we regularly give? e.g. Recovering MyComponent in the event of a failure.
These examples are straightforward for the team member who knows what to do.<br>However, team members are often out of office or unavailable to help.<br>When this happens, something as simple as resolving a code conflict is difficult for someone who does not have the know-how.<br>This results in wasted cognitive effort as your team attempts to solve a problem that was previously solved.<br>Your team's valuable time for new features is replaced with old, repetitive work.
We need to reduce this waste.
What are Runbooks?
Runbooks document your team's knowledge for the future.<br>A runbook is a step-by-step recipe to solve a problem.<br>With runbooks, your team can rely less on each other's availability.<br>Instead, when a problem arises, they can search through a database of runbooks for a solution.
Runbooks contain clear and concise steps.<br>They can be "How-To" guides, tutorials or any step-by-step instructions.<br>Runbooks should not have walls of text.<br>This is better left to blog posts and articles.<br>Instead, sentences should be brief and bulleted.
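To make the idea of searching a database of runbooks concrete, here is a minimal sketch in Python. It assumes runbooks are stored as simple records; the `Runbook` class, its field names and the `search` function are hypothetical, and the two example titles are taken from the questions listed earlier. A real team would more likely rely on the search built into its wiki or documentation tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Runbook:
    """Hypothetical record for one runbook; the field names are assumptions."""
    title: str
    key_phrases: List[str] = field(default_factory=list)
    steps: List[str] = field(default_factory=list)  # brief, bulleted steps


def search(runbooks: List[Runbook], query: str) -> List[Runbook]:
    """Return runbooks whose title or key phrases contain the query (case-insensitive)."""
    needle = query.lower()
    return [
        rb for rb in runbooks
        if needle in rb.title.lower()
        or any(needle in phrase.lower() for phrase in rb.key_phrases)
    ]


# Example entries; the steps are left empty here on purpose.
RUNBOOKS = [
    Runbook(
        title="How to install MyComponent on a Linux server",
        key_phrases=["install", "linux"],
    ),
    Runbook(
        title="Recovering MyComponent in the event of a failure",
        key_phrases=["recovery", "failure"],
    ),
]

for runbook in search(RUNBOOKS, "install"):
    print(runbook.title)
```

A plain substring match is enough for a small collection; the point is that a teammate can find the answer without waiting for the person who wrote it.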
What are the benefits of runbooks?
Most people don't write.<br>It's not surprising - we're not paid to write.<br>But writing down solutions to problems has a host of benefits.<br>When teams document their solutions, they achieve hyper-efficiency quickly.<br>With a runbook, the writer must dive deep into the problem.<br>They must write in such a way that other people can read it quickly and efficiently.
After 6 months of documenting solutions, my team has written 114 runbooks.<br>The top runbook has been viewed 125 times by our team of only 8 people.<br>This top runbook is an extensive 'How-To' document which explains gotchas and workarounds when installing a dev environment.<br>I estimate that each viewing saves about 30 mins.<br>Therefore, this one runbook has saved the team roughly 60 hours (125 views × 0.5 hours = 62.5 hours).<br>If we account for the other 113 runbooks, our team has easily saved 100s of hours.
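The estimate above is simple arithmetic, sketched here in Python. The view count and the 30 minutes per view are the figures quoted in the post; the function names are hypothetical, and the second calculation assumes (purely for illustration) that every other runbook also saves 30 minutes per view.

```python
def hours_saved(views: int, minutes_per_view: float) -> float:
    """Total time saved, in hours, for a runbook viewed `views` times."""
    return views * minutes_per_view / 60


def views_needed(target_hours: float, already_saved_hours: float,
                 minutes_per_view: float) -> float:
    """Views still required (across all other runbooks) to reach a target total."""
    remaining_minutes = (target_hours - already_saved_hours) * 60
    return remaining_minutes / minutes_per_view


top_runbook_hours = hours_saved(views=125, minutes_per_view=30)
print(f"Top runbook: {top_runbook_hours} hours saved")

# Assumption: the other runbooks save the same 30 minutes per view.
extra_views = views_needed(target_hours=100,
                           already_saved_hours=top_runbook_hours,
                           minutes_per_view=30)
print(f"Views of the other 113 runbooks needed to pass 100 hours: {extra_views}")
```

Under that assumption, the remaining 113 runbooks only need 75 views between them for the team total to pass 100 hours, which is less than one view per runbook.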
What kind of runbooks can I create?
Create runbooks to solve specific problems.<br>Not everything should be put into a runbook.<br>Long topics do not facilitate quick and easy searching.<br>Long-form writing also makes it hard to give the reader information quickly.<br>So, what should go into a runbook?
Bug Runbooks
An in-depth analysis of my team's most difficult bugs has proved useful on multiple occasions.<br>When I create a Bug Runbook, I ask myself: Who will need this in the future?<br>Very often, a new bug is opened against our team and someone will say 'Hey I've seen this one before. Anyone remember how we solved it?'.<br>When this happens, I need quick information about the previous bug.
Without runbooks, I need to go through the team's history of Jira tickets and search for keywords until I find it.<br>Even when I find the old bug, if the assignee closed it hastily, a lot of information will be missing.<br>Root causes are nowhere to be found. I am lucky if the assignee has even linked their code fix.<br>On the other hand, let's say the bug ticket has a link to a runbook with the following information:
Key phrases
RuntimeError in /path/to/logs.
Problem Statement
User's login page failed to load.
Steps to Diagnose
Viewed logs x and y.
Checked the VM's disk usage using df -h.
Discovered the VM directory /var was full due to large logs.
Root Cause
The customer set their log level to MESSAGE for an investigation, then forgot to change it back to INFO.
Steps to Fix
Remove the log directory /path/to/logs to bring the VM back up.
Steps to Prevent
Create an alarm to detect when the VM resources become too low.
Other things to note
This happened to a system under heavy consumer load over the course of 2 weeks.
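As an illustration of the "Steps to Prevent" entry in this example runbook, a disk-usage check could be sketched in Python as follows. This is only a sketch under stated assumptions: the 90% threshold, the function names and the choice to print a warning rather than raise a real alarm are all hypothetical, and a production system would normally use its existing monitoring tool instead.

```python
import os
import shutil
from typing import Optional


def disk_usage_percent(path: str) -> float:
    """Percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    if usage.total == 0:
        return 0.0
    return usage.used / usage.total * 100


def check_disk(path: str, threshold_percent: float = 90.0) -> Optional[str]:
    """Return a warning message if usage is at or above the threshold, else None.

    The 90% default is an assumed value, not one taken from the runbook.
    """
    percent = disk_usage_percent(path)
    if percent >= threshold_percent:
        return (f"WARNING: {path} is {percent:.1f}% full "
                f"(threshold {threshold_percent:.0f}%)")
    return None


if __name__ == "__main__":
    # /var is the directory named in the runbook; fall back to the current
    # directory on systems that do not have it.
    path = "/var" if os.path.isdir("/var") else "."
    print(check_disk(path) or f"{path} is below the threshold")
```

Run on a schedule (for example from cron), a check like this would flag a filling /var well before the VM goes down.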
I now have a detailed root cause for a similar ticket.<br>Instead of starting my investigation with no information, I have a possible root cause to quickly check if it solves...