69 Ways to F*** Up Your Deploy | Sensemaking by Shortridge
69 Ways to F*** Up Your Deploy
Co-authored by Kelly Shortridge and Ryan Petrich
We hear about all the ways to make your deploys so glorious that your pipelines poop rainbows and services saunter off into the sunset together. But what we don’t see as much is folklore of how to make your deploys suffer.1
Where are the nightmarish tales of our brave little deploy quivering in their worn, unpatched boots – trembling in a realm gory and grim where pipelines rumble towards the thorny, howling woods of production? Such tales are swept aside so we can pretend the world is nice (it is not).
To address this poignant market painpoint, this post is a cursed compendium of 69 ways to fuck up your deploy. Some are ways to fuck up your deploy now and some are ways to fuck it up for Future You. Some of the fuckups may already have happened and are waiting to pounce on you, the unsuspecting prey. Some of them desecrate your performance. Some of them leak data. Some of them thunderstrike and flabbergast, shattering our mental models.
All of them make for a very bad time.
We’ve structured this post into 10 themes of fuckups plus the singularly horrible fuckup of manual deploys. For your convenience, these themes are linked in the Table of Turmoil below so you can browse between soul-butchering meetings or existential crises. We are not liable for any excess anxiety provoked by reading these dastardly deeds… but we like to think this post will help many mortals avoid pain and pandemonium in the future.
The Table of Turmoil:
Identity Crisis
Loggers and Monitaurs
Playing with Deployment Mismatches
Configuration Tarnation
Statefulness is Hard
Net-not-working
Rolls and Reboots
Disorganized Organization
Business Illogic
The Audacity of Spacetime
Manual Deploys
Identity Crisis
Permissions are perhaps the final boss of Deployment Dark Souls; they are fiddly, easily forgotten, and never forgiven by the universe.
1. Allow all access.
“Allow all access” is simple and makes deployment easy. You’ll never get a permission failure! It makes for infinite possibilities! Even Sonic would wonder at our speed!
And indeed, dear reader, what wonder allow * inspires… like a wonder for what services the app actually talks to and what we might need to monitor; a wonder for what data the app actually reads and modifies; a wonder for how many other services could go down if the app misbehaved; and a wonder for exactly how many other teams we might inconvenience during an incident.
Whether for quality’s sake or security’s, we should not trade simplicity today for torment tomorrow.
2. Keys in plaintext.
Key management systems (KMS) are complex and can be ornery. Instead of taming these complex beasts – requiring persistence and perhaps the assistance of Athena herself to ride the beast onward to glory – it can be tempting to store keys in plaintext where they are easily understandable by engineers and operators.
If anything goes wrong, they can simply examine the text with their eyeballs. Unfortunately, attackers also have eyeballs and will be grateful that you have saved them a lot of steps in pwning your prod. And if engineers write the keys down somewhere for manual use in an “emergency” or after they’ve left the company… thoughts and prayers.
3. Keys aren’t in plaintext, they’re just accessible to everyone through the management system.
You’ve already realized storing keys in plaintext is unwise (F.U. #2) and upgraded to a key management system to coordinate the dance of the keys. Now you can rotate keys with ease and have knowledge of when they were used! Alas, no one set up any roles or permissions and so every engineer and operator has access to all of the keys.
At least you now have logs of who accessed which keys so you can see who possibly leaked or misused a key when it happens, right? But how useful are those logs when they are simply a list of employees that are trusted to make deploys or respond to incidents?
4. No authorization on production builds.
The logical conclusion of fully automated deployments is being able to push to production via SCM operations (aka “GitOps”). Someone pushes a branch, automation decides it was a release, and now you have a “fun” incident response surprise party to resolve the accidental deploy.
One option is to enforce sufficient restrictions on who can push to which branches and in what circumstances. Or, you can go on a yearlong silent meditation retreat to cultivate the inner peace necessary to be comfortable with surprise deployments.
The common “mitigation” “plan” is to only hire devs who have a full understanding of how git works, train them properly2 on your precise GitOops workflow, and trust that they’ll never make a mistake… but we all know that’s just your lizard brain’s...