Safetykit: A small collection of safety demos for human-in-the-loop scripts

Jonathon Belotti [thundergolfer], May 17, 2026, 2 minute read

Tags left in place in a powerplant after it was shut down, decommissioned, and abandoned.

For some reason I’m a particularly cautious software engineer. It has its downsides, but one concrete benefit is that when I slip (everyone does), those slips are less likely to cause an incident. Over time I’ve seen slips contribute to many serious incidents, and have come to properly value the role of straightforward safety mechanisms in tools.

The most obvious and commonly used mechanism is --dry-run, but there are many more safety mechanisms you can introduce into a semi-automated system. I’ve made a small Python gist called safetykit to collect these mechanisms. It is a set of runnable demonstrations that advance a simple idea: production scripts should have seatbelts.

The gist exists to make common safety techniques concrete. Instead of saying “be careful with destructive scripts”, it shows a few ways a script can slow down, explain itself, ask for help, recover from interruption, and leave evidence behind.

dryrun Separates planning from execution by printing what would happen first, then shows the wet run committing the same delete.
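To make the dry-run idea concrete, here is a minimal sketch, not the gist's actual code. The function names `plan_deletes` and `delete_matching` are hypothetical, and I'm assuming the job is "delete files matching a glob pattern". The plan is computed once, and the dry run and the wet run walk the same plan.

```python
from pathlib import Path


def plan_deletes(directory: Path, pattern: str) -> list[Path]:
    """Planning step: decide what would be deleted without touching anything."""
    return sorted(p for p in directory.glob(pattern) if p.is_file())


def delete_matching(directory: Path, pattern: str, dry_run: bool = True) -> list[Path]:
    """Print the plan; only unlink files when dry_run is False (the wet run)."""
    plan = plan_deletes(directory, pattern)
    for path in plan:
        prefix = "[dry-run] would delete" if dry_run else "deleting"
        print(f"{prefix} {path}")
        if not dry_run:
            path.unlink()
    return plan
```

Defaulting `dry_run` to True is a deliberate choice in this sketch: the destructive path has to be asked for explicitly.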

confirm Requires a typed confirmation before deleting a selected file, using a stronger guard than a reflexive y/n prompt. One cautionary reference is Leveson and Turner’s Therac-25 accident investigation, where repeated proceed prompts helped train operators into dangerous reflexes.
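A typed confirmation can be sketched in a few lines. This is my own illustration under assumptions, not the gist's code: the names `confirm_by_typing` and `guarded_delete` are hypothetical, and I'm assuming the operator is asked to type the file's name. The `ask` parameter defaults to the built-in `input` and exists only so the prompt can be replaced in tests.

```python
from pathlib import Path
from typing import Callable


def confirm_by_typing(expected: str, ask: Callable[[str], str] = input) -> bool:
    """Return True only if the operator types `expected` exactly."""
    answer = ask(f"Type '{expected}' to confirm deletion: ")
    return answer.strip() == expected


def guarded_delete(path: Path, ask: Callable[[str], str] = input) -> bool:
    """Delete `path` only after the operator types its file name."""
    if not confirm_by_typing(path.name, ask):
        print("Confirmation did not match; nothing deleted.")
        return False
    path.unlink()
    print(f"Deleted {path}")
    return True
```

Because the expected answer changes with the target, a reflexive "y" never matches, which is the property that makes this stronger than a y/n prompt.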

pause Inserts a deliberate delay before a scary action so the human operator has time to interrupt.
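One way to sketch the pause, assuming Ctrl-C is the interrupt mechanism (the gist may do it differently). The name `countdown` is hypothetical, and the injectable `sleep` parameter is only there so the delay can be skipped in tests.

```python
import time
from typing import Callable


def countdown(seconds: int, action: str,
              sleep: Callable[[float], None] = time.sleep) -> bool:
    """Announce `action`, wait, and return True unless the operator hits Ctrl-C."""
    print(f"About to {action}. Press Ctrl-C to cancel.")
    try:
        for remaining in range(seconds, 0, -1):
            print(f"  proceeding in {remaining}s...")
            sleep(1)
    except KeyboardInterrupt:
        print("Cancelled by operator.")
        return False
    return True
```

A caller would wrap the scary step, for example `if countdown(5, "drop the table"): ...`, so that cancelling is the easy path and proceeding requires doing nothing.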

abort Writes through a temporary file and atomic replace so cancellation does not leave a corrupt partial output.
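The temp-file-plus-replace pattern can be sketched with the standard library alone. The function name `atomic_write` is hypothetical and this is not the gist's code. It relies on `os.replace` being atomic when source and destination are on the same filesystem, which is why the temporary file is created in the target's own directory.

```python
import os
import tempfile
from pathlib import Path
from typing import Iterable


def atomic_write(target: Path, chunks: Iterable[str]) -> None:
    """Write chunks to a temp file beside `target`, then atomically replace it."""
    fd, tmp_name = tempfile.mkstemp(
        dir=target.parent, prefix=target.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w") as tmp:
            for chunk in chunks:
                tmp.write(chunk)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before the swap
        os.replace(tmp_name, target)
    except BaseException:
        # BaseException also covers KeyboardInterrupt: remove the partial
        # temp file and leave the old target exactly as it was.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

If the operator cancels halfway through, readers of `target` see either the complete old contents or the complete new contents, never a mix.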

undo Moves a file into quarantine first, waits briefly for a cancel key, and only commits the delete if the user does not undo it.
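A rough sketch of quarantine-then-commit follows. I'm assuming Ctrl-C as the cancel key, since reading a single keypress is platform-specific; the gist may handle this differently. The names `wait_for_ctrl_c` and `delete_with_undo` and the quarantine directory layout are hypothetical.

```python
import shutil
import time
from pathlib import Path
from typing import Callable


def wait_for_ctrl_c(seconds: float) -> bool:
    """Return True if the operator presses Ctrl-C within `seconds`."""
    try:
        time.sleep(seconds)
    except KeyboardInterrupt:
        return True
    return False


def delete_with_undo(path: Path, quarantine: Path, grace_seconds: float = 5.0,
                     undo_requested: Callable[[float], bool] = wait_for_ctrl_c) -> bool:
    """Move to quarantine, wait, then commit or restore. True means deleted."""
    quarantine.mkdir(parents=True, exist_ok=True)
    held = quarantine / path.name
    shutil.move(str(path), str(held))  # reversible step happens first
    print(f"Moved {path} to quarantine. Ctrl-C within {grace_seconds}s to undo.")
    if undo_requested(grace_seconds):
        shutil.move(str(held), str(path))
        print("Undo: file restored.")
        return False
    held.unlink()  # the irreversible step happens last
    print("Delete committed.")
    return True
```

The ordering is the point: everything before `held.unlink()` can be reversed, so the window for regret stays open until the last line.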

feedback Feedback is central to safety. When humans or machines act in the world, they need feedback to tell whether their actions are safe and whether their internal model of the world corresponds to reality. This demo ramps CPU pressure on one logical core while printing color-coded progress, then stops cleanly when the user notices the overload and interrupts.
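As a rough illustration only, a ramp with color-coded feedback could look like the sketch below. It is my own simplification: it does not pin to a logical core, the thresholds (50% and 80%) are arbitrary assumptions, and the names `render_bar`, `burn`, and `ramp` are hypothetical. Load is approximated with a duty cycle, busy-spinning for a fraction of each period and sleeping for the rest.

```python
import time
from typing import Iterable

GREEN, YELLOW, RED, RESET = "\033[32m", "\033[33m", "\033[31m", "\033[0m"


def severity_color(load: float) -> str:
    """Map a load fraction to an ANSI color (thresholds are arbitrary)."""
    if load < 0.5:
        return GREEN
    if load < 0.8:
        return YELLOW
    return RED


def render_bar(load: float, width: int = 20) -> str:
    """Render a colored progress bar for a load fraction in [0, 1]."""
    load = min(max(load, 0.0), 1.0)
    filled = round(load * width)
    bar = "#" * filled + "-" * (width - filled)
    return f"{severity_color(load)}[{bar}] {load:4.0%}{RESET}"


def burn(load: float, period: float = 0.1) -> None:
    """Busy-spin for `load` of one period, then sleep for the remainder."""
    load = min(max(load, 0.0), 1.0)
    busy_until = time.perf_counter() + load * period
    while time.perf_counter() < busy_until:
        pass
    time.sleep((1.0 - load) * period)


def ramp(levels: Iterable[float], periods_per_level: int = 10,
         period: float = 0.1) -> float:
    """Step through load levels, printing feedback; stop cleanly on Ctrl-C."""
    reached = 0.0
    try:
        for load in levels:
            reached = load
            print(render_bar(load))
            for _ in range(periods_per_level):
                burn(load, period)
    except KeyboardInterrupt:
        print(f"{RESET}Interrupted at {reached:.0%}; stopping cleanly.")
    return reached
```

Calling `ramp([0.2, 0.4, 0.6, 0.8, 1.0])` prints one bar per level, and the operator's cue to interrupt is the bar turning red.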

audit Emits a JSON Lines audit trail with hash chaining so a run has a durable record of what happened.
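Hash chaining is easy to sketch: each record stores the hash of the previous record, so editing or removing an earlier line breaks every link after it. The field names (`ts`, `event`, `details`, `prev`, `hash`), the all-zero genesis value, and the function names are my assumptions, not the gist's format.

```python
import hashlib
import json
import time
from pathlib import Path

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first record


def _digest(record: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the record."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(payload).hexdigest()


def append_event(log: Path, event: str, **details) -> dict:
    """Append one JSON line whose `prev` field links to the previous record."""
    prev = GENESIS
    if log.exists():
        lines = log.read_text().splitlines()
        if lines:
            prev = json.loads(lines[-1])["hash"]
    record = {"ts": time.time(), "event": event, "details": details, "prev": prev}
    record["hash"] = _digest(record)
    with log.open("a") as f:
        f.write(json.dumps(record, sort_keys=True) + "\n")
    return record


def verify(log: Path) -> bool:
    """Recompute every hash and check that each record links to the one before."""
    prev = GENESIS
    for line in log.read_text().splitlines():
        record = json.loads(line)
        claimed = record.pop("hash")
        if record["prev"] != prev or _digest(record) != claimed:
            return False
        prev = claimed
    return True
```

Note the limit of this sketch: the chain detects edits to earlier lines, but someone who can rewrite the whole file can rebuild the chain, so the latest hash would need to be stored somewhere else to catch that.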

two_person Requires a second independently run script, with a shared secret, before the initiating script proceeds. It’s a translation of the two-person rule used in nuclear weapons management. This software version has serious design flaws, but it was a fun exercise to sketch out.
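One possible shape for the shared-secret check is an HMAC challenge-response, sketched below. This is an assumption on my part and not necessarily how the gist does it; the function names are hypothetical, and how the challenge and response travel between the two scripts is left out. It shares the basic flaw of any single-machine version: whoever holds the secret can play both roles.

```python
import hashlib
import hmac
import secrets


def new_challenge() -> str:
    """Initiator: create a fresh random challenge for this run."""
    return secrets.token_hex(16)


def approve(shared_secret: bytes, challenge: str) -> str:
    """Approver: sign the challenge with the shared secret."""
    return hmac.new(shared_secret, challenge.encode(), hashlib.sha256).hexdigest()


def is_approved(shared_secret: bytes, challenge: str, response: str) -> bool:
    """Initiator: accept only a response that matches this run's challenge."""
    expected = approve(shared_secret, challenge)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), response.encode())
```

Using a fresh challenge per run means an approval captured from an earlier run cannot be replayed to unlock a later one.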

None of these techniques is exotic. That is the point. Most scripts can be made much safer with small, boring additions.

