Laurence Tratt: Test-case Reducers Are Underappreciated Debugging Tools
Home
> Blog
Home> Blog
email<br>Bluesky<br>Mastodon<br>Twitter
Bluesky<br>Twitter
Test-case Reducers Are Underappreciated Debugging Tools
RSS feed: whole site or just the blog
Recent posts
Test-case Reducers Are Underappreciated Debugging Tools
Retrofitting JIT Compilers into C Interpreters
PLISS 2026
Upcoming Talk in London
Async and Finaliser Deadlocks
What Context Can Bring to Terminal Mouse Clicks
Why Firsts Matter
LLM Inflation
Comparing the Glove80 and Maltron keyboards
The LLM-for-software Yo-yo
Blog archive
Test-case reducers are less well known than they should be, and those who are<br>aware of them don’t always realise the variety of ways we can use – perhaps<br>even abuse! – them. In this post, I’m going to explore some of the things I’ve learnt while using<br>these wonderful tools. I’ll start at the basics, because the idea is so simple<br>that it can be hard to believe it works.
I’ll then work my way up to a deeper surprise. Test-case reducers try to reduce<br>the length of an input, but we can force them to take into account additional<br>factors such as how often an error occurs, or the number of instructions<br>executed. Depending on the problem you’re trying to debug, this can<br>make a huge difference to the real-world effectiveness of test-case reducers.
I don’t expect that anything in this post will surprise experts, but since I<br>learnt this stuff the hard way, perhaps others will benefit from having some of<br>it in one place.
Test-case reduction
Imagine we have written a program that crashes on a large input, and we don’t<br>know what part of the input causes the crash: what can we do?
Most of us will probably start with debugging our program using classic techniques, gradually<br>moving from the quick and dirty (printf) to the more principled (debuggers) to – if<br>we’re desperate and experienced – “exotic” tools (sanitisers, valgrind, etc).<br>Every programmer ends up with a set of debugging tools and<br>techniques they reach for, some more effective1 than others.
One technique that is used less often than it probably should be is reducing<br>the size of the input. In nearly all cases, the smaller an input is, the easier<br>it is for us to work out why it’s leading to problems.
Reduction can be manual. We can load an input into a text editor, remove a portion of it,<br>and then see if the new, smaller, input still causes a crash.
Unsurprisingly, as easily-bored humans with limited vision, we tend to miss<br>many opportunities for reduction when we do so manually. It’s also often the<br>case that deleting part of an input stops the program crashing in the way we<br>were investigating: the reduced program might run to completion, or throw a different, correct and<br>expected, error on the new input. Even worse, at some point one realises that deleting portion A<br>of the input doesn’t achieve the effect we want, but deleting disjoint portions A<br>and B does: how big is the search space of deletions?! A Sisyphean future<br>beckons.
Test-case reducers
Fortunately, there are tools that automate the process of reducing test cases:<br>test-case reducers2. These take a program, an input, and an interestingness<br>test. The test-case reducer tries ever shorter versions of the input, and the<br>interestingness test tells the reducer whether those shorter versions still<br>trigger the problem you care about. Test-case reducers can be astonishingly<br>effective – 95-99% reductions are common – and often make debugging<br>vastly easier.
Test-case reducers can sound like they’re magic: how can a tool know what parts<br>of the input to remove? To make things even worse, the community that has most<br>thoroughly embraced them are compiler authors, who many programmers think of as<br>being an impossibly skilled elite3. It seems to me that the combination of<br>these two things has put many people off from trying such tools.
The good news is that test-case reducers are not magic. The easiest<br>way to see that is by writing one. Let’s imagine that I’ve written this program<br>which reads words in from a file:
import sys<br>for l in open(sys.argv[1]):<br>if len(l) > 25: print("Word too long\n")
When I run this on my machine with /usr/share/dict/words it prints a warning:
$ python3 t.py /usr/share/dict/words<br>Word too long
For the sake of the example, let’s pretend that seeing that warning is<br>an “error” and we haven’t been able to work<br>out why it crashes using traditional debugging techniques. Can a test-case<br>reducer help us?
The first thing we need to do is define our interestingness test. I’ll<br>deliberately follow the conventions of the test-case reducers I’m familiar with<br>and say that an interestingness test is a program which:
Returns 0 if the input is “interesting” – that is, the input manifests<br>the error we are interested in – and we should use this reduced input going<br>forward.
Returns non-0 if the input is “uninteresting” and we need to try a<br>different reduction.
I’m going to write a simple4 shell script that...