Test-case reducers are underappreciated debugging tools

Laurence Tratt: Test-case Reducers Are Underappreciated Debugging Tools

Home

> Blog

Home> Blog

email Bluesky Mastodon Twitter

Bluesky Twitter

Test-case Reducers Are Underappreciated Debugging Tools

RSS feed: whole site or just the blog

Recent posts

Test-case Reducers Are Underappreciated Debugging Tools

Retrofitting JIT Compilers into C Interpreters

PLISS 2026

Upcoming Talk in London

Async and Finaliser Deadlocks

What Context Can Bring to Terminal Mouse Clicks

Why Firsts Matter

LLM Inflation

Comparing the Glove80 and Maltron keyboards

The LLM-for-software Yo-yo

Blog archive

Test-case reducers are less well known than they should be, and those who are aware of them don’t always realise the variety of ways we can use – perhaps even abuse! – them. In this post, I’m going to explore some of the things I’ve learnt while using these wonderful tools. I’ll start at the basics, because the idea is so simple that it can be hard to believe it works.

I’ll then work my way up to a deeper surprise. Test-case reducers try to reduce the length of an input, but we can force them to take into account additional factors such as how often an error occurs, or the number of instructions executed. Depending on the problem you’re trying to debug, this can make a huge difference to the real-world effectiveness of test-case reducers.

I don’t expect that anything in this post will surprise experts, but since I learnt this stuff the hard way, perhaps others will benefit from having some of it in one place.

Test-case reduction

Imagine we have written a program that crashes on a large input, and we don’t know what part of the input causes the crash: what can we do?

Most of us will probably start with debugging our program using classic techniques, gradually moving from the quick and dirty (printf) to the more principled (debuggers) to – if we’re desperate and experienced – “exotic” tools (sanitisers, valgrind, etc). Every programmer ends up with a set of debugging tools and techniques they reach for, some more effective1 than others.

One technique that is used less often than it probably should be is reducing the size of the input. In nearly all cases, the smaller an input is, the easier it is for us to work out why it’s leading to problems.

Reduction can be manual. We can load an input into a text editor, remove a portion of it, and then see if the new, smaller, input still causes a crash.

Unsurprisingly, as easily-bored humans with limited vision, we tend to miss many opportunities for reduction when we do so manually. It’s also often the case that deleting part of an input stops the program crashing in the way we were investigating: the reduced program might run to completion, or throw a different, correct and expected, error on the new input. Even worse, at some point one realises that deleting portion A of the input doesn’t achieve the effect we want, but deleting disjoint portions A and B does: how big is the search space of deletions?! A Sisyphean future beckons.

Test-case reducers

Fortunately, there are tools that automate the process of reducing test cases: test-case reducers2. These take a program, an input, and an interestingness test. The test-case reducer tries ever shorter versions of the input, and the interestingness test tells the reducer whether those shorter versions still trigger the problem you care about. Test-case reducers can be astonishingly effective – 95-99% reductions are common – and often make debugging vastly easier.

Test-case reducers can sound like they’re magic: how can a tool know what parts of the input to remove? To make things even worse, the community that has most thoroughly embraced them are compiler authors, who many programmers think of as being an impossibly skilled elite3. It seems to me that the combination of these two things has put many people off from trying such tools.

The good news is that test-case reducers are not magic. The easiest way to see that is by writing one. Let’s imagine that I’ve written this program which reads words in from a file:

import sys for l in open(sys.argv[1]): if len(l) > 25: print("Word too long\n")

When I run this on my machine with /usr/share/dict/words it prints a warning:

$ python3 t.py /usr/share/dict/words Word too long

For the sake of the example, let’s pretend that seeing that warning is an “error” and we haven’t been able to work out why it crashes using traditional debugging techniques. Can a test-case reducer help us?

The first thing we need to do is define our interestingness test. I’ll deliberately follow the conventions of the test-case reducers I’m familiar with and say that an interestingness test is a program which:

Returns 0 if the input is “interesting” – that is, the input manifests the error we are interested in – and we should use this reduced input going forward.

Returns non-0 if the input is “uninteresting” and we need to try a different reduction.

I’m going to write a simple4 shell script that...

Test-case reducers are underappreciated debugging tools

Related Articles

The Newest Instagram "Exploit" Is the Goofiest I've Seen

Apple WWDC 2026 Livestream

It's Not Just X. It's Y

Show HN: GoPeek – open links in live mini browser windows without new tabs

Agent Memory: An Anatomy