Temporal Shrinking
Temporal Shrinking.
Posted on 2025-11-22
::
9 min read
:: Tags:
🏷software engineering,
🏷testing
Random testing, in isolation, is a stupid idea. You take a program, which has a practically infinite<br>amount of possible inputs, you randomly pick one of those infinite possibilities in the hopes of<br>breaking the program, or some assertion about the program, and it actually works (discovers bug) in even the most<br>naive form possible, which is essentially the following loop:
while True:<br>input = generate_random_input()<br>if not property(input):<br>report_bug(input)<br>Over the years, people have developed many different improvements on this naive<br>form I described. The most famous example is coverage guidance, which led to the rise of fuzz testing in the<br>last decade, finding thousands, perhaps millions of different bugs across systems. In coverage guiding,<br>the random inputs are biased towards inputs that increase the code coverage of the system under test, leading to<br>better exploration of the input space. The modified loop looks like this:
pool = initialize_pool()<br>while True:<br>input = mutate(select_from_pool(pool))
if not property(input):<br>report_bug(input)
if increases_coverage(input, pool):<br>pool.add(input)<br>As we found more and more bugs, it became more and more apparent that finding bugs was, and is, not sufficient. Someone<br>must validate and fix the said bugs. Hence, the "counterexamples", inputs that trigger the alleged bugs, must be legible,<br>inspectable, managable. Ideally, the tester should receive the simplest possible form of the input, isolated from any noise,<br>only providing the minimal necessary structure for the reproduction, and the eventual fixing, of the bug. The solution emerged<br>in different names, but in similar shapes, minimization, delta-debugging, or shrinking.
Shrinking is one of these ideas that immediately make sense once you hear about it. Let's say that we generated some random<br>input for some program, perhaps a random PNG image for an image processor. Once a bug is triggered in some part of the system,<br>we would like to get the simplest PNG possible that retriggers the input. For this, we can designate some simplification procedures.<br>We can dim each pixel independently, making the picture darker, turning as much of the pixels as possible into #000000. We can<br>also remove some parts of the image completely by cropping the image. As it turns out, many times these simplifications are equivalent<br>to making the values smaller, such as reducing the size of the image, or converging pixels to white as the smallest hex coded<br>color, hence the name shrinking. In shrinking, we start from a bug-triggering input case, and we repeatedly produce smaller (shrunk)<br>versions of the input as long as there is a smaller input that still retriggers the bug. If we cannot find a smaller instance,<br>we report the original input to the user.
Given some input I that triggers a bug, we can define a shrinking function shrink(I) -> [I_1, I_2...I_n], we can write<br>a version of the shrinking algorithm as follows:
def shrink_input(input):<br>for smaller_input in shrink(input):<br>if triggers_bug(smaller_input):<br>return shrink_input(smaller_input)<br>return input<br>Enters temporality. The default perspective into random testing is that it produces many inputs from scratch, you may conjure up PNGs<br>from thin air, produce balanced binary trees, generate network topologies, create virtual physical environments or write texts with<br>grammatical faults, all depending on your domain. However, there are many systems where generating inputs from scratch is not an option,<br>you cannot first generate an entire well-formed SQLite database from scratch, and then a SQL query over the database, only to test the<br>result of a simple query. Therefore, instead of trying to generate such complex structures from scratch, random testing have mostly<br>converged to using stateful generation. In stateful generation, we add a notion of time to random testing. We start from an initial<br>state S_0 at time t=0, and generate an interaction against a system based on its API. For a SQL database, these can be SQL statements,<br>for a key-value store, these are insert, delete, get...
After every interaction, the state of the system possibly changes, affecting the next input we generate. Assuming we're lucky (the<br>system itself is not affected by the physical time), every action will be another tick, going from t=0,1,2...n until we discover<br>a bug. We already know the logical next step, shrinking, but how do we shrink these actions that have these potential linear<br>dependencies with each other, every action possibly depending on all prior actions to be valid? Below is an example of such a dependency<br>chain:
CREATE TABLE t0 (c0, c1);<br>INSERT INTO t0 VALUES (0, 0), (1, 1), (2, 2);<br>SELECT FROM t0 WHERE c0 >= 1; -- @bind r<br>-- @assert r = (1, 1), (2, 2)<br>The language depicted here is a small augmentation of SQL for binding results of intermediate queries and asserting facts...