Temporal Shrinking

Temporal Shrinking.

Posted on 2025-11-22

9 min read

:: Tags:

🏷software engineering,

🏷testing

Random testing, in isolation, is a stupid idea. You take a program, which has a practically infinite amount of possible inputs, you randomly pick one of those infinite possibilities in the hopes of breaking the program, or some assertion about the program, and it actually works (discovers bug) in even the most naive form possible, which is essentially the following loop:

while True: input = generate_random_input() if not property(input): report_bug(input) Over the years, people have developed many different improvements on this naive form I described. The most famous example is coverage guidance, which led to the rise of fuzz testing in the last decade, finding thousands, perhaps millions of different bugs across systems. In coverage guiding, the random inputs are biased towards inputs that increase the code coverage of the system under test, leading to better exploration of the input space. The modified loop looks like this:

pool = initialize_pool() while True: input = mutate(select_from_pool(pool))

if not property(input): report_bug(input)

if increases_coverage(input, pool): pool.add(input) As we found more and more bugs, it became more and more apparent that finding bugs was, and is, not sufficient. Someone must validate and fix the said bugs. Hence, the "counterexamples", inputs that trigger the alleged bugs, must be legible, inspectable, managable. Ideally, the tester should receive the simplest possible form of the input, isolated from any noise, only providing the minimal necessary structure for the reproduction, and the eventual fixing, of the bug. The solution emerged in different names, but in similar shapes, minimization, delta-debugging, or shrinking.

Shrinking is one of these ideas that immediately make sense once you hear about it. Let's say that we generated some random input for some program, perhaps a random PNG image for an image processor. Once a bug is triggered in some part of the system, we would like to get the simplest PNG possible that retriggers the input. For this, we can designate some simplification procedures. We can dim each pixel independently, making the picture darker, turning as much of the pixels as possible into #000000. We can also remove some parts of the image completely by cropping the image. As it turns out, many times these simplifications are equivalent to making the values smaller, such as reducing the size of the image, or converging pixels to white as the smallest hex coded color, hence the name shrinking. In shrinking, we start from a bug-triggering input case, and we repeatedly produce smaller (shrunk) versions of the input as long as there is a smaller input that still retriggers the bug. If we cannot find a smaller instance, we report the original input to the user.

Given some input I that triggers a bug, we can define a shrinking function shrink(I) -> [I_1, I_2...I_n], we can write a version of the shrinking algorithm as follows:

def shrink_input(input): for smaller_input in shrink(input): if triggers_bug(smaller_input): return shrink_input(smaller_input) return input Enters temporality. The default perspective into random testing is that it produces many inputs from scratch, you may conjure up PNGs from thin air, produce balanced binary trees, generate network topologies, create virtual physical environments or write texts with grammatical faults, all depending on your domain. However, there are many systems where generating inputs from scratch is not an option, you cannot first generate an entire well-formed SQLite database from scratch, and then a SQL query over the database, only to test the result of a simple query. Therefore, instead of trying to generate such complex structures from scratch, random testing have mostly converged to using stateful generation. In stateful generation, we add a notion of time to random testing. We start from an initial state S_0 at time t=0, and generate an interaction against a system based on its API. For a SQL database, these can be SQL statements, for a key-value store, these are insert, delete, get...

After every interaction, the state of the system possibly changes, affecting the next input we generate. Assuming we're lucky (the system itself is not affected by the physical time), every action will be another tick, going from t=0,1,2...n until we discover a bug. We already know the logical next step, shrinking, but how do we shrink these actions that have these potential linear dependencies with each other, every action possibly depending on all prior actions to be valid? Below is an example of such a dependency chain:

CREATE TABLE t0 (c0, c1); INSERT INTO t0 VALUES (0, 0), (1, 1), (2, 2); SELECT FROM t0 WHERE c0 >= 1; -- @bind r -- @assert r = (1, 1), (2, 2) The language depicted here is a small augmentation of SQL for binding results of intermediate queries and asserting facts...

Temporal Shrinking

Related Articles

(no title)

Is AI ruining our skills? Early results are in – and they're not good

The Anatomy of an AI-Native Org

Apertus – Open Foundation Model for Sovereign AI

The labor share of income in the US is at its lowest post-war level