Being Wrong in the Same Direction

Being Wrong in the Same Direction | bytecode.news


ai, methodology, testing

Writing a failing test is meant to be the easy part of red/green. You write the test, you watch it fail, you make it pass, and the test certifies the fix. That story is complete when the test is right and the code is wrong. It's incomplete when the test and the code are wrong in the same direction, and red/green is most useful precisely there - because nothing else will catch that problem.

Someone has to care.

What follows is a working example of red/green doing the thing that gets ignored: forcing you to commit, in executable form, to a specification precise enough that you can tell when the specification itself is the bug. The function in question is a sigmoid. Or rather, it says it's a sigmoid. We'll get to that.

Process

To do red/green testing, one creates a "red test" - called "red" because the error messages tend to be "red," red is the cultural icon for "stop," and so forth. The "red test" represents a failure. Let's walk through an example issue and then we'll look at the red/green fix for it.

Suppose the issue is with a sigmoid function: an S-shaped function that changes fastest near the middle of its input range and flattens out toward its limits. Sigmoids are very common in AI applications.
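For reference, the sigmoid most people mean is the logistic function, 1 / (1 + e^-x). What follows is a minimal sketch of that textbook definition, not the code under test; the class name LogisticSketch and the method name logistic are made up for illustration.

```java
public class LogisticSketch {

    // Textbook logistic function: 1 / (1 + e^-x).
    // Assumption: this is the conventional definition, shown only for
    // comparison. It is not taken from the AIMath class in this article.
    static double logistic(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    public static void main(String[] args) {
        // Print a few sample points to show the S shape: steep near 0,
        // flattening toward 0.0 and 1.0 at the extremes.
        for (double x : new double[] {-3.0, -1.0, 0.0, 1.0, 3.0}) {
            System.out.printf("logistic(%.1f) = %.4f%n", x, logistic(x));
        }
    }
}
```

The thing to notice is that the output stays between 0.0 and 1.0 for ordinary inputs, which is the range the specification below talks about.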

The issue in question might look like this:

Given an input of 3.0, When sigmoid() is called, Then the return value is 1.25.

The specification is that the output range is 0.0 to 1.0. Thus:

Given an input of 3.0, When sigmoid() is called, Then the caller should receive an IllegalArgumentException.

The specific exception is justified because a sigmoid producing an out-of-range output signifies an invalid rule. This does not require checked exception handling (there should be no recovery for a broken system) and prevents corrupted outputs from being consumed downstream.

Seems simple enough! ... it's also incorrect.

So the approach here is first to validate the issue is right, where "right" means that it actually summarizes the situation properly:

@Test
void callSigmoidWithBadValue() {
    // The issue claims an out-of-range result should surface as an exception.
    assertThrows(IllegalArgumentException.class, () -> AIMath.sigmoid(3.0));
}

Then we run the test. We don't really care about the output - the issue says that the method should be throwing an exception, and ... oh, wait, you don't know what the method looks like! Let's see what it actually is:

/**
 * There are dragons here, readers. A maze of twisty passages, all alike,
 * and a grue is going to get us all because we're not paying attention.
 */
static double sigmoid(double x) {
    return 0.5 + 0.25 * x;
}

It doesn't throw an exception. Ever. So we know this test will fail. This is good; we have now validated that the issue exists, in a very simple way. It's not a great approach, but it's a start. What we'd actually like to do is throw a series of boundary conditions at the method. Let's write a parameterized test for JUnit, one that provides an input, whether we expect an exception, and if not, the expected output within a certain granularity.

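Why granularity? Because of IEEE 754 floating-point math. Most decimal fractions have no exact binary representation, so results carry small rounding errors, and exact equality isn't necessary for this scope; if we need exact decimal arithmetic, BigDecimal is there, but that's out of scope here.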
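If the rounding problem feels abstract, here is a minimal sketch of it in plain Java. The class name ToleranceSketch and the helper closeEnough are made up for illustration; the helper is roughly what a delta-based assertion does, not JUnit's actual implementation.

```java
public class ToleranceSketch {

    // Same magnitude as the tolerance used in the test in this article.
    static final double TOLERANCE = 1e-9;

    // Hypothetical helper: true when the two values differ by no more
    // than TOLERANCE.
    static boolean closeEnough(double expected, double actual) {
        return Math.abs(expected - actual) <= TOLERANCE;
    }

    public static void main(String[] args) {
        double sum = 0.1 + 0.2;

        // Exact comparison fails: the sum is 0.30000000000000004.
        System.out.println(sum == 0.3);            // false

        // Comparison within a tolerance succeeds.
        System.out.println(closeEnough(0.3, sum)); // true
    }
}
```

The tolerance is a judgment call: tight enough to catch a wrong answer, loose enough to ignore rounding noise.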

private static final double TOLERANCE = 1e-9;

@ParameterizedTest(name = "sigmoid({0}): expectException={1}, expected={2}")
@CsvSource({
    "0.0, false, 0.5",
    "1.0, false, 0.75",
    "-1.0, false, 0.25",
    "2.0, false, 1.0",
    "-2.0, false, 0.0",
    "3.0, true, 0.0",
    "-3.0, true, 0.0",
    "10.0, true, 0.0",
    "-10.0, true, 0.0",
    "NaN, true, 0.0"
})
void callSigmoid(double input, boolean expectException, double expectedOutput) {
    if (expectException) {
        // The expected-output column is ignored for these rows.
        assertThrows(IllegalArgumentException.class, () -> AIMath.sigmoid(input));
        return;
    }
    double actual = AIMath.sigmoid(input);
    assertEquals(expectedOutput, actual, TOLERANCE,
        "sigmoid(" + input + ") returned the wrong value");
}

Here, we have ten inputs to throw at our little method. If we run this, we see our first five inputs pass - no exceptions, and the values match - but our last five don't: each expects an exception that never arrives. We have confirmed that the issue is correct. It's not useful, per se, but it's correct. We can make it more useful by adding some error logging; let's add SLF4J and track the output from sigmoid().

We're going to do this very clumsily; we're using info() when we should be using debug() or, really, trace(). I'm mostly trying to avoid having to add a logging configuration here.

var sigmoid = 0.5 + 0.25 * x;
logger.info("sigmoid({}) = {}", x, sigmoid);
return sigmoid;

Our tests still fail, BUT we see the test run like this now:

08:55:51.379 [Test worker] INFO news.bytecode.AIMath -- sigmoid(3.0) = 1.25

Expected java.lang.IllegalArgumentException to be thrown, but nothing was thrown.
org.opentest4j.AssertionFailedError: Expected java.lang.IllegalArgumentException to be thrown, but nothing was thrown.
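The logged value for 3.0 isn't a one-off. As a rough sketch, with no JUnit or SLF4J involved, we can replay all ten table inputs against the same linear formula and see which ones leave the 0.0 to 1.0 range without complaint. The class name SigmoidTableSketch and the helpers inRange and silentlyOutOfRange are made up for illustration; only the formula comes from the method shown earlier.

```java
import java.util.ArrayList;
import java.util.List;

public class SigmoidTableSketch {

    // Same formula as the method under test: linear, and it never throws.
    static double sigmoid(double x) {
        return 0.5 + 0.25 * x;
    }

    // True when the value lies inside the specified range [0.0, 1.0].
    // NaN fails both comparisons, so it counts as out of range.
    static boolean inRange(double y) {
        return y >= 0.0 && y <= 1.0;
    }

    // Hypothetical helper: collects the inputs whose output escapes the
    // range while the method returns normally.
    static List<Double> silentlyOutOfRange(double... inputs) {
        List<Double> result = new ArrayList<>();
        for (double x : inputs) {
            if (!inRange(sigmoid(x))) {
                result.add(x);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        double[] inputs = {0.0, 1.0, -1.0, 2.0, -2.0, 3.0, -3.0, 10.0, -10.0, Double.NaN};
        for (double x : inputs) {
            double y = sigmoid(x);
            System.out.println("sigmoid(" + x + ") = " + y + " inRange=" + inRange(y));
        }
        System.out.println("Silently out of range: " + silentlyOutOfRange(inputs));
    }
}
```

The five inputs it reports are exactly the five rows that expect an exception, which is the same thing the failing tests are telling us, just without the red.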

When we fix the logging levels to trace() - should we decide to do that - that logger call turns into almost a no-op; it's not quite a no-op, but it's...
