Deterministic Simulation Testing for Distributed Systems: Good, Bad, and Ugly


Deterministic Simulation Testing for Distributed Systems: The Good, the Bad, and the Ugly | Ao Li


During my PhD, I built Fray, a deterministic simulation testing framework for concurrent programs written in Java. Fray was quite successful and has found many bugs in mature concurrent programs. As an academic, a natural question for me was: can we use a similar idea to test distributed systems? This led to the last chapter of my PhD thesis: Diorama, a deterministic simulation testing framework for distributed systems.

The Core Idea: Reduce a Distributed System to a Concurrent Program

The core idea of Diorama is simple: if we can reduce a distributed system to a concurrent program, we can use Fray’s deterministic simulation testing framework to test it. The power of this idea is that it reduces all “events” in a distributed system to “scheduling decisions,” allowing a centralized scheduler to control everything — when a network message is delivered, how fast a clock progresses, and in what order nodes observe each other’s actions.
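To illustrate what "reducing events to scheduling decisions" can mean, here is a minimal sketch. It is not Fray's or Diorama's actual scheduler: the class name, the `explore` method, and the event strings are all hypothetical. It only shows that once every pending event sits in one place, a seeded random choice fully determines the interleaving, so a failing run can be replayed from its seed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch, not Fray's implementation: every pending event
// (message delivery, clock advance) sits in one list, and a seeded
// scheduler decides which one fires next.
final class SeededSchedulerSketch {

    /** Returns the order in which the pending events fire for a given seed. */
    static List<String> explore(long seed, List<String> pendingEvents) {
        Random random = new Random(seed); // the only source of randomness
        List<String> pending = new ArrayList<>(pendingEvents);
        List<String> trace = new ArrayList<>();
        while (!pending.isEmpty()) {
            // One scheduling decision: pick any event that is ready to fire.
            int choice = random.nextInt(pending.size());
            trace.add(pending.remove(choice));
        }
        return trace;
    }

    public static void main(String[] args) {
        List<String> events = List.of(
                "deliver A->B: vote request",
                "deliver C->B: vote request",
                "advance clock: B election timeout");
        // The same seed replays the same interleaving; a different seed
        // may explore a different one.
        System.out.println(explore(42L, events));
        System.out.println(explore(42L, events));
        System.out.println(explore(7L, events));
    }
}
```

A real scheduler also has to track which events are enabled at each step (a message cannot be delivered before it is sent); this sketch assumes all events are enabled from the start.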

Concretely, Diorama replaces the network layer of a distributed system with an in-memory implementation, replaces the physical clock with a logical clock, and isolates each node’s state through standalone class loaders.
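As a rough illustration of the logical-clock part, the sketch below shows one way a clock can be driven by scheduling decisions instead of wall time. Everything in it (the class, `sleep`, `advanceToNextTimer`) is a hypothetical stand-in, not Diorama's API: a "sleep" only records a deadline, and time jumps forward when the scheduler chooses to fire the next timer.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical logical clock, not Diorama's implementation: sleeping
// registers a deadline instead of blocking on the physical clock.
final class LogicalClockSketch {

    private static final class Timer {
        final long deadlineMillis;
        final String owner;

        Timer(long deadlineMillis, String owner) {
            this.deadlineMillis = deadlineMillis;
            this.owner = owner;
        }
    }

    private long nowMillis = 0L;
    private final PriorityQueue<Timer> timers =
            new PriorityQueue<>(Comparator.comparingLong((Timer t) -> t.deadlineMillis));

    /** Current logical time in milliseconds. */
    long now() {
        return nowMillis;
    }

    /** Stand-in for Thread.sleep: records a wake-up deadline and returns at once. */
    void sleep(String owner, long millis) {
        timers.add(new Timer(nowMillis + millis, owner));
    }

    /** Jumps logical time to the earliest deadline; returns who wakes up, or null if no timer is pending. */
    String advanceToNextTimer() {
        Timer next = timers.poll();
        if (next == null) {
            return null;
        }
        nowMillis = next.deadlineMillis;
        return next.owner;
    }

    public static void main(String[] args) {
        LogicalClockSketch clock = new LogicalClockSketch();
        clock.sleep("node-1 election timeout", 3000L);
        clock.sleep("node-2 heartbeat", 500L);
        String woken;
        while ((woken = clock.advanceToNextTimer()) != null) {
            System.out.println(clock.now() + " ms: wake " + woken);
        }
    }
}
```

With a clock like this, a three-second timeout costs no real time in a test, and whether the timeout fires before or after a pending message becomes one more choice for the scheduler.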

The Good: Simplicity

Diorama is designed to be simple and easy to use. It does not require any intrusive code changes to the distributed system and provides a clean interface to launch nodes, inspect states, and verify behavior.

To test a distributed system with Diorama, you implement a ServerInstance interface for each node (both servers and clients):

```java
public interface ServerInstance {
    void run(String[] args) throws Exception;
    void stop();
}
```

Then you write a standard JUnit test that launches your nodes through Diorama, waits for the workload to finish, and verifies results. You can run this test as a normal JUnit test with @Test, or as a controlled concurrency test with @FrayTest; the latter lets Fray’s scheduler systematically explore different message orderings and timing behaviors.

For example, here is a simplified ServerInstance for ZooKeeper:

```java
import org.apache.zookeeper.server.quorum.QuorumPeerConfig;
import org.apache.zookeeper.server.quorum.QuorumPeerMain;

public class ZookeeperServer implements ServerInstance {
    private final QuorumPeerMain quorumPeer = new QuorumPeerMain();

    @Override
    public void run(String[] args) throws Exception {
        // args[0] is the path to this node's ZooKeeper config file.
        QuorumPeerConfig config = new QuorumPeerConfig();
        config.parse(args[0]);
        // Run the quorum peer with the parsed configuration.
        quorumPeer.runFromConfig(config);
    }

    @Override
    public void stop() {
        quorumPeer.close();
    }
}
```

And a test that launches a 3-node ZooKeeper cluster, kills the leader, restarts it, and verifies that the cluster re-converges with a single leader:

```java
@ExtendWith(FrayTestExtension.class)
public class ZookeeperIntegrationTest {
    @FrayTest(...)
    void testLeaderRestart() throws Exception {
        // Launch 3 Zookeeper servers
        for (int i = 1; i <= 3; i++) {
            launchServer("ZookeeperServer", new String[]{configPath(i)}, ...);
        }
        Thread.sleep(3000L); // lets the cluster elect a leader

        // Kill the leader and restart it
        int leaderId = findLeader();
        stopServer(leaderId);
        launchServer(leaderId);

        // Verify: cluster must converge to 1 leader + 2 followers
        waitAndVerifyModes(3, m ->
            countMode(m, "leader") == 1 && countMode(m, "follower") == 2
        );
    }
}
```

Here, stopServer calls the stop method defined in ZookeeperServer, and waitAndVerifyModes spawns a ZooKeeper client that sends the stat command to each server to count the number of leaders and followers (the same way you would interact with a real ZooKeeper cluster).
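To make the verification step concrete, here is a minimal sketch of what helpers in the spirit of countMode could look like. The `parseMode` helper, the `Map<Integer, String>` representation of server modes, and the class name are assumptions for illustration, not code from the diorama-examples repo; the sketch relies only on the stat response containing a `Mode:` line.

```java
import java.util.Map;

// Hypothetical helpers, not taken from the Diorama examples.
final class ModeCheckSketch {

    /** Extracts the value of the "Mode:" line from a stat response, or "unknown" if absent. */
    static String parseMode(String statResponse) {
        for (String line : statResponse.split("\n")) {
            String trimmed = line.trim();
            if (trimmed.startsWith("Mode:")) {
                return trimmed.substring("Mode:".length()).trim();
            }
        }
        return "unknown";
    }

    /** Counts how many servers (keyed by server id) currently report the given mode. */
    static long countMode(Map<Integer, String> modes, String mode) {
        return modes.values().stream().filter(mode::equals).count();
    }

    public static void main(String[] args) {
        // Abbreviated, made-up stat response for illustration.
        String response = "Clients:\n\nMode: follower\nNode count: 5\n";
        Map<Integer, String> modes = Map.of(
                1, "leader",
                2, parseMode(response),
                3, "follower");
        boolean converged =
                countMode(modes, "leader") == 1 && countMode(modes, "follower") == 2;
        System.out.println("converged: " + converged);
    }
}
```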

Because the test logic is just straightforward Java with no framework-specific DSL, AI agents can generate test scenarios with simple prompting. In the diorama-examples repo, we provide three distributed system examples that demonstrate how Diorama can be used to test distributed systems. The JRaft and ScaleCube examples were generated entirely by AI agents.

The Bad: The Applicability Dilemma

In the Fray paper, I argued that mocking the concurrency library is not a good approach because it hurts applicability. However, in Diorama, we explicitly replace the network layer of a distributed system with an in-memory implementation, and this hurts applicability in the same way.

In our implementation, we replaced java.net.Socket and java.net.ServerSocket with in-memory implementations. This only works when: 1) the application uses physical network sockets directly; 2) the application does not use a network library such as Netty; 3) the application does not try to subclass these classes.
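To give a feel for what an in-memory stand-in involves, here is a minimal sketch of one direction of a connection backed by a queue instead of the operating system's network stack. It is an illustration under assumptions, not Diorama's replacement for java.net.Socket: the class and method names are hypothetical. It also hints at the limitation above: only code that obtains its streams from the replaced classes ever reaches such a stand-in.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch, not Diorama's code: bytes written on one end
// become readable on the other without any real network I/O.
final class InMemoryPipeSketch {
    private final BlockingQueue<Integer> buffer = new LinkedBlockingQueue<>();

    /** The writing end, playing the role of Socket.getOutputStream(). */
    OutputStream writeEnd() {
        return new OutputStream() {
            @Override
            public void write(int b) {
                buffer.add(b & 0xFF);
            }
        };
    }

    /** The reading end, playing the role of Socket.getInputStream(). */
    InputStream readEnd() {
        return new InputStream() {
            @Override
            public int read() throws IOException {
                try {
                    return buffer.take(); // blocks until a byte arrives
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IOException("interrupted while reading", e);
                }
            }
        };
    }

    /** Writes a message into the pipe and reads the same number of bytes back. */
    static String roundTrip(String message) {
        InMemoryPipeSketch pipe = new InMemoryPipeSketch();
        byte[] sent = message.getBytes(StandardCharsets.UTF_8);
        try {
            pipe.writeEnd().write(sent);
            InputStream in = pipe.readEnd();
            byte[] received = new byte[sent.length];
            for (int i = 0; i < received.length; i++) {
                received[i] = (byte) in.read();
            }
            return new String(received, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("ping"));
    }
}
```

In a simulation setting, the blocking `take()` is exactly the kind of operation a controlled scheduler would want to intercept, since when the reader unblocks is itself a scheduling decision.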

The Ugly

Luckily, we can still find many applications that use these classes directly, and we can still test them with Diorama. But once you write test scenarios and launch them with Diorama, you are going to face the ugly part of distributed system testing.

Timed Operations

In the paper A Note on...
