Why Your Test Suite Lies to You at Scale | BDR Methodology
May 16, 2026 · Dmitry, QA Automation Engineer
PRO IMPLEMENTATION
New to Playwright reliability? Start with the fundamentals: Flaky Tests You Can’t Fix With Better Selectors — the same concepts with more explanation and simpler examples.
Green tests and broken production is a specific failure mode that gets more common as test suites grow. The locators are right, the assertions are correct, the mocks return the expected data — and none of it reflects what the system actually does under load, with real network conditions, against a real database.
This article covers three architectural problems that cause this: API non-idempotency, mock drift, and data accumulation. Each is invisible at small scale. Each becomes expensive at large scale.
Code examples are intentionally simplified — focus on the architectural pattern.
The Failure Mode Nobody Talks About
Most flakiness guides focus on selectors and timing. That’s the visible layer. The invisible layer is data and integration:
A POST request succeeds on the server, the response is lost in transit, Playwright retries, the server creates a second record. Your test now has two orders instead of one, and the assertion that checks order count fails — not because the feature is broken, but because the network hiccuped.
Your mock returns { order_id: "123" }. The backend deployed last Tuesday and now returns { orderId: "123" }. Tests are green. The field your frontend reads is undefined. Production is broken.
Tests create 100 users per minute. Nobody cleans up reliably. Two weeks later, unique constraint violations start appearing in unrelated tests. The database that was supposed to be isolated is shared state in disguise.
These aren’t test bugs. They’re architectural gaps. And they require architectural solutions.
Idempotency: Making POST Requests Safe to Retry
The standard mental model of HTTP: a request either succeeds or fails. The reality: a request can succeed on the server and fail to deliver the response. The client sees a timeout and retries. The server sees a new request.
For GET requests this is harmless. For POST requests that create or modify state, it creates duplicates.
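A synchronous, in-process simulation of that failure. There is no real HTTP here and every name is hypothetical: the server commits the order, the response to the first attempt is "lost", and a single client retry produces a second order.

```typescript
// Everything here is an in-process stand-in: no real HTTP, hypothetical names.
interface Order {
  id: number;
  item: string;
}

const orders: Order[] = [];

// Naive server handler: every request creates a new record.
function handleCreateOrder(item: string): Order {
  const order: Order = { id: orders.length + 1, item };
  orders.push(order);
  return order;
}

// Transport stand-in: the server always does the work, but the response to
// the first attempt is "lost", like a timeout after the server committed.
let attempts = 0;
function sendCreateOrder(item: string): Order {
  attempts += 1;
  const order = handleCreateOrder(item);
  if (attempts === 1) {
    throw new Error('timeout: response lost in transit');
  }
  return order;
}

// Client that retries once on failure, with no idempotency key.
function createOrderWithRetry(item: string): Order {
  try {
    return sendCreateOrder(item);
  } catch {
    return sendCreateOrder(item); // the server sees a brand-new request
  }
}

const created = createOrderWithRetry('book');
console.log(orders.length); // 2: the retry created a duplicate
console.log(created.id); // 2: the client only learned about the second order
```

From the client's point of view the call "succeeded once", yet the database now holds two orders, which is exactly what breaks an order-count assertion.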
The solution: idempotency keys
An idempotency key is a client-generated identifier that the server uses to detect duplicate requests. If the server has processed a request with this key before, it returns the cached result instead of processing again.
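A minimal sketch of the server side of that check, assuming a single process with an in-memory `Map` as the key store. `handleCreateOrder` and the key strings are hypothetical; a real service would persist keys and also guard against two in-flight requests that share a key.

```typescript
interface Order {
  id: number;
  item: string;
}

const orders: Order[] = [];
// Idempotency key -> result of the first request that used it.
const processedByKey = new Map<string, Order>();

function handleCreateOrder(idempotencyKey: string, item: string): Order {
  const cached = processedByKey.get(idempotencyKey);
  if (cached !== undefined) {
    return cached; // duplicate: replay the original result, create nothing
  }
  const order: Order = { id: orders.length + 1, item };
  orders.push(order);
  processedByKey.set(idempotencyKey, order);
  return order;
}

const first = handleCreateOrder('abc123', 'book');
const retry = handleCreateOrder('abc123', 'book'); // e.g. a client retry
console.log(orders.length); // 1
console.log(first.id === retry.id); // true
```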
The key design question is how to generate the key. A static key per test fails when a test makes multiple POST requests — the server treats the second request as a duplicate of the first. A random UUID per request defeats the purpose — retries get new keys and bypass the deduplication.
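To make that trade-off concrete, here is a sketch that scores three strategies on the two properties that matter: a retry of the same request must reuse its key, and two different operations in one test must not share a key. The request shapes, URLs, and the static key string are hypothetical; the derived strategy uses the same hashing scheme as the article's `generateIdempotencyKey`.

```typescript
import { createHash, randomUUID } from 'crypto';

type KeyStrategy = (method: string, url: string, data: unknown) => string;

interface ApiRequest {
  method: string;
  url: string;
  data: unknown;
}

// Static key per test: every POST in the test shares one key.
const staticKey: KeyStrategy = () => 'checkout-test';

// Random UUID per call: a retry gets a different key than the original attempt.
const randomKey: KeyStrategy = () => randomUUID();

// Deterministic key derived from the request itself.
const derivedKey: KeyStrategy = (method, url, data) =>
  createHash('sha256').update(`${method}:${url}:${JSON.stringify(data)}`).digest('hex').slice(0, 16);

const createUser: ApiRequest = { method: 'POST', url: '/users', data: { email: 'a@example.com' } };
const createOrder: ApiRequest = { method: 'POST', url: '/orders', data: { sku: 'book-1' } };

function evaluate(strategy: KeyStrategy) {
  const keyFor = (r: ApiRequest) => strategy(r.method, r.url, r.data);
  return {
    // A retry of the same request must produce the same key.
    retrySafe: keyFor(createUser) === keyFor(createUser),
    // Two different operations in one test must not share a key.
    distinctOps: keyFor(createUser) !== keyFor(createOrder),
  };
}

const results = {
  staticKey: evaluate(staticKey), // retrySafe true, distinctOps false
  randomKey: evaluate(randomKey), // retrySafe false, distinctOps true
  derivedKey: evaluate(derivedKey), // retrySafe true, distinctOps true
};
console.log(results);
```

Only the derived key satisfies both properties, which is why it is the approach the rest of this section builds on.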
The correct approach: derive the key deterministically from the request context.
api/infrastructure/idempotency.ts

```typescript
import { createHash } from 'crypto';

// Same method + URL + body always yields the same key, so a retried request reuses it.
export function generateIdempotencyKey(method: string, url: string, data: unknown): string {
  const payload = `${method}:${url}:${JSON.stringify(data)}`;
  return createHash('sha256').update(payload).digest('hex').slice(0, 16);
}
```
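api/clients/BaseApiClient.ts

```typescript
import type { APIRequestContext } from '@playwright/test';
import { generateIdempotencyKey } from '../infrastructure/idempotency';

export abstract class BaseApiClient {
  constructor(protected readonly request: APIRequestContext) {}

  protected async post(url: string, data?: unknown) {
    // Derived from the request, not the attempt: a retry sends the same key.
    const key = generateIdempotencyKey('POST', url, data);
    return await this.request.post(url, {
      data,
      headers: { 'X-Idempotency-Key': key },
    });
  }
}
```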
Two calls to createUser with identical data get identical keys — the server deduplicates. Two calls with different data (create user, then create order) get different keys — both process correctly.
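One caveat on "identical data": `JSON.stringify` follows each object's property order, so `{ email, role }` and `{ role, email }` hash to different keys even though they carry the same data. If your test data builders don't always construct bodies in the same order, a variant like this sorts object keys before hashing. `stableStringify` and `generateStableIdempotencyKey` are hypothetical names, not part of the article's code.

```typescript
import { createHash } from 'crypto';

// JSON.stringify replacer that rebuilds every plain object with its keys sorted.
// Arrays keep their order (order is meaningful there), and because the replacer
// runs after toJSON, values like Date still serialize as ISO strings.
function sortKeysReplacer(_key: string, value: unknown): unknown {
  if (value === null || typeof value !== 'object' || Array.isArray(value)) {
    return value;
  }
  const source = value as Record<string, unknown>;
  const sorted: Record<string, unknown> = {};
  for (const key of Object.keys(source).sort()) {
    sorted[key] = source[key];
  }
  return sorted;
}

function stableStringify(data: unknown): string | undefined {
  return JSON.stringify(data, sortKeysReplacer);
}

// Same shape as generateIdempotencyKey, but insensitive to property order.
function generateStableIdempotencyKey(method: string, url: string, data: unknown): string {
  const payload = `${method}:${url}:${stableStringify(data)}`;
  return createHash('sha256').update(payload).digest('hex').slice(0, 16);
}

const a = { email: 'a@example.com', role: 'admin' };
const b = { role: 'admin', email: 'a@example.com' };

console.log(JSON.stringify(a) === JSON.stringify(b)); // false
console.log(
  generateStableIdempotencyKey('POST', '/users', a) ===
    generateStableIdempotencyKey('POST', '/users', b),
); // true
```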
Important nuance: if your test legitimately needs two identical records (same method, URL, and body), they’ll get the same key — and the server will return the cached result for the second call. This is correct behaviour for retries, but it means this approach assumes each unique operation has unique data. If you genuinely need two identical resources, add a distinguishing field (like a requestId or timestamp) to the body.
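One way to add that distinguishing field without breaking retries is to stamp it once, when the logical operation is built, and reuse the same body object for any retry. A sketch, assuming your backend accepts (or ignores) an extra `requestId` field; `withRequestId` and the `/coupons` endpoint are hypothetical.

```typescript
import { createHash, randomUUID } from 'crypto';

// Same derivation as the article's generateIdempotencyKey.
function generateIdempotencyKey(method: string, url: string, data: unknown): string {
  const payload = `${method}:${url}:${JSON.stringify(data)}`;
  return createHash('sha256').update(payload).digest('hex').slice(0, 16);
}

// Hypothetical helper: stamp a body with a requestId ONCE per logical operation.
// Retries must reuse the returned object; calling this again would mint a new key.
function withRequestId<T extends object>(body: T): T & { requestId: string } {
  return { ...body, requestId: randomUUID() };
}

const couponBody = { code: 'SAVE10', amount: 10 };

// Two intentionally identical coupons: each operation gets its own requestId.
const first = withRequestId(couponBody);
const second = withRequestId(couponBody);

const keyFirst = generateIdempotencyKey('POST', '/coupons', first);
const keySecond = generateIdempotencyKey('POST', '/coupons', second);
const keyFirstRetry = generateIdempotencyKey('POST', '/coupons', first);

console.log(keyFirst !== keySecond); // true: two distinct records
console.log(keyFirst === keyFirstRetry); // true: a retry still deduplicates
```

A per-call timestamp works the same way only if it is captured once in the body and not regenerated on each attempt.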
The backend requirement: this only works if the server implements idempotency key handling. Most payment APIs (Stripe, PayPal) support this natively. If your payment provider doesn't, that's their problem to solve, not yours: use WireMock to mock them, or find their sandbox/test mode. If it's your own internal backend that's missing support, that's a tech-debt conversation with your backend team. The pattern is well-documented and the database cost is minimal: store the key together with a hash of the request and the cached response, and expire entries after 24 hours.
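A minimal in-memory sketch of that storage, assuming a single process; a real backend would use its database or a cache with native expiry and would also handle two concurrent requests carrying the same key. It keeps the response itself so a duplicate can be replayed, plus a hash of the request body so that reusing a key with a different body is rejected. That rejection is a common design choice, not something this article prescribes. `IdempotencyStore` and `execute` are hypothetical names, and the injectable clock exists only to demonstrate expiry.

```typescript
import { createHash } from 'crypto';

const TTL_MS = 24 * 60 * 60 * 1000; // expire entries after 24 hours

interface StoredResult<R> {
  requestHash: string; // detects a key reused with a different body
  response: R; // replayed verbatim for duplicate requests
  expiresAt: number; // epoch milliseconds
}

class IdempotencyStore<R> {
  // Expired entries are overwritten on reuse; a real store would also evict them.
  private readonly entries = new Map<string, StoredResult<R>>();

  // `now` is injectable so tests can simulate the passage of time.
  constructor(private readonly now: () => number = Date.now) {}

  // Runs `handler` at most once per unexpired key and replays its result.
  execute(key: string, body: unknown, handler: () => R): R {
    const requestHash = createHash('sha256').update(String(JSON.stringify(body))).digest('hex');
    const existing = this.entries.get(key);

    if (existing !== undefined && existing.expiresAt > this.now()) {
      if (existing.requestHash !== requestHash) {
        throw new Error(`Idempotency key "${key}" reused with a different request body`);
      }
      return existing.response;
    }

    const response = handler();
    this.entries.set(key, { requestHash, response, expiresAt: this.now() + TTL_MS });
    return response;
  }
}

// Usage with a fake clock and a handler that counts how often it runs.
let clock = 0;
let created = 0;
const store = new IdempotencyStore<number>(() => clock);
const createOrder = () => ++created;

const r1 = store.execute('abc123', { sku: 'book' }, createOrder); // handler runs: 1
const r2 = store.execute('abc123', { sku: 'book' }, createOrder); // replayed: 1
clock += TTL_MS + 1; // the entry expires
const r3 = store.execute('abc123', { sku: 'book' }, createOrder); // handler runs again: 2
```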
The network failure scenario:
Client → POST /orders (key: abc123) → Server processes, creates order
Server → Response lost in transit
Client → Timeout,...