You must be this tall to ride | Jacques Corby-Tuech
Contents
What decisioning is, and why the maths runs backwards
The maths doesn't care which statistics you use
"But ours is Bayesian"
The strongest version of the pitch
The bandit is built to not find out
What "this tall" actually means
There's a measuring post at the entrance to the Cannonball Express, the old rollercoaster at Pleasurewood Hills outside Lowestoft. Clear the line painted on it and you ride; fall short and you don't. Nobody argues with the post. The physics of the ride set the number, the number isn't negotiable, and being turned away isn't an insult. It's just maths about restraint systems.
Image: “Pleasurewood Hills” by Jeremy Thompson, CC BY 2.0
Every lifecycle decisioning product has a post like that too: a minimum size below which the thing cannot do what it says on the box, set not by the vendor's generosity but by the statistics of telling signal from noise. The decisioning vendors aren't especially secretive about how they work, which is the surprising part. The serious ones lean on holdout measurement against a control, and one or two publish randomised trials with confidence intervals, which is more candour than the rest of the category manages. What none of them posts at the entrance is the number that actually decides whether the thing works for you: the audience you'd need before that holdout could pick the effect they're selling out of the noise. The absence isn't damning on its own. Vendors publish what buyers ask about, and a buyer who hasn't worked out they might be too small for it never asks, so the sign stays down by mutual neglect: the one measurement that would tell some buyers not to bother is raised by neither side. Gartner's 2025 Marketing Technology Survey found marketers use under half the martech they own, and only 15% of organisations can show a given tool meets its goals and pays for itself. The same buyer rarely asks what audience the tool needs to work.
What decisioning is, and why the maths runs backwards
Most of the "AI" in a lifecycle stack is decision support: send-time optimisation, churn and propensity scores, generative subject lines, the features that rank and suggest at points inside a journey the marketer still builds. Useful or not, they leave the human holding the decision. Decisioning is the narrower, newer thing that takes the decision away: a system that chooses, per person, what to send, when, on which channel, which offer, and whether to send at all, learning from holdouts inside the guardrails you set. That is the category that borrows the platforms' machinery, reinforcement learning and contextual bandits, and the one where scale decides whether it works. Aampe, just acquired by MoEngage, runs a bandit-driven agent per user; Hightouch runs reinforcement learning on top of your warehouse; OfferFit, now inside Braze, pitches learning agents as the replacement for your A/B tests; Adobe, Salesforce and Pega run native decisioning above. What every one of them quietly needs, and none advertises a floor for, is the scale to tell its own choices apart from noise.
None of it frees you from that. Nothing does, because the constraint doesn't live in the algorithm. It lives in the data.
The amount of evidence you need to tell two options apart is governed by how different they are. Specifically, by the inverse square of the gap between them. Halve the effect you're trying to detect and you don't need twice the traffic, you need four times. That relationship sits in the likelihood, which is the part neither a frequentist nor a Bayesian nor a bandit gets to edit. So when a vendor says their method needs less data for small-traffic, small-effect situations, they are describing the one regime where the maths is most punishing, and promising it's the easiest. Smaller is harder.
The maths doesn't care which statistics you use
The plainest case: one control, one variant, a conversion rate you care about, and a wish to know whether the variant is genuinely better. The sample size to settle that, at the usual 95% confidence and 80% power, comes out of a formula Ron Kohavi, who ran thousands of these at Microsoft and Amazon, likes to write as n = 16σ²/d² per arm: variance on top, the square of the effect you want to catch on the bottom.¹ Plug in real numbers and the bottom of that fraction eats you alive.
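For anyone wondering where the 16 comes from, it is the textbook normal-approximation sample size with the usual quantiles plugged in. This is a sketch, assuming two equal-sized arms, a two-sided test and the same variance σ² in both arms; the z symbols (standard normal quantiles) are notation added here, not the author's.

```latex
n \;=\; \frac{2\,\bigl(z_{1-\alpha/2} + z_{1-\beta}\bigr)^{2}\,\sigma^{2}}{d^{2}}
\;\approx\; \frac{2\,(1.96 + 0.84)^{2}\,\sigma^{2}}{d^{2}}
\;=\; \frac{15.68\,\sigma^{2}}{d^{2}}
\;\approx\; \frac{16\,\sigma^{2}}{d^{2}}
\quad \text{per arm}
```

For a conversion rate p the variance is σ² = p(1 − p), and d is the absolute gap between the two rates, so a 10% relative lift on a 5% baseline means d = 0.005.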
| Baseline conversion | Relative lift you want to catch | Users needed (both arms) |
|---|---|---|
| 5% | 20% | ~16,000 |
| 5% | 10% | ~62,000 |
| 5% | 5% | ~244,000 |
| 5% | 1% | ~6,000,000 |
| 2% | 10% | ~161,000 |
| 20% | 5% | ~51,000 |
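The figures in the table above can be roughly reproduced in a few lines of Python. This is a minimal sketch, not the author's companion notebook: it assumes a two-sided two-proportion z-test with equal arms, and it uses each arm's own variance rather than the single σ² of the 16σ²/d² shorthand, which is why it lands slightly above the rule of thumb. The function names `users_needed` and `rule_of_thumb` are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def users_needed(baseline, relative_lift, alpha=0.05, power=0.80):
    """Total users (both arms) for a two-sided two-proportion z-test.

    Assumption: normal approximation, equal-sized arms, each arm
    contributing its own Bernoulli variance p * (1 - p).
    """
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    d = p2 - p1  # absolute gap between the two conversion rates
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 at alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 at 80% power
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    per_arm = (z_alpha + z_beta) ** 2 * variance_sum / d ** 2
    return 2 * ceil(per_arm)


def rule_of_thumb(baseline, relative_lift):
    """Total users (both arms) from the n = 16 * sigma^2 / d^2 shorthand."""
    d = baseline * relative_lift
    return 2 * ceil(16 * baseline * (1 - baseline) / d ** 2)


if __name__ == "__main__":
    rows = [(0.05, 0.20), (0.05, 0.10), (0.05, 0.05),
            (0.05, 0.01), (0.02, 0.10), (0.20, 0.05)]
    for baseline, lift in rows:
        print(f"{baseline:.0%} baseline, {lift:.0%} lift: "
              f"{users_needed(baseline, lift):,} users "
              f"(rule of thumb {rule_of_thumb(baseline, lift):,})")
```

Halving the relative lift from 10% to 5% at a 5% baseline roughly quadruples the answer, from about 62,000 users to about 244,000, which is the inverse-square relationship at work.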
Those are per test, for the simplest test there is. Read down the 5% baseline rows and watch what happens as the effect you're chasing shrinks: a 20% lift is cheap, a 5% lift costs a quarter of a million, a 1% lift costs six million. The requirement climbs through the roof as the effect shrinks. The arithmetic here, and the bandit simulation, both live in a companion notebook you can open in Colab and push your own numbers...