Darts, dice, and coins: sampling from a discrete distribution

Darts, Dice, and Coins

Darts, Dice, and Coins: Sampling from a Discrete Distribution

Last Major Update: December 29, 2011

Earlier this year, I asked a question on Stack Overflow about a data structure for loaded dice. Specifically, I was interested in answering this question:

"You are given an n-sided die where side i has probability pi of being rolled. What is the most efficient data structure for simulating rolls of the die?"

This data structure could be used for many purposes. For starters, you could use it to simulate rolls of a fair, six-sided die by assigning probability $\frac{1}{6}$ to each of the sides of the die, or a to simulate a fair coin by simulating a two-sided die where each side has probability $\frac{1}{2}$ of coming up. You could also use this data structure to directly simulate the total of two fair six-sided dice being thrown by having an 11-sided die (whose faces were 2, 3, 4, ..., 12), where each side was appropriately weighted with the probability that this total would show if you used two fair dice. However, you could also use this data structure to simulate loaded dice. For example, if you were playing craps with dice that you knew weren't perfectly fair, you might use the data structure to simulate many rolls of the dice to see what the optimal strategy would be. You could also consider simulating an imperfect roulette wheel in the same way.

Outside the domain of game-playing, you could also use this data structure in robotics simulations where sensors have known failure rates. For example, if a range sensor has a 95% chance of giving the right value back, a 4% chance of giving back a value that's too small, and a 1% chance of handing back a value that's too large, you could use this data structure to simulate readings from the sensor by generating a random outcome and simulating the sensor reading in that case.

The answer I received on Stack Overflow impressed me for two reasons. First, the solution pointed me at a powerful technique called the alias method that, under certain reasonable assumptions about the machine model, is capable of simulating rolls of the die in $O(1)$ time after a simple preprocessing step. Second, and perhaps more surprisingly, this algorithm has been known for decades, but I had not once encountered it! Considering how much processing time is dedicated to simulation, I would have expected this technique to be better- known. A few quick Google searches turned up a wealth of information on the technique, but I couldn't find a single site that compiled together the intuition and explanation behind the technique.

This writeup is my attempt to give a quick survey of various approaches for simulating a loaded die, ranging from simple techniques that are highly impractical to the very optimized and efficient alias method. My hope here is to capture different intuitions about the problem and how each highlights some new aspect of simulating loaded dice. For each approach, my goal is to explore the motivating idea, core algorithm, correctness proof, and runtime analysis (in terms of time, memory, and randomness required).

Preliminaries

Before I go into any of the specific details of the different techniques, let's first standardize our notation and terminology.

In the introduction to this writeup, I used the term "loaded die" to describe a general scenario where there are is a finite set of outcomes, each of which has some associated probability. Formally, this is termed a discrete probability distribution, and the problem of simulating the loaded die is called sampling from a discrete distribution . To describe our discrete probability distribution (loaded die), we will assume that we are given a set of n probabilities $p_0, p_1, ..., p_{n - 1}$ associated with outcomes $0, 1, ..., n - 1$. Although the outcomes can be anything (heads/tails, numbers on a die, colors, etc.), for simplicity I'll assume that the outcome is some positive natural number that corresponds to the given index.

Dealing with real numbers on a computer is a bit of computation gray area. There are many fast algorithms whose speed is derived purely from the ability to, in constant time, compute the floor function of an arbitrary real number, and numerical inaccuracies in floating-point representations can entirely ruin certain algorithms. Consequently, before we embark on any discussion of algorithms for working with probabilities, which enter the dark world of real numbers, I should clarify what I assume the computer can and cannot do.

In what follows, I will assume that all of the following operations can be done in constant time:

Addition, subtraction, multiplication, division, and comparison of arbitrary real numbers . We will need to be able to do this in order to manipulate probabilities. This may seem like a very strong assumption, but if we assume that the precision of any real number is bounded by some polynomial of the machine word size (for example, a 64-bit double...

Darts, dice, and coins: sampling from a discrete distribution

Related Articles

The Newest Instagram "Exploit" Is the Goofiest I've Seen

Apple WWDC 2026 Livestream

Claude Fable 5

US Government directive to suspend access to Fable 5 and Mythos 5

It's Not Just X. It's Y