Darts, Dice, and Coins
Darts, Dice, and Coins: Sampling from a Discrete Distribution
Last Major Update: December 29, 2011
Earlier this year, I asked a question on Stack Overflow about<br>a data structure for loaded dice. Specifically, I was<br>interested in answering this question:
"You are given an n-sided die where side i has probability pi of being rolled. What is the most efficient data<br>structure for simulating rolls of the die?"
This data structure could be used for many purposes. For starters, you could use it to simulate rolls of a fair, six-sided die by assigning<br>probability $\frac{1}{6}$ to each of the sides of the die, or a to simulate a fair coin by simulating a two-sided die where each side has<br>probability $\frac{1}{2}$ of coming up. You could also use this data structure to directly simulate the total of two fair six-sided dice being thrown<br>by having an 11-sided die (whose faces were 2, 3, 4, ..., 12), where each side was appropriately weighted with the probability that this total<br>would show if you used two fair dice. However, you could also use this data structure to simulate loaded dice. For example, if you were<br>playing craps with dice that you knew weren't perfectly fair, you might use the data<br>structure to simulate many rolls of the dice to see what the optimal strategy would be. You could also consider simulating an imperfect<br>roulette wheel in the same way.
Outside the domain of game-playing, you could also use this data structure in robotics simulations where sensors have known failure rates.<br>For example, if a range sensor has a 95% chance of giving the right value back, a 4% chance of giving back a value that's too small, and a<br>1% chance of handing back a value that's too large, you could use this data structure to simulate readings from the sensor by generating a<br>random outcome and simulating the sensor reading in that case.
The answer I received on Stack Overflow impressed me for two reasons. First, the solution pointed me at a powerful technique called the<br>alias method that, under certain reasonable assumptions about the machine model, is capable of simulating rolls of the die in $O(1)$<br>time after a simple preprocessing step. Second, and perhaps more surprisingly, this algorithm has been known for decades, but I had not<br>once encountered it! Considering how much processing time is dedicated to simulation, I would have expected this technique to be better-<br>known. A few quick Google searches turned up a wealth of information on the technique, but I couldn't find a single site that compiled<br>together the intuition and explanation behind the technique.
This writeup is my attempt to give a quick survey of various approaches for simulating a loaded die, ranging from simple techniques that are<br>highly impractical to the very optimized and efficient alias method. My hope here is to capture different intuitions about the problem<br>and how each highlights some new aspect of simulating loaded dice. For each approach, my goal is to explore the motivating idea, core<br>algorithm, correctness proof, and runtime analysis (in terms of time, memory, and randomness required).
Preliminaries
Before I go into any of the specific details of the different techniques, let's first standardize our notation and terminology.
In the introduction to this writeup, I used the term "loaded die" to describe a general scenario where there are is a finite set of outcomes,<br>each of which has some associated probability. Formally, this is termed a<br>discrete probability distribution, and the<br>problem of simulating the loaded die is called sampling from a discrete distribution .<br>To describe our discrete probability distribution (loaded die), we will assume that we are given a set of n probabilities $p_0, p_1, ...,<br>p_{n - 1}$ associated with outcomes $0, 1, ..., n - 1$. Although the outcomes can be anything (heads/tails, numbers on a<br>die, colors, etc.), for simplicity I'll assume that the outcome is some positive natural number that corresponds to the given index.
Dealing with real numbers on a computer is a bit of computation gray area. There are many fast algorithms whose speed is derived purely from<br>the ability to, in constant time, compute the floor function of an arbitrary real<br>number, and numerical inaccuracies in floating-point representations can entirely ruin certain algorithms. Consequently, before we embark<br>on any discussion of algorithms for working with probabilities, which enter the dark world of real numbers, I should clarify what I assume the<br>computer can and cannot do.
In what follows, I will assume that all of the following operations can be done in constant time:
Addition, subtraction, multiplication, division, and comparison of arbitrary real numbers . We will need to be able to do this in order to<br>manipulate probabilities. This may seem like a very strong assumption, but if we assume that the precision of any real number is<br>bounded by some polynomial of the machine word size (for example, a 64-bit double...