The Gambler's Fallacy and The Law of Small Numbers

Games and gambling have been part of human cultures around the world for millennia. Nowadays, the connection between games of chance and mathematics (in particular probability) are so obvious that it is taught to school children. However, the mathematics of games and gambling only started to formally develop in the 17th century with the works of multiple mathematicians such as Fermat and Pascal. It is then no wonder that many incorrect beliefs around gambling have formed that are "intuitive" from a layman's perspective but fail to pass muster when applying the rigor of mathematics.

In this post, I want to discuss how surprisingly easy it is to be fooled into the wrong line of thinking even when approaching it using mathematics. We'll take a look from both a theoretical (mathematics) point of view looking at topics such as the Gambler's Fallacy and the law of small numbers as well as do some simulations using code to gain some insight into the problem. This post was inspired by a paper I recently came across a paper by Miller and Sanjurjo[1] that explains the surprising result of how easily we can be fooled.

The Laws of Large (and Small) Numbers

Let's start by taking a look at one of the simplest situations we can think of: flipping a fair coin. By "fair", we usually mean that it has a 50% chance of landing heads (or "H" for short) and 50% change of landing tail (or "T"). More formally:

$$P(H) = 0.5$$$$P(T) = 0.5$$

What about flipping a fair coin N times? We expect to get roughly half of the coins to end up H and half T. This is confirmed by Borel's law of large numbers (one of the various forms) that states:

If an experiment is repeated a large number of times, independently under identical conditions, then the proportion of times that any specified event occurs approximately equals the probability of the event's occurrence on any particular trial; the larger the number of repetitions, the better the approximation tends to be.

Let's see exactly how man repetitions we need to get close.

Simulation

Let's first define some code to do our fair coin flip and also simulations of the fair coin flip.

In [478]:
import random 

def flip():
    return random.choice(['H', 'T'])

def simulate(N):
    return [1 if flip() == 'H' else 0 for x in range(N)]

def flip_ratio(flips):
    N = len(flips)
    heads = sum(flips)
    tails = N - heads
    # Python 3: implicit float division, love it!
    return '(%.3f, %.3f)' % (heads / N, tails / N)

Next, let's simulate for several \(N\) to see how many repetitions is enough.

In [482]:
from pandas import Series, DataFrame

def toseries(r):
    return Series(r, index=['Run %d' % (i+1) for i in range(0, len(r))])

nvals = [10, 50, 100, 1000, 10000]
runs = 4
data = [toseries([flip_ratio(simulate(n)) for x in range(runs)]) for n in nvals]

# Put it in pandas data frame for pretty printing
DataFrame(data, index=['N=%d' % n for n in nvalues])
Out[482]:
Run 1 Run 2 Run 3 Run 4
N=10 (0.600, 0.400) (0.300, 0.700) (0.200, 0.800) (0.300, 0.700)
N=50 (0.420, 0.580) (0.480, 0.520) (0.420, 0.580) (0.340, 0.660)
N=100 (0.470, 0.530) (0.500, 0.500) (0.450, 0.550) (0.430, 0.570)
N=1000 (0.522, 0.478) (0.509, 0.491) (0.508, 0.492) (0.485, 0.515)
N=10000 (0.495, 0.505) (0.506, 0.494) (0.509, 0.491) (0.496, 0.504)

When \(N=10\), we can get some pretty wild swings like in Run 3 with only 20% of the flips resulting in heads. Even at \(N=50\), we're still not quite close to the 50/50 mark. Once we start getting past \(N=100\), we start getting quite close to a 50/50 distribution with tighter range of values around \(0.5\).

Long-Run vs. Short-Run

This leads to a very important point when trying to reason with the law of large numbers:

Random events only converge to their average in the long-run.

The corollary to this rule is:

In the short-run anything can happen.

If you've ever been in a casino, the last statement will ring true (for better or worse). Not recognizing this second rule is essentially the law of small numbers[2] (also known as hasty generalization/induction):

[The law of small numbers] is an informal fallacy of faulty generalization by reaching an inductive generalization based on insufficient evidence—essentially making a hasty conclusion without considering all of the variables. In statistics, it may involve basing broad conclusions regarding the statistics of a survey from a small sample group that fails to sufficiently represent an entire population.

For example on the \(N=10\) run (or even \(N=50\)), we might have erroneously concluded that our coin was not fair because of the big deviation between heads and tails.

Independent Events and Gambler's Fallacy

Now let's take a look at another concept about random events: independence. The definition is basically what you intuitively think it might be:

The occurrence of one [event] does not affect the probability of the other.

Going back to our fair coin flipping example, each toss of our coin is independent from the other. Therefore each coin toss, regardless of what has happened before, has a 50/50 chance of being heads or tails. Easy to think about abstractly but what if we got a sequence of coin flips like this:

H H H H T H H H H ?

What would you expect the next flip to be? It's tempting to think it has a slightly higher change of being T but because of independence, it still has a 50/50 chance of being heads or tails. This almost natural tendency to believe that T should come up next (and ignore the independence of the events) is called the Gambler's Fallacy:

The gambler's fallacy, also known as the Monte Carlo fallacy or the fallacy of the maturity of chances, is the mistaken belief that, if something happens more frequently than normal during some period, it will happen less frequently in the future, or that, if something happens less frequently than normal during some period, it will happen more frequently in the future (presumably as a means of balancing nature).

You might think that this fallacy is so obvious that no one would make this mistake but you would be wrong. You don't have to look any further than your local casino where each roulette wheel has an electronic display showing the last ten or so spins [3]. Many casino patrons will use this screen to religiously count how many red and black numbers have come up, along with a bunch of other various statistics in hopes that they might predict the next spin. Of course each spin in independent, so these statistics won't help at all but that doesn't stop the casino from letting people throw their money away.

An Unexpected Truth

Now that we have an understanding of the law of large numbers, independent events and the gambler's fallacy, let's try to simulate a situation where we might run into the gambler's fallacy.

Let's concoct a situation. Take our fair coin. Flip it \(4\) times in a row, write down each outcome and on a piece of paper (or equivalently save it in memory). Next, count the number of outcomes that immediately followed a heads, and the number of those outcomes that were heads. Essentially, we're trying to compute \(P(H|H)\). Now repeat that experiment \(N\) times. We would expect that the chance of getting a H is roughly \(0.5\), or \(P(H|H) = 0.5\). Let's see if our intuition matches the empirical results.

Simulation

First, we can reuse our simulate() function from before to flip the coin 4 times. Next, we'll add two more functions to help us count the outcomes that immediately follow heads (and whether or not it is a heads), as well as running the \(N\) experiments.

In [432]:
def count_heads_after_heads(flips):
    heads_after_heads = 0
    total = 0
    # Need at least one flip
    for i in range(1, len(flips)):
        if flips[i-1] == 1:
            total += 1
            if flips[i] == 1:
                heads_after_heads += 1
    
    return (heads_after_heads, total)

def simulate_fallacy(num_flips, N):
    p = []
    for x in range(N):
        r = count_heads_after_heads(simulate(num_flips))
        if r[1] > 0:
            p += [r[0] / r[1]]
    
    return p

Now let's run through various values of \(N\), again we'll do several runs at each N to get an idea of how the experiment varies.

In [454]:
def gen_fallacy_data(runs, num_flips, Ns):
    data = []
    for N in Ns:
        sdata = []
        for run in range(runs):
            p = simulate_fallacy(num_flips, N)
            avg = sum(p) / len(p)
            sdata += ['%.4f' % avg]
        data += [toseries(sdata)]
    return data
    
runs = 4
num_flips = 4
Ns = [10, 100, 1000, 10000, 100000]
DataFrame(gen_fallacy_data(runs, num_flips, Ns), index=['N=%d' % n for n in Ns])
Out[454]:
Run 1 Run 2 Run 3 Run 4
N=10 0.4444 0.4286 0.4286 0.6000
N=100 0.4302 0.4101 0.4413 0.4623
N=1000 0.4101 0.4118 0.4070 0.4028
N=10000 0.4101 0.4027 0.4076 0.4004
N=100000 0.4049 0.4027 0.4062 0.4037

Surprised by the results? There's definitely something fishy going on here. Even for very large \(N\), we're getting values which seem to converge somewhere close to \(0.40\). At \(N=100000\), we should definitely start getting into the territory of the law of large numbers.

Let's try to play around with the parameters. How about increasing num_flips?

In [443]:
runs = 4
num_flips = 10
Ns = [10, 100, 1000, 10000]
DataFrame(gen_fallacy_data(runs, num_flips, Ns), index=['N=%d' % n for n in Ns])
Out[443]:
Run 1 Run 2 Run 3 Run 4
N=10 0.3958 0.2917 0.4367 0.4148
N=100 0.4419 0.4179 0.4248 0.4520
N=1000 0.4351 0.4445 0.4375 0.4418
N=10000 0.4453 0.4461 0.4402 0.4445

Interesting, it seems to be converging to a different number now. Let's keep pumping it up and see what happens.

In [449]:
from IPython.display import display

runs = 4
Ns = [10, 100, 1000]
for nf in [100, 1000, 5000]:
    df = DataFrame(gen_fallacy_data(runs, nf, Ns), 
                   index=['N=%d' % n for n in Ns])
    print("Number of flips = %d:" % nf)
    display(df)
Number of flips = 100:
Run 1 Run 2 Run 3 Run 4
N=10 0.5095 0.5142 0.4685 0.5295
N=100 0.4948 0.5140 0.4939 0.4863
N=1000 0.4966 0.4949 0.4922 0.4926
Number of flips = 1000:
Run 1 Run 2 Run 3 Run 4
N=10 0.5031 0.5005 0.5001 0.5033
N=100 0.5009 0.4995 0.4971 0.4987
N=1000 0.4986 0.5014 0.5002 0.5000
Number of flips = 5000:
Run 1 Run 2 Run 3 Run 4
N=10 0.4958 0.4975 0.4967 0.5017
N=100 0.4987 0.4997 0.5003 0.4998
N=1000 0.4996 0.4995 0.4998 0.4999

Now we see that the runs are much closer to what we would expect. So obviously the number of flips plays a big part in the bias we were initially seeing, while the number of experiments less so.

A Hidden Bias

Let's take a look at our first simulation with num_flips=4 and the probability of flips for each realization of four flips. The following table enumerates all possible sequences and shows the count of the numerator and denominator of our count_heads_after_heads() function. We also add the last columns to show the ratio between the two, which we denote loosely as the empirical probability of heads after heads.

Number of Heads Sequence # of Flips immediately after H # of Flips immediately after H that is H \(P(H \vert H)\)
0 TTTT 0 - -
1 TTTH 0 - -
1 TTHT 1 0 0
1 THTT 1 0 0
1 HTTT 1 0 0
2 TTHH 1 1 1
2 THTH 1 0 0
2 THHT 2 1 \(\frac{1}{2}\)
2 HTTH 1 0 0
2 HTHT 2 0 0
2 HHTT 2 1 \(\frac{1}{2}\)
3 THHH 2 2 1
3 HTHH 2 1 \(\frac{1}{2}\)
3 HHTH 2 1 \(\frac{1}{2}\)
3 HHHT 3 2 \(\frac{2}{3}\)
4 HHHH 3 3 1
Expected Value \(\frac{17}{42} = 0.404\ldots\)

The last row shows the expected value which is just the simple average of the last column. You can see that this analysis matches what our simulation shows, a value close to \(0.40\). But where does the bias coming from?

First, we should note that for num_flips=4, we would (correctly) expect roughly \(2\) out of \(4\) flips to be heads. But what about a heads after heads? For a given number of heads (first column), for example \(2\), it is impossible to get \(2\) or more heads (fourth column). Likewise, the total number of flips after H must be less than or equal to \(2\). So the ratio of the two leads to an empirical probability that is strictly less than \(\frac{2}{4}\). This big constraint of a short run of flips over represents tails for a given amount of heads. We saw that as we increase the run length (num_flips in our code), the probability approaches the expected 50/50 distribution.

But why does increasing the number of experiments (N in our code) not work as per our expectation of the law of large numbers? The issue is that our "unit" of measurement is not a single flip, but rather independent short sequences of flips (in our case \(4\)). In this case, we just repeatedly run into this bias for each independent experiment we perform, regardless of how many times it is run. A different setup where each of the short runs were concatenated together would produce the 50/50 split our intuition expects effectively removing the bias.

Psychology of Gambler's Fallacy

One of the reasons why this bias is so insidious is that, as humans, we naturally tend to update our beliefs on finite sequences of observations. Imagine the roulette wheel with the electronic display. When looking for patterns, most people will just take a glance at the current 10 numbers and make a mental note of it. Five minutes later, they may do the same thing. This leads to precisely the bias that we saw above of using short sequences to infer the overall probability of a situation. Thus, the more "observations" they make, the strong the tendency to fall for the Gambler's Fallacy.

Of course, there are ways around making this mistake. As we saw, the most straight forward is to observe longer sequences. However, there's reason to believe that this is not practical given the limitations of human attention span and memory. Another method is to just do straight counts of the favorable outcomes and total outcomes (instead of computing interim probabilities after each "observation" like we did in our experiment), and then just compute the probability of this composite sample. This leads to the expected true long-run probability. Again, this bumps up against the limitations of human attention and memory. Probably the best way is to use external aids (e.g. pen and paper, computer) to properly record and compute the probability. Unfortunately, casinos are not as sympathetic to this solution.

Conclusion

Probability is far from a natural line of human thinking. Humans do have limited capacities in attention span and memory, which bias the observations we make and fool us into such fallacies such as the Gambler's Fallacy. Even with knowledge of probability, it is easy to be misled into an incorrect line of thinking. The best we can do is be aware of these biases and take extra measures to avoid them.

One of my favorite thinkers is Charlie Munger who espouses this line of thinking. He always has something interesting to say and so I'll leave you with one of his quotes:

If you don’t get elementary probability into your repertoire you go through a long life a one-legged man in an ass-kicking contest.

Notes

List of Notes: 1, 2, 3


  1. Miller, Joshua Benjamin and Sanjurjo, Adam, Surprised by the Gambler's and Hot Hand Fallacies? A Truth in the Law of Small Numbers (September 15, 2015). IGIER Working Paper #552. Available at SSRN: http://ssrn.com/abstract=2627354 or http://dx.doi.org/10.2139/ssrn.2627354

  2. Of course it's not really a law, especially since it is a fallacy.

  3. Check out this story: Las Vegas Roulette Wheel Stops on Same Number 7 Times in a Row. Imagine you were there when the wheel stopped on the same number for the sixth time. How tempted would you be to make a huge bet on it not coming up to that number on the seventh time?

I'm Brian Keng, a former academic, current data scientist and engineer. This is the place where I write about all things technical.

Twitter: @bjlkeng


Archive

Tags

RSS feed


Signup for Email Blog Posts