[Math Lair] The Clustering Illusion

The nature of randomness can be unintuitive or even confusing. One related error that people often make is called the clustering illusion. To illustrate this, I plotted 100 points at random in a 10×10 grid:

[Figure: 100 random dots plotted on a 10×10 grid]

Note that there are 100 dots and 100 squares. However, the dots are not spread evenly, one per square, throughout the grid; that wouldn't be random at all. There are collections of dots that the human eye picks out as patterns. For example, you might notice several diagonal trends in the dots. There are also clumps: the square fifth from the left and fourth from the bottom contains five dots, with a sixth very close by. On the other hand, there's a 3×4 rectangle on the right-hand side and a 7×2 rectangle at the top, both of which contain a total of six dots.
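The uneven spread is easy to reproduce. What follows is a minimal Python sketch, not the code used to produce the plot above; the function name `scatter_counts` and the seed value are assumptions chosen purely for illustration. It drops 100 points uniformly at random on a 10×10 grid and counts how many land in each square.

```python
import random
from collections import Counter


def scatter_counts(n_points=100, size=10, seed=None):
    """Drop n_points uniformly at random on a size x size grid.

    Returns a Counter mapping (column, row) -> number of points in that
    square. Squares that received no points are absent from the Counter.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_points):
        x = rng.uniform(0, size)
        y = rng.uniform(0, size)
        # min() guards against the rare endpoint value x == size
        col = min(int(x), size - 1)
        row = min(int(y), size - 1)
        counts[(col, row)] += 1
    return counts


counts = scatter_counts(seed=1)  # the seed is arbitrary (an assumption)
empty = 100 - len(counts)
print("empty squares:", empty)
print("most crowded square holds:", max(counts.values()), "points")

# Each point misses a given square with probability 99/100, so all 100
# points miss it with probability (99/100)**100.
print("expected fraction empty:", round((99 / 100) ** 100, 3))  # 0.366
```

On average a little over a third of the squares end up empty, and the most crowded square typically holds four or more points, even though every location on the grid is equally likely.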

Now, imagine that the above plot represents a map of a city, and that each dot represents something bad: a violent crime, a car accident, or a case of some disease. Where in the city would you like to live? Would you like to live in the square five from the left and four from the bottom (let's call it "square (5,4)" for short), or would you rather live on the north (top) or east (right) side of town?

In reality, no area of town is safer than any other; these events were simply random and could have occurred anywhere. Because "random" doesn't mean "spread out evenly," some area of town is going to have more crime, or accidents, or disease, and some area is going to have less. However, what if I hadn't told you that these points were generated randomly? Would you think that square (5,4) is a "high-crime area," or that it has unsafe streets, or that pollution or something similar is causing disease? If the city council saw this map, would they put programs together to try to solve the problems in square (5,4)? This is the clustering illusion: seeing significant patterns or clumps in data that are really random.

Another example of the clustering illusion occurred during World War II, when the Germans bombed South London. Some areas were hit several times and others not at all. People thought that the areas that weren't hit were home to German spies; however, when the statistics were analyzed after the war (R. D. Clarke published the analysis in 1946, and William Feller later popularized it in his probability textbook), the hits were found to be consistent with a random distribution.

Arguing, solely on the basis of a cluster, that the cluster is caused by something (e.g. that disease is being caused by polluted water or that accidents are caused by poorly-maintained roads) could be an instance of the Texas sharpshooter fallacy. This fallacy takes its name from a story of a Texas marksman. A person travelling through Texas notices a barn with several bull's-eyes painted on it, each of which has several bullet-holes near the centre. The traveller asks the owner of the barn how he became such a good shot. "Oh, it's easy," the Texan says, "I just shoot first and paint the bull's-eye afterwards." It's easy to notice a pattern first and then build an argument around it.

The clustering illusion is also related to something called the "Belief in the Law of Small Numbers," which is a psychological bias described by Daniel Kahneman and Amos Tversky. The law of large numbers tells us that, as the number of trials of a random experiment increases, the average of the results will approach the expected value. Belief in the law of small numbers is an error in human judgement where people assume that the results of a small number of trials will also be close to the expected value. So, if we flip a coin six times, we expect to get three heads and three tails, even though the probability of getting exactly that result is just 31.25%. We think that a string of heads and tails such as HTHTHT is more likely and more random than a string such as HHHHHH, even though each individual string is as likely to come up as any other.
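The coin-flip figures can be checked by brute force. The short Python sketch below (variable names are illustrative assumptions) enumerates all 64 possible sequences of six flips, counts those with exactly three heads, and confirms that any one specific sequence, whether HTHTHT or HHHHHH, has the same probability.

```python
from itertools import product
from math import comb

n = 6
# All possible outcomes of six flips, e.g. "HTHTHT", "HHHHHH", ...
sequences = ["".join(s) for s in product("HT", repeat=n)]
total = len(sequences)  # 2**6 = 64 equally likely sequences

# Probability of exactly three heads: C(6, 3) / 64 = 20 / 64
three_heads = sum(s.count("H") == 3 for s in sequences)
p_three_heads = three_heads / total
print(p_three_heads)  # 0.3125

# Any single specific sequence has probability 1/64
p_sequence = 1 / total
print(p_sequence)  # 0.015625

# Cross-check the brute-force count against the binomial coefficient
assert three_heads == comb(n, 3) == 20
```

Three heads and three tails is the single most likely split, but it still happens less than a third of the time, because the 20 sequences that produce it are only a fraction of the 64 possibilities.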

A dramatic illustration of the belief in the law of small numbers occurred on August 18, 1913, when, at Monte Carlo, an unbiased roulette wheel came up black 26 times in a row. The probability of that happening in any given run of 26 spins is (on a wheel with 37 pockets) about 1 in 136,823,184, which is quite unlikely, but, given the number of roulette wheels in existence, not entirely outside the realm of possibility. What did the gamblers do during this streak? After black had come up 15 times in a row or so, they began betting prodigious amounts on red, in the expectation that it was "due" in order to even out the difference between black and red, forgetting that, on any individual spin, black and red are equally likely. In the end, the casino made millions of francs on the streak.
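The size of that figure can be reproduced with a few lines of Python. This is a sketch under the stated assumption of a single-zero wheel with 37 pockets, 18 of them black and 18 red; the variable names are illustrative.

```python
from fractions import Fraction

# Assumption: single-zero wheel, 37 pockets, 18 black and 18 red.
p_black = Fraction(18, 37)
p_red = Fraction(18, 37)
streak = 26

# Independent spins: multiply the per-spin probability 26 times.
p_streak = p_black ** streak
one_in = float(1 / p_streak)
print(f"26 blacks in a row: about 1 in {one_in:,.0f}")

# The wheel has no memory. After any number of blacks, the next spin
# still has the same chance of black as of red.
print(float(p_black), float(p_red))
```

The result is roughly 1 in 137 million, in line with the figure quoted above. The last line is the part the gamblers forgot: the chance of red on spin 27 is exactly what it was on spin 1.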

See also: Birthday Paradox.