[Math Lair] Correlation and Causation

An important principle in statistics is that correlation is not causation. An argument that assumes a correlation implies a causal link could be said to commit the fallacy of post hoc ergo propter hoc ("after this, therefore because of this") or the fallacy of cum hoc ergo propter hoc ("with this, therefore because of this"). Where there is a significant correlation between two variables A and B, there are four distinct possibilities:

  1. A causes B.
  2. B causes A.
  3. Both A and B are caused by another factor, C, or by multiple factors.
  4. It is just a coincidence.
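Possibility 3 can be illustrated with a small simulation. The sketch below is only an illustration under made-up assumptions: the variables, the noise levels, the sample size, and the helper name pearson are all hypothetical, and C is assumed to feed into A and B with a coefficient of exactly 1 so that its contribution can simply be subtracted. A and B never influence each other, yet they come out correlated because both depend on C.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000                # arbitrary sample size, chosen for illustration

# C is the hidden common cause; A and B each equal C plus independent noise.
c = [rng.gauss(0, 1) for _ in range(n)]
a = [ci + rng.gauss(0, 1) for ci in c]
b = [ci + rng.gauss(0, 1) for ci in c]

# A and B are correlated even though neither causes the other.
r_ab = pearson(a, b)

# Subtracting C's contribution leaves only the independent noise,
# so the correlation should drop to roughly zero.
a_res = [ai - ci for ai, ci in zip(a, c)]
b_res = [bi - ci for bi, ci in zip(b, c)]
r_res = pearson(a_res, b_res)

print(f"correlation of A and B:         {r_ab:.2f}")
print(f"after removing C's contribution: {r_res:.2f}")
```

In this setup the first figure should come out near 0.5 and the second near 0. With real data the effect of C is not known in advance, so it would have to be estimated (for example by regression) rather than subtracted directly.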

See also clustering illusion.
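Possibility 4 is more likely than it may sound when many pairs of variables are compared. As a rough sketch (the number of series, their length, and the helper name pearson are arbitrary choices made for this illustration), the following generates 40 completely unrelated random series of 10 values each and looks for the most strongly correlated pair among them.

```python
import random
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(1)  # fixed seed so the run is repeatable

# 40 independent series of 10 random values; none is related to any other.
series = [[rng.gauss(0, 1) for _ in range(10)] for _ in range(40)]

# Compare every pair (780 pairs in all) and keep the strongest correlation.
pairs = list(combinations(range(40), 2))
best_r, best_i, best_j = max(
    (abs(pearson(series[i], series[j])), i, j) for i, j in pairs
)

print(f"pairs compared: {len(pairs)}")
print(f"strongest correlation: |r| = {best_r:.2f} (series {best_i} and {best_j})")
```

With this many pairs and so few data points per series, the strongest pair will typically look impressively correlated even though every series is pure noise.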

To digress slightly, several years ago, there was a news article about a study that indicated that getting the flu shot halves the risk of heart attack or stroke in seniors. Now, a study showing that getting the flu shot reduced the risk of getting the flu wouldn't be at all surprising, because that is what the flu shot is designed to do; it is not designed to prevent heart attacks or strokes. A result such as this could be explained by quite a large number of things. You'll notice that possibilities (1), (2), (3), and (4) above are all represented below.

Obviously, some of the above possibilities can be eliminated pretty easily, and others can be eliminated (and may well have been since) by further studies. Still, when you see news articles about studies, it can be useful to think in this manner rather than immediately jumping to the conclusion that A causes B.