Correlation and Causation (Math Lair)


An important principle in statistics is that correlation does not imply causation. An argument that assumes a correlation between two things means that one causes the other could be said to commit the fallacy of post hoc ergo propter hoc ("after this, therefore because of this") or the fallacy of cum hoc ergo propter hoc ("with this, therefore because of this"). Where there is a significant correlation between two variables A and B, there are four distinct possibilities:

(1) A causes B.
(2) B causes A.
(3) Both A and B are caused by another factor, C, or by multiple factors.
(4) It is just a coincidence.
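To make possibility (3) concrete, here is a minimal Python sketch using simulated data only (not data from any real study). A hidden factor C drives both A and B, and there is no direct link between A and B at all. The linear model, the noise levels, and the function names are assumptions chosen for simplicity. The raw correlation between A and B should still come out near 0.5. Once C's influence is removed by fitting each variable against C and keeping only the residuals, the leftover correlation (a partial correlation) should be close to zero.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def residuals(ys, xs):
    """What is left of ys after a least-squares straight-line fit on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [y - (my + slope * (x - mx)) for x, y in zip(xs, ys)]

def simulate(n=10_000, seed=1):
    """Hypothetical model: C -> A and C -> B, with no link from A to B."""
    rng = random.Random(seed)
    c = [rng.gauss(0, 1) for _ in range(n)]   # hidden common cause C
    a = [ci + rng.gauss(0, 1) for ci in c]    # A depends only on C plus noise
    b = [ci + rng.gauss(0, 1) for ci in c]    # B depends only on C plus noise
    raw = pearson(a, b)                       # correlation we would "observe"
    partial = pearson(residuals(a, c), residuals(b, c))  # after removing C
    return raw, partial

raw, partial = simulate()
print(f"corr(A, B)            = {raw:.2f}")
print(f"corr(A, B), C removed = {partial:.2f}")
```

In this model A and B each have variance 2 and share only C's variance of 1, which is why the raw correlation sits around 1/2 even though neither variable affects the other.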

See also clustering illusion.

To digress slightly: several years ago, a news article reported on a study indicating that getting the flu shot halves the risk of heart attack or stroke in seniors. A study showing that the flu shot reduced the risk of getting the flu would not be at all surprising, because that is what the flu shot is designed to do; it is not designed to prevent heart attacks. A result like this one could have quite a large number of explanations. You'll notice that possibilities (1), (2), (3), and (4) above are all represented below.

Does some ingredient in the vaccine help to protect against heart disease?
Does the flu virus cause heart disease, so a vaccine providing immunity against the flu would indirectly reduce heart disease?
Does getting influenza put a strain on seniors' hearts, resulting in heart problems?
Do seniors who already have severe heart problems find it difficult to go out to get the vaccine?
Do seniors who don't get the vaccine take a more cavalier attitude about their health in general?
Are seniors who visit the doctor more regularly more likely to have their doctor give them the vaccine?
Does getting the vaccine put seniors in touch with health professionals who may see the signs of heart trouble and help avert it?
Does getting the vaccine make seniors less anxious, which puts less strain on their hearts?
Does getting the vaccine make seniors feel better and just think they're having fewer heart problems?
Are the results statistically significant, but of little or no practical significance?
Is there really no correlation, but the sample chosen just happened to be highly atypical?
Do the statistics actually show no correlation, but were they inadvertently misinterpreted?
Do the statistics actually show no correlation, but were they misinterpreted on purpose?
Was the experiment inadvertently conducted with a bias towards the results obtained?
Is someone just fudging the data?
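Possibility (4), and the question of whether the sample just happened to be atypical, can be explored with another minimal Python sketch on simulated data. The two variables here are generated independently, so any correlation between them is pure chance. The threshold of |r| > 0.5, the sample sizes, and the function name are arbitrary assumptions for illustration. With only five observations per sample, a correlation that strong should turn up in roughly two samples out of five; with a hundred observations it should almost never appear.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def chance_correlation_rate(sample_size, threshold=0.5, trials=5_000, seed=2):
    """Fraction of trials where two independent variables show |r| > threshold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = [rng.gauss(0, 1) for _ in range(sample_size)]
        ys = [rng.gauss(0, 1) for _ in range(sample_size)]  # unrelated to xs
        if abs(pearson(xs, ys)) > threshold:
            hits += 1
    return hits / trials

for n in (5, 10, 30, 100):
    print(f"sample size {n:3d}: |r| > 0.5 by chance in "
          f"{chance_correlation_rate(n):.1%} of samples")
```

The point is that a single small study can show an impressive-looking correlation with nothing behind it, which is one reason further studies are needed before the other possibilities can be taken seriously.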

Obviously, some of the above possibilities can be eliminated pretty easily, and others can be eliminated by further studies (and may well have been since). Still, when you see news articles about studies, it can be useful to think in this manner rather than immediately jumping to the conclusion that A causes B and ruling out every other possibility.