An important use of statistics is to quantify how significant an observed result is compared with what we would expect from theory or chance alone. For example, say that we flip a coin 100 times and get 60 heads. Is this sufficient grounds to claim that the coin is probably biased? Or, say that we give a certain drug, drug A, to 100 patients and 38 of them get better. We then give another drug, drug B, to 100 different patients and 45 of them get better. Is there enough evidence to reasonably state that drug B is superior to drug A?

This is where hypothesis testing comes in. The method of testing
hypotheses that is used most frequently today was put forth by Jerzy Neyman
and Egon Pearson in the 1930s. The Neyman-Pearson theory involves two competing
hypotheses: a null hypothesis (denoted H_{0}) and an alternative hypothesis
(denoted H_{1}). Informally speaking,
H_{0} asserts that nothing unusual is going on that couldn't be
explained by chance or other known factors, and H_{1} asserts that
something different is taking place. So, if we were to test a coin to see
whether it is biased towards heads or not, we might formulate the following hypotheses:

- H_{0}: The coin is not biased towards heads
- H_{1}: The coin is biased towards heads

Alternatively, depending on the aim of the experiment, we could construct different hypotheses, as long as they are mutually exclusive.

We would formulate these hypotheses before performing the experiment,
so that our hypotheses are not affected by what we've found. If our results are close to 50% heads and 50% tails,
we wouldn't have enough evidence to reject H_{0}, so we would not reject it (which is not the same as proving that it is true).
If the number of heads is "large enough" ("large enough" would, of
course, be defined statistically depending on how confident we want to be
in our result) then we would reject H_{0} and accept H_{1}. We would also determine what degree of confidence we're looking for before performing the experiment.
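One way to make "large enough" precise is to pick a significance level α (one minus the desired confidence, e.g. α = 0.05 for 95%) and find the smallest number of heads that a fair coin would reach with probability at most α. The sketch below does this with an exact binomial calculation. It assumes a one-sided test (H_{1} says the coin is biased towards heads), and the helper names `upper_tail` and `critical_heads` are hypothetical, chosen for illustration.

```python
from math import comb


def upper_tail(n: int, k: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p): the chance of k or more heads in n flips."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def critical_heads(n: int, alpha: float) -> int:
    """Smallest head count k with P(X >= k) <= alpha if the coin is fair (H0 true).

    Seeing k or more heads is then treated as "large enough" to reject H0.
    """
    for k in range(n + 1):
        if upper_tail(n, k) <= alpha:
            return k
    return n + 1  # no achievable count is extreme enough: never reject H0


# Decide the rule before flipping, for a 95% and a 99% confidence level.
for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}: reject H0 if heads >= {critical_heads(100, alpha)}")
```

For 100 flips this gives a cut-off of 59 heads at the 5% level and 63 heads at the 1% level, so the rule is fixed before any data are seen.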

Let's say that we flip our coin 100 times and get 60 heads. Whether we
reject H_{0} depends on the degree of confidence we chose
before attempting the experiment. If the coin were fair, we would get
fewer than 60 heads about 97.2% of the time and 60 or more heads about 2.8%
of the time. So, if we had chosen a 95% confidence level (equivalently, a 5%
significance level), we would have enough evidence to reject the hypothesis
that the coin is fair (97.2% > 95%, or equivalently 2.8% < 5%), so we would
accept the hypothesis that the coin is biased towards heads. On the other
hand, if we had chosen a 99% confidence level (a 1% significance level), we
wouldn't have enough evidence to reject the null hypothesis (97.2% < 99%), so
we could not conclude that the coin is biased.
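The 97.2% and 2.8% figures can be checked with a short exact binomial calculation. This is a minimal sketch that assumes a one-sided test and the decision rule "reject H_{0} when the probability of at least the observed number of heads, computed for a fair coin, is at most α". The names `p_value_upper` and `decide` are hypothetical.

```python
from math import comb


def p_value_upper(heads: int, n: int, p: float = 0.5) -> float:
    """One-sided p-value: probability of `heads` or more heads in n flips if H0 holds."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(heads, n + 1))


def decide(heads: int, n: int, alpha: float) -> str:
    """Reject H0 when the observed result is at least this unlikely under a fair coin."""
    return "reject H0" if p_value_upper(heads, n) <= alpha else "do not reject H0"


pv = p_value_upper(60, 100)
print(f"P(60 or more heads | fair coin) = {pv:.4f}")     # 0.0284
print(f"P(fewer than 60 heads | fair coin) = {1 - pv:.4f}")  # 0.9716
print(decide(60, 100, alpha=0.05))  # reject H0
print(decide(60, 100, alpha=0.01))  # do not reject H0
```

Comparing the tail probability (2.8%) with α is equivalent to comparing 97.2% with the confidence level, as in the paragraph above.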

You might be able to see from the above discussion that there are two types of errors that we could make. We might conclude that the coin is biased when it really is fair (and it was just by chance that we got so many heads), or we might fail to conclude that the coin is biased when it really is. The first (rejecting a null hypothesis that is true) is called a Type I error, and the second (failing to reject a null hypothesis that is false) is called a Type II error.
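To make the two kinds of error concrete, the sketch below computes both probabilities for the rule "reject H_{0} if we see at least `k_crit` heads in 100 flips". It assumes, purely for illustration, a biased coin that lands heads 60% of the time. The cut-offs 59 and 63 are the smallest head counts that a fair coin reaches with probability at most 5% and 1%, respectively. The names `binom_pmf` and `error_rates` are hypothetical.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k heads in n flips when P(heads) = p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def error_rates(n: int, k_crit: int, p_alt: float) -> tuple[float, float]:
    """Type I and Type II error probabilities for 'reject H0 if heads >= k_crit'."""
    # Type I: the coin is fair (H0 true), but we see k_crit or more heads and reject H0.
    type1 = sum(binom_pmf(k, n, 0.5) for k in range(k_crit, n + 1))
    # Type II: the coin is biased with P(heads) = p_alt, but we see fewer than
    # k_crit heads and do not reject H0.
    type2 = sum(binom_pmf(k, n, p_alt) for k in range(0, k_crit))
    return type1, type2


# Hypothetical alternative: the coin lands heads 60% of the time.
for k_crit in (59, 63):  # cut-offs for a 5% and a 1% significance level, n = 100
    t1, t2 = error_rates(100, k_crit, p_alt=0.6)
    print(f"reject if heads >= {k_crit}: Type I = {t1:.4f}, Type II = {t2:.4f}")
```

Raising the cut-off from 59 to 63 lowers the Type I error rate but raises the Type II error rate. This trade-off underlies the choice between a 95% and a 99% confidence level.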