30  Hypothesis testing

Author

Karl Gregory

So far we have studied how to learn from a random sample \(X_1,\dots,X_n\) about a population mean \(\mu\) or proportion \(p\) through the construction of confidence intervals. Another way in which statisticians learn about the unknown values of these population parameters is by formulating and testing hypotheses.

For our purposes, a hypothesis will be a statement about the value of a population mean \(\mu\) or proportion \(p\), the plausibility or implausibility of which we would like to assess based on the data in our random sample. Given such a statement, we weigh evidence against the statement, where we regard evidence against the statement as evidence in favor of a contradictory statement. We will thus consider two contradictory statements, or hypotheses: a null hypothesis and its alternate hypothesis.

The null hypothesis will typically be formulated such that if one deems it implausible based on observed data, one will claim to have made a new discovery. For example, if it is generally believed that baby tortoises have an average weight of eighty grams, but you have a hunch that the average weight actually differs from this, you may form a null hypothesis stating \(\mu = 80\). The alternate hypothesis is always the logical negation of the null hypothesis, so in this case it would be \(\mu \neq 80\). After collecting data, you would judge how much evidence the data carry against the statement \(\mu = 80\). If the evidence is strong, you will reject the statement and conclude \(\mu \neq 80\). If the evidence is weak, however, you will not reject \(\mu=80\) as a hypothesis.

We will denote the null hypothesis by \(H_0\) and the alternate by \(H_1\), so that your null and alternate hypotheses concerning the mean weight of baby tortoises could be presented as \[ \text{$H_0$: $\mu = 80$ versus $H_1$: $\mu \neq 80$.} \] We read \(H_0\) as “\(H\) nought.”

Another example would be testing whether a coin is fair or whether it favors “heads.” If you have a suspicion that the coin favors heads, you could set up null and alternate hypotheses as \[ \text{$H_0$: $p \leq 1/2$ versus $H_1$: $p > 1/2$,} \] where \(p\) denotes the probability of the coin turning up heads. The alternate hypothesis is usually formulated as that which one is trying to establish, or that for which one wishes to collect evidence, while the null hypothesis is simply its logical negation. The coin’s favoring heads (what we are trying to establish) corresponds to \(p > 1/2\), so this goes in the alternate hypothesis. The opposite claim is \(p \leq 1/2\), which is the corresponding null hypothesis. After collecting data by flipping the coin, say, a hundred times, you will evaluate the plausibility of \(p \leq 1/2\). If you assess the plausibility to be low, you will conclude that the alternate hypothesis is true; if you do not assess the plausibility to be low, you will conclude that \(p \leq 1/2\) may very well be true, so you cannot claim that the coin favors heads.
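To give a preview of the kind of calculation developed in the following pages, here is a minimal Python sketch of how one might assess the plausibility of \(p \leq 1/2\) after flipping the coin a hundred times. The outcome (61 heads) is invented for illustration. The idea is to ask: if the coin were exactly fair (the boundary case \(p = 1/2\) of the null hypothesis), how likely would a result at least this favorable to heads be?

```python
from math import comb

n, heads = 100, 61    # hypothetical outcome (not from the text): 61 heads in 100 flips

# Under the boundary case p = 1/2 of H0, the number of heads X follows a
# Binomial(100, 1/2) distribution. We compute P(X >= 61), the probability of
# a result at least as favorable to heads as the one observed, if the coin
# were fair.
prob = sum(comb(n, k) for k in range(heads, n + 1)) / 2**n
print(f"P(X >= {heads} | p = 1/2) = {prob:.4f}")
```

A small value of this probability means the observed data would be surprising under a fair coin, which is evidence against \(H_0\) and in favor of \(H_1\): \(p > 1/2\).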

Once the null and alternate hypotheses have been formulated and data have been collected, the statistician will either

  1. Reject \(H_0\), or
  2. Fail to reject \(H_0\).

While “failure to reject \(H_0\)” is common parlance for choosing, in light of the data, not to reject the null hypothesis, one should not consider an experiment to have “failed” if it does not produce data leading to the rejection of the investigator’s null hypothesis. In reference to the coin, if you flip it one hundred times and do not find evidence that it favors heads, you can say you “failed” to show that the coin favors heads; however, if the coin is indeed fair, then this “failure” to reject the null hypothesis is in fact the correct inference.

Deciding from the sample data whether to reject or not to reject the null hypothesis is referred to as testing the null hypothesis. In order to test the null hypothesis, we will compute from our random sample a quantity called a test statistic. From the test statistic we will judge the plausibility of the null hypothesis in light of the data. This will entail defining a rule which tells us when to reject \(H_0\) and when not to. We will refer to this rule as a rejection rule. A rejection rule always takes the form: Reject \(H_0\) if the test statistic takes a value in the set \(R\), where \(R\) is a set of real numbers called the rejection region. So in the end, we will reject the null hypothesis if our test statistic lies in the rejection region, which is to say it satisfies the rule for rejecting \(H_0\). This will become less abstract as we introduce specific examples.
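As a preview of what a test statistic and rejection rule might look like, here is a small Python sketch for the tortoise hypotheses \(H_0\): \(\mu = 80\) versus \(H_1\): \(\mu \neq 80\). The weights are invented, and the cutoff 1.96 defining the rejection region anticipates the specific rules derived in the following pages; the sketch is meant only to make the vocabulary concrete.

```python
from math import sqrt

# Hypothetical weights (grams) for n = 9 baby tortoises; invented data.
weights = [84.1, 78.9, 88.2, 81.5, 90.3, 79.7, 85.0, 86.4, 82.9]
n = len(weights)
xbar = sum(weights) / n
s = sqrt(sum((w - xbar) ** 2 for w in weights) / (n - 1))  # sample std dev

# Test statistic: how many estimated standard errors the sample mean
# falls from the hypothesized mean of 80 grams.
t_stat = (xbar - 80) / (s / sqrt(n))

# One possible rejection rule: reject H0 if the test statistic lands in
# the rejection region R = {t : |t| > 1.96}; the cutoff 1.96 is an
# anticipation of rules developed later, not something derived here.
reject = abs(t_stat) > 1.96
print(f"test statistic = {t_stat:.3f}, reject H0: {reject}")
```

With these invented data the statistic lies in the rejection region, so the rule says to reject \(H_0\) and conclude \(\mu \neq 80\).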

In the following pages we will see many examples of null and alternate hypotheses. One rule to remember is that if a statement contains an equality, it belongs in the null hypothesis. We will try to make the reasons for this clear as we go along.

First we will consider testing hypotheses about a mean \(\mu\). Then we will move on to testing hypotheses about a proportion \(p\).