38  Comparing two proportions

Author

Karl Gregory

Now we consider comparing the proportions of “successes” in two populations. Letting \(p_1\) and \(p_2\) denote the probability of drawing a success from populations \(1\) and \(2\), respectively, suppose we draw independent random samples \[ X_{i1},\dots,X_{in_i} \overset{\text{ind}}{\sim}\text{Bernoulli}(p_i), \quad i = 1,2 \] from the two populations and define the sample proportions as \[ \hat p_i = \frac{1}{n_i}(X_{i1} + \dots + X_{in_i}), \quad i=1,2. \] We consider making inferences about the difference in proportions \(p_1 - p_2\)—constructing confidence intervals for it and testing hypotheses about it—based on the difference of sample proportions \(\hat p_1 - \hat p_2\).

We first consider how to construct confidence intervals for \(p_1 - p_2\).

38.1 Confidence intervals for a difference in proportions

Since the sample proportions \(\hat p_1\) and \(\hat p_2\) are really just sample means, and due to the relation in Equation 28.2, we find that Proposition 37.6 gives the following central limit result for the Studentized difference in sample proportions:

Proposition 38.1 (Large-sample distribution of Studentized difference in sample proportions) Let \(X_{i1},\dots,X_{in_i} \overset{\text{ind}}{\sim}\text{Bernoulli}(p_i)\) for \(i=1,2\). Then \[ \frac{\hat p_1 - \hat p_2 - (p_1 - p_2)}{\sqrt{\dfrac{\hat p_1(1-\hat p_1)}{n_1} + \dfrac{\hat p_2(1-\hat p_2)}{n_2} }} \text{ behaves more and more like } Z \sim \mathcal{N}(0,1) \] for larger and larger \(n_1\) and \(n_2\).

The above result suggests how to construct an approximate \((1-\alpha)100\%\) confidence interval for the difference \(p_1 - p_2\) when \(n_1\) and \(n_2\) are large.

Proposition 38.2 (Large-sample confidence interval for a difference in proportions) Let \(X_{i1},\dots,X_{in_i} \overset{\text{ind}}{\sim}\text{Bernoulli}(p_i)\) for \(i=1,2\). Then the interval with endpoints \[ \hat p_1 - \hat p_2 \pm z_{\alpha/2} \sqrt{\dfrac{\hat p_1(1-\hat p_1)}{n_1} + \dfrac{\hat p_2(1-\hat p_2)}{n_2} } \] contains \(p_1 - p_2\) with probability closer and closer to \(1-\alpha\) for larger and larger \(n_1\) and \(n_2\).

Just as in the case of constructing a confidence interval for a single proportion (rather than for a difference in proportions), we will need a sample size requirement. In order to trust the above interval, we will require \[ \min\{n_1\hat p_1,n_1(1-\hat p_1),n_2\hat p_2,n_2(1-\hat p_2)\} \geq 15, \tag{38.1}\] which is that each sample should have at least \(15\) “successes” and \(15\) “failures”. Compare this condition with that in Equation 28.3.
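The interval of Proposition 38.2, together with the count condition in Equation 38.1, can be sketched in Python; the function name `prop_diff_ci` and its signature are our own illustrative choices, not part of any library:

```python
import math

def prop_diff_ci(x1, n1, x2, n2, z):
    """Large-sample CI for p1 - p2 from success counts x1, x2 and
    sample sizes n1, n2, using the critical value z = z_{alpha/2}."""
    # Sample-size condition of Equation 38.1: at least 15 "successes"
    # and 15 "failures" in each sample
    if min(x1, n1 - x1, x2, n2 - x2) < 15:
        raise ValueError("need at least 15 successes and 15 failures in each sample")
    p1_hat, p2_hat = x1 / n1, x2 / n2
    # Standard error from Proposition 38.1 (unpooled)
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return (diff - z * se, diff + z * se)

# With 30 successes in 143 trials and 49 in 213, at the 95% level:
prop_diff_ci(30, 143, 49, 213, 1.96)  # ≈ (-0.108, 0.067)
```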

Exercise 38.1 (Two deep bags of marbles) Suppose you have two deep bags of marbles, from each of which you scoop a sample of marbles and count the number of red marbles in the sample. Suppose the bags are deep enough (contain enough marbles) so that draws from the bag can be regarded as independent Bernoulli trials with probability of “success” equal to the proportion of red marbles in the bag. From the first bag you scoop \(143\) marbles and find \(30\) which are red; from the second bag you scoop \(213\) marbles and find \(49\) which are red. Letting \(p_1\) and \(p_2\) denote the proportions of red marbles in first and second bag, respectively, construct a confidence interval for \(p_1 - p_2\) at confidence level:

  1. \(95\%\).1
  2. \(99\%\).2

38.2 Testing hypotheses for a difference in proportions

Here we will consider testing hypotheses about \(p_1 - p_2\) of the following forms:

  1. \(H_0\): \(p_1 - p_2 \leq 0\) versus \(H_1\): \(p_1 - p_2 > 0\).
  2. \(H_0\): \(p_1 - p_2 \geq 0\) versus \(H_1\): \(p_1 - p_2 < 0\).
  3. \(H_0\): \(p_1 - p_2 = 0\) versus \(H_1\): \(p_1 - p_2 \neq 0\).

Note that we do not here, as we did in the case of testing hypotheses about a difference in means \(\mu_1 - \mu_2\), consider a null value \(\delta_0\) which could be, if desired, different from zero. It is certainly possible to test such hypotheses as \(H_0\): \(p_1 - p_2 \leq \delta_0\) versus \(H_1\): \(p_1 - p_2 > \delta_0\) for \(\delta_0 \neq 0\), for example, but we do not consider such hypotheses here. One reason for this is that \(\delta_0=0\) is by far the most typical choice of null value for the difference \(p_1 - p_2\); another reason is that setting \(\delta_0=0\) allows us to use a kind of “pooled variance” in our test statistic, as we will show.

Consider the fact that if \(p_1 - p_2 = 0\), then \(p_1 = p_2 = p\) for some common proportion \(p\). If the two populations have a common success probability \(p\), it would make sense to estimate this common success probability by pooling all the data together to obtain \[ \hat p = \frac{n_1 \hat p_1 + n_2 \hat p_2}{n_1 + n_2}, \tag{38.2}\] where the above is simply the total number of successes in both samples divided by the total sample size. Moreover, since the \(\text{Bernoulli}(p)\) distribution has variance \(p(1-p)\), \(p_1 = p_2 = p\) implies that the two populations have the same variance; this variance could be estimated with \(\hat p (1-\hat p)\). Our test statistic, presented in the next result describing tests of the above sets of hypotheses, makes use of this pooled estimate of the variance.

Proposition 38.3 (Tests of hypotheses for a difference in proportions) Let \(X_{i1},\dots,X_{in_i} \overset{\text{ind}}{\sim}\text{Bernoulli}(p_i)\) for \(i=1,2\) be independent random samples. Then, with \[ Z_{\operatorname{test}}= \frac{\hat p_1 - \hat p_2}{\sqrt{\hat p(1-\hat p)\Big(\dfrac{1}{n_1} + \dfrac{1}{n_2}\Big)}}, \] and \(\hat p\) as in Equation 38.2, the following tests have size closer and closer to \(\alpha\) for larger and larger \(n_1\) and \(n_2\):

  1. For \(H_0\): \(p_1 - p_2 \leq 0\) versus \(H_1\): \(p_1 - p_2 > 0\), reject \(H_0\) if \(Z_{\operatorname{test}}> z_\alpha\).
  2. For \(H_0\): \(p_1 - p_2 \geq 0\) versus \(H_1\): \(p_1 - p_2 < 0\), reject \(H_0\) if \(Z_{\operatorname{test}}< -z_\alpha\).
  3. For \(H_0\): \(p_1 - p_2 = 0\) versus \(H_1\): \(p_1 - p_2 \neq 0\), reject \(H_0\) if \(|Z_{\operatorname{test}}| > z_{\alpha/2}\).

One can obtain \(p\)-values in the usual way: by finding the significance level \(\alpha^*\) that sets the critical value equal to the observed value of the test statistic \(Z_{\operatorname{test}}\).
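The pooled test statistic of Proposition 38.3 and the \(p\)-values for the three sets of hypotheses can be sketched in Python; `prop_diff_test` and `norm_cdf` are hypothetical helper names, with the standard normal CDF computed from the error function:

```python
import math

def norm_cdf(z):
    # Standard normal CDF via the error function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def prop_diff_test(x1, n1, x2, n2):
    """Pooled two-proportion z-test from success counts and sample sizes."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_hat = (x1 + x2) / (n1 + n2)  # pooled estimate, Equation 38.2
    se = math.sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    return {
        "z": z,
        "p_right": 1 - norm_cdf(z),                 # H1: p1 - p2 > 0
        "p_left": norm_cdf(z),                      # H1: p1 - p2 < 0
        "p_two_sided": 2 * (1 - norm_cdf(abs(z))),  # H1: p1 - p2 != 0
    }
```

Note that the pooled standard error used here differs from the unpooled one in the confidence interval of Proposition 38.2; pooling is justified because the null hypothesis sets \(p_1 = p_2\).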

Exercise 38.2 (Two deep bags of marbles continued) Based on the same scoops of marbles as in Exercise 38.1 (from the first bag you scoop \(143\) marbles and find \(30\) which are red, and from the second bag you scoop \(213\) marbles and find \(49\) which are red), obtain \(p\)-values for testing these sets of hypotheses:

  1. \(H_0\): \(p_1 - p_2 \leq 0\) versus \(H_1\): \(p_1 - p_2 > 0\).3
  2. \(H_0\): \(p_1 - p_2 \geq 0\) versus \(H_1\): \(p_1 - p_2 < 0\).4
  3. \(H_0\): \(p_1 - p_2 = 0\) versus \(H_1\): \(p_1 - p_2 \neq 0\).5

Interpret the \(p\)-values.


  1. We have \(\hat p_1 = 30/143 \approx 0.210\) and \(\hat p_2 = 49/213 \approx 0.230\) with \(n_1 = 143\) and \(n_2 = 213\), so \(\hat p_1 - \hat p_2 \approx -0.020\). With \(\alpha = 0.05\) we have \(z_{0.05/2} = 1.96\), and we obtain the interval \([-0.108,0.067]\). ↩︎

  2. With \(\alpha = 0.01\) we have \(z_{0.01/2} = 2.576\), and we obtain the interval \([-0.135,0.095]\).↩︎

  3. We obtain \(\hat p = 79/356 \approx 0.222\) and the test statistic value \(Z_{\operatorname{test}}= -0.451\). The \(p\)-value for testing the right-sided set of hypotheses is the area under the \(\mathcal{N}(0,1)\) PDF to the right of \(Z_{\operatorname{test}}\), which is \(0.674\). There are no grounds for rejecting \(H_0\).↩︎

  4. The \(p\)-value for testing the left-sided set of hypotheses is the area under the \(\mathcal{N}(0,1)\) PDF to the left of \(Z_{\operatorname{test}}\), which is \(0.326\). There are no grounds for rejecting the null hypothesis that \(p_1 \geq p_2\) at any conventional significance level.↩︎

  5. The \(p\)-value for testing the two-sided set of hypotheses is twice the area under the \(\mathcal{N}(0,1)\) PDF beyond the value of \(Z_{\operatorname{test}}\) in the direction it lies from zero, which is \(0.652\). There are no grounds for rejecting the null hypothesis of equal proportions.↩︎