Here we will consider testing for an “association” between sets of events. For example, if we consider drawing at random someone from the human population, we might ask: is there an association between the person’s eye color and hair color? Asking if there is an “association” means asking whether knowing the eye color of a randomly selected person would help you guess the person’s hair color or vice versa.
We will work within the context of the following statistical experiment.
Definition 42.1 (Contingency table experiment) Define a contingency table experiment as an experiment with a sample space \(S\) consisting of pairs of outcomes \((a_i,b_j)\) for \(i = 1,\dots,R\) and \(j = 1,\dots,C\) so that \(S\) may be written as \[
S = \left\{\begin{array}{ccc}
(a_1,b_1)& \dots &(a_1,b_C)\\
\vdots&\ddots&\vdots\\
(a_R,b_1)& \dots & (a_R,b_C)
\end{array}
\right\},
\] where \(R\) is the number of rows and \(C\) the number of columns.
Drawing a person from the human population and recording his or her eye and hair color is a contingency table experiment in which \(a_1,\dots,a_R\) would symbolize, say, \(R\) different eye colors, and \(b_1,\dots,b_C\) would symbolize \(C\) different colors of hair. Each pair \((a_i,b_j)\) would thus symbolize an eye-color-hair-color combination.
On the sample space of a contingency table experiment, we can define events \(A_1,\dots,A_R\) such that \(A_i\) is the event that the first outcome is \(a_i\) (in our example, that the person drawn has eye color \(i\)). Likewise, we can define events \(B_1,\dots,B_C\) such that \(B_j\) is the event that the second outcome is \(b_j\) (hair color \(j\)). Then the events \(A_i\), \(B_j\), and the intersection event \(A_i \cap B_j\) are identified with sets of sample outcomes as \[
\begin{align}
A_i &= \{(a_i,b_1),\dots,(a_i,b_C)\}\\
B_j &= \{(a_1,b_j),\dots,(a_R,b_j)\}\\
A_i \cap B_j & = \{(a_i,b_j)\}
\end{align}
\] for \(i=1,\dots,R\) and \(j = 1,\dots, C\).
Having defined the above events, we can state more precisely what we mean by testing for an “association” between the events \(A_1,\dots,A_R\) (say, eye color) and the events \(B_1,\dots,B_C\) (say, hair color). By asking whether there is an association between these collections of events, we are really asking whether pairs of events \(A_i\) and \(B_j\) are independent for every \(i\) and \(j\). Formally, recalling Definition 5.1 of independence between two events, we will collect data and measure evidence against the null hypothesis \[
\text{$H_0$: $P(A_i\cap B_j) = P(A_i)P(B_j)$ for all $i=1,\dots,R$ and $j = 1,\dots,C$}
\] in favor of its alternative, which says that there is at least one pair of events \(A_i\) and \(B_j\) which are not independent. We will call the foregoing null hypothesis the hypothesis of no association.
We collect data by running the contingency table experiment a large number of times, say \(N\) times, recording the number of times each outcome \((a_i,b_j)\) occurs. These counts we record in what we will call a contingency table: Letting \(O_{ij}\) be the number of times we observe outcome \((a_i,b_j)\), the contingency table takes the form \[
\begin{array}{c|ccc|c}
& B_1 & \dots & B_C & \text{Total} \\ \hline
A_1 & O_{11} & \dots & O_{1C} & O_{1.} \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
A_R & O_{R1} & \dots & O_{RC} & O_{R.} \\ \hline
\text{Total} & O_{.1} & \dots & O_{.C} & N
\end{array}
\tag{42.1}\] where \(O_{i.} = \sum_{j=1}^C O_{ij}\) is the total number of times the event \(A_i\) was observed and \(O_{.j} = \sum_{i=1}^R O_{ij}\) is the total number of times the event \(B_j\) was observed, for \(i=1,\dots,R\) and \(j=1,\dots,C\). Here we use the convention of placing “.” at the position of the index over which the sum was taken. Summing these marginal counts gives \(N\), the total number of times the experiment was run.
Our test statistic for testing the hypothesis of no association is constructed by comparing the table of observed counts in Equation 42.1 to a table containing estimates of the counts one would expect if the null hypothesis of no association were true. These estimated expected counts are obtained as \[
E_{ij} = \frac{O_{i.}O_{.j}}{N}
\] for \(i=1,\dots,R\) and \(j =1,\dots,C\). We arrive at this expression for the expected counts by considering the following: The expected value of the count \(O_{ij}\) is \(N P(A_i\cap B_j)\), since \(O_{ij} \sim \text{Binomial}(N,P(A_i \cap B_j))\). If \(A_i\) and \(B_j\) are independent events, then we have \(P(A_i \cap B_j) = P(A_i)P(B_j)\), as stated in the hypothesis of no association. Under no association, the expected value of \(O_{ij}\) is given by \(N P(A_i)P(B_j)\). Now, if we plug in for \(P(A_i)\) the estimate \(O_{i.}/N\) and for \(P(B_j)\) the estimate \(O_{.j}/N\), we obtain our estimate \(E_{ij}\) of the expected count under no association. These expected counts \(E_{ij}\) can be arranged in a table as \[
\begin{array}{c|ccc|c}
& B_1 & \dots & B_C & \text{Total} \\ \hline
A_1 & E_{11} & \dots & E_{1C} & O_{1.} \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
A_R & E_{R1} & \dots & E_{RC} & O_{R.} \\ \hline
\text{Total} & O_{.1} & \dots & O_{.C} & N
\end{array}
\tag{42.2}\] where we note that the row and column totals remain the same (you can verify that \(\sum_{j=1}^C E_{ij} = O_{i.}\) and \(\sum_{i=1}^R E_{ij} = O_{.j}\)).
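To make the construction concrete, here is a minimal sketch (not from the text) of computing the expected counts \(E_{ij} = O_{i.}O_{.j}/N\) from a table of observed counts in plain Python; the function name and the small table of counts are our own illustrations.

```python
def expected_counts(O):
    """Expected counts under no association: E_ij = O_i. * O_.j / N."""
    row_totals = [sum(row) for row in O]          # O_i.
    col_totals = [sum(col) for col in zip(*O)]    # O_.j
    N = sum(row_totals)                           # total number of runs
    return [[r * c / N for c in col_totals] for r in row_totals]

# A small made-up 2x2 table of observed counts:
O = [[10, 20],
     [30, 40]]
E = expected_counts(O)
# The row and column totals of E match those of O, as noted above.
```

Note that \(E\) generally contains non-integer entries; it is an estimate of expected counts, not a table of possible observations.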
Now we construct a test statistic for the null hypothesis of no association by measuring how far the observed counts \(O_{ij}\) deviate from the expected counts \(E_{ij}\).
Proposition 42.1 (Asymptotic likelihood ratio test for no association) Based on running the contingency table experiment \(N\) times, define the test statistic \[
L_{\operatorname{test}}= 2 \sum_{i=1}^R\sum_{j=1}^C O_{ij}\log\Big(\frac{O_{ij}}{E_{ij}}\Big).
\tag{42.3}\] Then the test which rejects \(H_0\) when \(L_{\operatorname{test}}> \chi^2_{(R-1)(C-1),\alpha}\) has size that approaches \(\alpha\) as \(N\) grows.
A rule of thumb for knowing when \(N\) is large enough for the test in Proposition 42.1 to be trusted (for it to make Type I errors at a rate close to the advertised rate of \(\alpha\)) is to require \[
E_{ij} \geq 5
\] for all \(i=1,\dots,R\) and \(j = 1,\dots,C\). That is, all the expected counts should be at least \(5\).
Here is an example:
Example 42.1 (Hair and eye color) Snee (1974) observed the counts below of eye color (rows) and hair color (columns) combinations in a survey of \(592\) students. It is of interest to test for an association between eye color and hair color. \[
\begin{array}{c|cccc|c}
& \text{Black} & \text{Brown} & \text{Red} & \text{Blond} & \text{Total}\\ \hline
\text{Brown} & 68 & 119 & 26 & 7 & 220\\
\text{Blue} & 20 & 84 & 17 & 94 & 215\\
\text{Hazel} & 15 & 54 & 14 & 10 & 93\\
\text{Green} & 5 & 29 & 14 & 16 & 64\\\hline
\text{Total} &108&286&71& 127 & 592
\end{array}
\]
We obtain the value \(L_{\operatorname{test}}= 146.444\). Moreover, the smallest value of \(E_{ij}\) in the table of expected counts is \(7.676\) so the rule of thumb is satisfied for using the asymptotic likelihood ratio test of no association. Since \(R=4\) and \(C = 4\), we have \((R-1)(C-1) = 9\), so the critical value at significance level \(0.05\) is \(\chi^2_{9,0.05} = 16.919\), which can be obtained from the chi-squared table in Chapter 45. Moreover, the p value, obtained as the area under the \(\chi^2_{9}\) PDF beyond the value of \(L_{\operatorname{test}}\), is approximately zero. We therefore reject the null hypothesis and conclude that there is an association between hair color and eye color.
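The numbers in Example 42.1 can be reproduced with a short computation. The sketch below (our own, assuming scipy is available for the chi-squared tail area) builds the expected counts and evaluates \(L_{\operatorname{test}}\) directly from Equation 42.3.

```python
from math import log
from scipy.stats import chi2

# Snee (1974) hair and eye color counts; columns: Black, Brown, Red, Blond.
O = [[68, 119, 26,  7],   # Brown eyes
     [20,  84, 17, 94],   # Blue eyes
     [15,  54, 14, 10],   # Hazel eyes
     [ 5,  29, 14, 16]]   # Green eyes

row = [sum(r) for r in O]
col = [sum(c) for c in zip(*O)]
N = sum(row)                                 # 592 students
E = [[row[i] * col[j] / N for j in range(4)] for i in range(4)]

L = 2 * sum(O[i][j] * log(O[i][j] / E[i][j]) for i in range(4) for j in range(4))
p = chi2.sf(L, df=(4 - 1) * (4 - 1))         # area beyond L under the chi^2_9 PDF
# L is approximately 146.444, min(E) approximately 7.676, and p is essentially zero.
```

The smallest expected count, \(E_{44} \approx 7.676\), corresponds to green eyes and red hair, the rarest marginal combination.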
One often sees, instead of the test statistic \(L_{\operatorname{test}}\) defined in Equation 42.3, the test statistic \[
W_{\operatorname{test}}= \sum_{i=1}^R\sum_{j=1}^C \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\] which is called Pearson’s chi-squared statistic. The value of \(W_{\operatorname{test}}\) will be nearly equal to \(L_{\operatorname{test}}\) due to a Taylor expansion of the natural logarithm, so we may use \(W_{\operatorname{test}}\) and \(L_{\operatorname{test}}\) interchangeably (for the eye and hair color data we obtain \(W_{\operatorname{test}}= 138.29\)). The chisq.test() function in R performs the test of no association based on \(W_{\operatorname{test}}\).
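In Python, scipy's `chi2_contingency` function plays a role analogous to R's `chisq.test()`, computing Pearson's statistic \(W_{\operatorname{test}}\) along with the expected counts; this tooling choice is ours, not the text's. Shown here on the hair and eye color data:

```python
from scipy.stats import chi2_contingency

# Snee (1974) counts; rows: eye colors, columns: hair colors.
O = [[68, 119, 26,  7],
     [20,  84, 17, 94],
     [15,  54, 14, 10],
     [ 5,  29, 14, 16]]

# Returns Pearson's statistic, the p value, the degrees of freedom,
# and the table of expected counts E_ij.
W, p, df, E = chi2_contingency(O)
# W is approximately 138.29 with df = 9, matching the text.
```

The agreement between \(W_{\operatorname{test}} = 138.29\) and \(L_{\operatorname{test}} = 146.444\) is rougher here than usual because several cells deviate strongly from their expected counts; both statistics nevertheless lead to the same conclusion.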
42.1 Comparing several populations
One may be interested not in finding an association between sets of events \(A_1,\dots,A_R\) and \(B_1,\dots,B_C\) as above, but rather in testing for differences in sets of proportions across multiple populations. That is, if we draw samples of size \(n_1,\dots,n_R\) from \(R\) populations, where each population consists of \(C\) different outcomes, then we can summarize the collected data in a table as below, where \(O_{ij}\) is the number of times outcome \(j\) is observed in the sample from population \(i\): \[
\begin{array}{c|ccc|c}
& \text{Outcome } 1 & \dots & \text{Outcome } C & \text{Total} \\ \hline
\text{Population } 1 & O_{11} & \dots & O_{1C} & n_1 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
\text{Population } R & O_{R1} & \dots & O_{RC} & n_R \\ \hline
\text{Total} & O_{.1} & \dots & O_{.C} & N
\end{array}
\] Here \(N = n_1 + \dots + n_R\) is the total number of observations.
In this setting, it is possible that one has fixed in advance the sample sizes \(n_1,\dots,n_R\). Because of this, it no longer makes sense to test for independence between drawing from population \(i\) and observing outcome \(j\). We instead test whether the probabilities of outcomes \(1,\dots,C\) are the same in all \(R\) populations.
Let \(p_{ij}\) denote the probability of outcome \(j\) in population \(i\) for \(j=1,\dots,C\) and \(i=1,\dots,R\). Then we wish to test \[
\text{$H_0$: $p_{1j} = \dots = p_{Rj}$ for $j=1,\dots,C$},
\] which states that for each outcome \(j\), the probability of that outcome is the same in all \(R\) populations, against its alternative.
It turns out that we can use exactly the same test in this setup as in the previous section with the expected counts constructed as \[
E_{ij} = n_i \frac{O_{.j}}{N}
\] for \(i=1,\dots,R\) and \(j = 1,\dots,C\) (note that these are the same as in the previous section, but with \(n_i\) replacing \(O_{i.}\)). We can justify this expression for \(E_{ij}\) as follows: Under the null hypothesis, the probability of outcome \(j\) is the same in all populations, so to estimate this common probability, one may pool all the population counts together, obtaining the estimate \(O_{.j}/N\). Then, multiplying this by \(n_i\), the number of observations drawn from population \(i\), results in an estimate of the expected number of times one would, if the null hypothesis were true, observe outcome \(j\) among the \(n_i\) draws from population \(i\).
Here is an example:
Example 42.2 (Ice cream preferences) In a survey of sex and ice cream preferences, 84 males and 85 females (presumably human!) were asked to choose a favorite from among four flavors of ice cream, resulting in the below table of counts: \[
\begin{array}{c|cccc|c}
& \text{Chocolate} & \text{Vanilla} & \text{Strawberry} & \text{Coffee} & \text{Total} \\ \hline
\text{Male} & 25 & 15 & 15 & 29 & 84 \\
\text{Female} & 35 & 20 & 10 & 20 & 85\\\hline
\text{Total} & 60 & 35 & 25 & 49 & 169
\end{array}
\]
For these data we obtain \(L_{\operatorname{test}}= 5.055\) or \(W_{\operatorname{test}}= 5.028\), and the smallest expected count \(E_{ij}\) is \(12.426\), so the rule of thumb is satisfied. Since \(R= 2\) and \(C = 4\), we have \((R-1)(C-1) = 3\), so the critical value at significance level \(\alpha = 0.05\) is \(\chi^2_{3,0.05} = 7.815\). The p value, obtained as the area under the \(\chi^2_{3}\) PDF to the right of \(L_{\operatorname{test}}\), is \(0.168\). We therefore fail to reject the null hypothesis; these data do not carry strong evidence of a difference in ice cream preferences between males and females.
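The numbers in Example 42.2 can be checked with the same recipe, now with expected counts \(E_{ij} = n_i O_{.j}/N\). The sketch below is our own, assuming scipy is available for the chi-squared tail area.

```python
from math import log
from scipy.stats import chi2

# Ice cream counts; columns: Chocolate, Vanilla, Strawberry, Coffee.
O = [[25, 15, 15, 29],   # Male
     [35, 20, 10, 20]]   # Female

n = [sum(r) for r in O]              # fixed sample sizes n_1 = 84, n_2 = 85
col = [sum(c) for c in zip(*O)]      # pooled outcome counts O_.j
N = sum(n)                           # 169 respondents

# Expected counts under equal preference probabilities across populations:
E = [[n[i] * col[j] / N for j in range(4)] for i in range(2)]

L = 2 * sum(O[i][j] * log(O[i][j] / E[i][j]) for i in range(2) for j in range(4))
W = sum((O[i][j] - E[i][j]) ** 2 / E[i][j] for i in range(2) for j in range(4))
p = chi2.sf(L, df=(2 - 1) * (4 - 1))
# L is approximately 5.055, W approximately 5.028, p approximately 0.168.
```

Since here \(R = 2\), the \(E_{ij}\) simply split each pooled column count in proportion to the two sample sizes, which are nearly equal.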
Snee, Ronald D. 1974. “Graphical Display of Two-Way Contingency Tables.” The American Statistician 28 (1): 9–12.