30  Asymptotic distribution of the Wilcoxon rank sum statistic

Here we will state and prove a result describing the asymptotic behavior of the Wilcoxon rank sum statistic \(W_R\) as the sample sizes \(n\) and \(m\) grow. We begin with a result giving its range, mean, and variance.

Proposition 30.1 (Mean and variance of the rank-sum statistic) The rank-sum statistic \(W_R\) must take a value between \(m(m+1)/2\) and \(mn + m(m+1)/2\). Moreover, if \(F = G\) we have \[\begin{align} \mathbb{E}W_R &= \frac{1}{2}m(N + 1)\\ \mathbb{V}W_R & = \frac{1}{12}mn(N+1). \end{align}\]

First, \(W_R\) takes its smallest value when \(Y_1,\dots,Y_m\) are the smallest \(m\) observations among \(Z_1,\dots,Z_{n+m}\). In this case the set of ranks \(\{R_1,\dots,R_m\}\) is equal to the set of integers \(\{1,\dots,m\}\), so that \(\sum_{j=1}^m R_j = m(m+1)/2\). On the other hand, \(W_R\) takes its largest value when \(Y_1,\dots,Y_m\) are the largest \(m\) observations among \(Z_1,\dots,Z_{n+m}\), in which case the set of ranks \(\{R_1,\dots,R_m\}\) is equal to the set of integers \(\{n+1,\dots,n+m\}\), the sum of which is \(mn + m(m+1)/2\).

To compute the expected value and variance of \(W_R\), we first note that under \(H_0\) the position of a data point \(Y_j\) in the ordering of the \(N = n + m\) data points is a random draw from the integers \(1,\dots,N\) such that each position occurs with probability \(1/N\). That is, for any \(r_j =1,\dots,N\) we have \(\mathbb{P}(R_j = r_j) = 1/N\), so that \[ \mathbb{E}R_j = \frac{N+1}{2} \quad \text{ and } \quad \mathbb{V}R_j = \frac{N^2 - 1}{12}, \] where these are simply the mean and variance of the discrete uniform distribution on the integers \(1,\dots,N\).
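
These moments are easy to verify numerically; the following quick check, with an arbitrary illustrative choice of \(N\), is not part of the proof:

# Check the mean and (population) variance of the discrete uniform on 1,...,N
N <- 25
r <- 1:N
c(mean(r), (N + 1)/2)                    # both equal 13
c(mean((r - mean(r))^2), (N^2 - 1)/12)   # both equal 52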

From here we obtain \[ \mathbb{E}W_R = \mathbb{E}\sum_{j=1}^m R_j = \frac{1}{2}m(N+1). \]

To find \(\mathbb{V}W_R\) we write \[\begin{align} \mathbb{V}W_R &= \mathbb{V}\Big(\sum_{j=1}^m R_j\Big) \\ &= \sum_{j=1}^m \mathbb{V}R_j + 2\sum_{j < k} \mathbb{C}(R_j,R_k) \\ &= m \mathbb{V}R_1 + m(m-1)\mathbb{C}(R_1,R_2), \end{align}\] since \(\mathbb{C}(R_j,R_k)\) is the same for every \(j \neq k\). In order to find \(\mathbb{C}(R_1,R_2)\), assume for a moment that \(m = N\). In this case we must have \(W_R = N(N+1)/2\) so \(\mathbb{V}W_R = 0\), giving \[ N \mathbb{V}R_1 + N(N-1)\mathbb{C}(R_1,R_2) = 0, \] so that \[ \mathbb{C}(R_1,R_2) = -\frac{N \mathbb{V}R_1 }{N(N-1)} = -\frac{N^2-1}{12(N-1)} = -\frac{N+1}{12}. \] Plugging this into our earlier expression for \(\mathbb{V}W_R\), we obtain \[\begin{align} \mathbb{V}W_R &= m \frac{N^2-1}{12} - m(m-1)\frac{N+1}{12} \\ &= m \frac{(N+1)(N-1)}{12} - m(m-1)\frac{N+1}{12} \\ &= \frac{1}{12}mn(N+1). \end{align}\]
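
As a numerical sanity check on Proposition 30.1, we can compute these moments from the exact null distribution of \(W_{XY} = W_R - m(m+1)/2\), whose probabilities are returned by dwilcox() in R; the sample sizes below are arbitrary.

# Check Proposition 30.1 against the exact null pmf of W_XY (arbitrary m, n)
m <- 7; n <- 12; N <- m + n
w <- 0:(m*n)                          # support of W_XY
p <- dwilcox(w, m = m, n = n)         # exact null probabilities
mean_WR <- sum(w*p) + m*(m + 1)/2     # shift back to W_R
var_WR <- sum((w - sum(w*p))^2 * p)   # variance is unaffected by the shift
c(mean_WR, m*(N + 1)/2)               # both equal 70
c(var_WR, m*n*(N + 1)/12)             # both equal 140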

Now we state a central limit result for \(W_R\).

Theorem 30.1 (Asymptotic Normality of rank sum statistic under the null) Under \(H_0\): \(F = G\) we have \[ \frac{W_R - \mathbb{E}W_R}{\sqrt{\mathbb{V}W_R}} \overset{\text{d}}{\longrightarrow}\mathcal{N}(0,1) \tag{30.1}\] as \(n,m \to \infty\).
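
Before turning to the proof, a small simulation can make Theorem 30.1 concrete; the sample sizes and the choice of common distribution \(F = G\) below are arbitrary.

# Simulate the standardized rank-sum statistic under H0: F = G
set.seed(1)
n <- 60; m <- 40; N <- n + m
W_R <- replicate(1e4, {
  z <- rnorm(N)                      # X's are z[1:n], Y's are z[(n+1):N]
  sum(rank(z)[(n + 1):N])            # rank sum of the Y's
})
W_std <- (W_R - m*(N + 1)/2)/sqrt(m*n*(N + 1)/12)
c(mean(W_std), sd(W_std))            # should be close to 0 and 1
# hist(W_std, freq = FALSE) lines up well with the standard normal density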

The main complication in proving the result is that \(W_R = R_1 + \dots + R_m\) is a sum of dependent rather than independent random variables. Our strategy will be to

  1. Introduce an approximation \(\tilde W_R\) to \(W_R\) which is a sum of independent random variables and such that we can show \[ \frac{\tilde W_R - \mathbb{E}\tilde W_R}{\sqrt{ \mathbb{V}\tilde W_R}} \overset{\text{d}}{\longrightarrow}\mathcal{N}(0,1) \tag{30.2}\] as \(n,m \to \infty\).
  2. Show that \(\tilde W_R\) approximates \(W_R\) closely enough that the convergence in Equation 30.2 implies the convergence in Equation 30.1.

Here we go. Under the null hypothesis \(H_0\): \(F = G\), the ranks \(R_1,\dots,R_m\) are simply \(m\) integers sampled without replacement from \(1,\dots,N\). One way to draw a sample of size \(m\) without replacement from the integers \(1,\dots,N\) is to (i) generate \(N\) independent realizations \(U_1,\dots,U_N\) from the \(\mathcal{U}(0,1)\) distribution, (ii) denote their order statistics by \(U_{(1)} <\dots <U_{(N)}\), and (iii) keep the integers in the set \(\{i: U_i \leq U_{(m)}\}\). Under this sampling scheme we may write \[ W_R = \sum_{i=1}^N i J_i, \] where, for each \(i=1,\dots,N\), \(J_i = \mathbf{1}(U_i \leq U_{(m)})\) is an indicator equal to \(1\) if the integer \(i\) was drawn and equal to \(0\) otherwise. Note that \(J_1,\dots,J_N\) are dependent random variables.
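
A minimal sketch of this construction in R (with arbitrary \(N\) and \(m\)) shows that the kept integers are indeed \(m\) draws without replacement and that \(\sum_i i J_i\) is their sum:

# Draw m of the integers 1,...,N without replacement via uniforms
set.seed(2)
N <- 20; m <- 8
U <- runif(N)
J <- as.numeric(U <= sort(U)[m])   # J_i = 1(U_i <= U_(m))
sum(J)                             # exactly m integers are kept
sum((1:N)*J)                       # the rank sum W_R of the kept integers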

Now, to construct an approximation \(\tilde W_R\) to \(W_R\), we consider replacing the dependent random variables \(J_i\) with independent analogues \(K_i\) defined as \(K_i = \mathbf{1}(U_i \leq m/N)\) for \(i=1,\dots,N\), where \(U_1,\dots,U_N\) are the same \(\mathcal{U}(0,1)\) realizations with which the \(J_i\) were defined. Then we set \[ \tilde W_R = \sum_{i=1}^N \Big(i - \frac{N+1}{2}\Big)K_i + \frac{1}{2}m(N+1). \] Now, noting that \(\mathbb{E}K_i = m/N\) and \(\mathbb{V}K_i = (m/N)(1 - m/N)\), we have \[\begin{align} \mathbb{E}\tilde W_R &=\sum_{i=1}^N \Big(i - \frac{N+1}{2}\Big)\frac{m}{N} + \frac{1}{2}m(N+1) \\ &= \frac{1}{2}m(N+1) \end{align}\] and \[\begin{align} \mathbb{V}\tilde W_R &= \sum_{i=1}^N \Big(i - \frac{N+1}{2}\Big)^2 \frac{m}{N}\Big(1 - \frac{m}{N}\Big) =\frac{N-1}{N}\mathbb{V}W_R, \end{align}\] where the second equality will be useful later; it follows from the identity \(\sum_{i=1}^N (i - (N+1)/2)^2 = N(N^2-1)/12\), which takes some work to show. Now we may write \[\begin{align} \frac{\tilde W_R - \mathbb{E}\tilde W_R}{\sqrt{\mathbb{V}\tilde W_R}} = \frac{\sum_{i=1}^N a_i \xi_i}{\sqrt{\sum_{i=1}^N a_i^2}}, \end{align}\] where \[ a_i = i - \frac{N+1}{2} \quad \text{ and } \quad \xi_i = \frac{K_i - m/N}{\sqrt{(m/N)(1-m/N)}} \] for \(i=1,\dots,N\). Note that \(\xi_1,\dots,\xi_N\) are independent random variables with zero mean and unit variance, so according to the corollary to the Lindeberg central limit theorem stated in Corollary 32.1, the convergence in Equation 30.2 holds provided \[ \frac{\max_{1\leq i \leq N} |a_i|}{\sqrt{\sum_{i=1}^Na_i^2}} \to 0 \tag{30.3}\] as \(n,m \to \infty\). We have \[ \max_{1 \leq i \leq N} |a_i| = \Big( \Big|1 - \frac{N+1}{2}\Big| \vee \Big|N - \frac{N+1}{2}\Big|\Big) = \frac{N-1}{2} \] and, by the identity noted above, \[ \sum_{i=1}^N a_i^2 = \frac{N(N^2 - 1)}{12}. \] The ratio in Equation 30.3 is therefore of order \(N^{-1/2}\), so Equation 30.3 holds and the convergence in Equation 30.2 is established.
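
The two "some work" claims above, namely the sum-of-squares identity and the relation \(\mathbb{V}\tilde W_R = N^{-1}(N-1)\mathbb{V}W_R\), are easy to confirm numerically (arbitrary \(N\) and \(m\); a check, not a proof):

# Check the sum-of-squares identity, the variance relation, and Equation 30.3
N <- 100; m <- 40; n <- N - m
a <- (1:N) - (N + 1)/2
c(sum(a^2), N*(N^2 - 1)/12)                # equal
V_tilde <- sum(a^2)*(m/N)*(1 - m/N)
c(V_tilde, (N - 1)/N * m*n*(N + 1)/12)     # equal
max(abs(a))/sqrt(sum(a^2))                 # about sqrt(3/N), small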

Now we consider the quality of the approximation of \(\tilde W_R\) to \(W_R\). We may write \[ \frac{W_R - \mathbb{E}W_R}{\sqrt{\mathbb{V}W_R}} = \Big(\frac{\tilde W_R - \mathbb{E}\tilde W_R}{\sqrt{\mathbb{V}\tilde W_R}} + \frac{W_R - \tilde W_R}{\sqrt{\mathbb{V}\tilde W_R}} \Big)\sqrt{\frac{\mathbb{V}\tilde W_R}{\mathbb{V}W_R}}, \] since \(\mathbb{E}\tilde W_R = \mathbb{E}W_R\). Now, since \(\mathbb{V}\tilde W_R = N^{-1}(N-1)\mathbb{V}W_R\), the factor \(\sqrt{\mathbb{V}\tilde W_R/\mathbb{V}W_R} = \sqrt{(N-1)/N}\) converges to \(1\), so by Chebyshev's inequality and Slutsky's theorem, a sufficient condition for the convergence in Equation 30.2 to imply the convergence in Equation 30.1 is \[ \frac{\mathbb{E}(W_R - \tilde W_R)^2}{\mathbb{V}\tilde W_R} \to 0 \tag{30.4}\] as \(n,m \to \infty\).

First, it will be useful to write \[\begin{align} W_R - \tilde W_R &= \sum_{i=1}^N i J_i - \sum_{i=1}^N\Big(i - \frac{N+1}{2}\Big)K_i - \frac{m(N+1)}{2} \\ &= \sum_{i=1}^N \Big(i - \frac{N+1}{2}\Big)(J_i - K_i), \end{align}\] where we have used the fact that \(\sum_{i=1}^N J_i = m\).

Now, since \(\mathbb{E}W_R = \mathbb{E}\tilde W_R\) we have \(\mathbb{E}(W_R - \tilde W_R)^2 = \mathbb{V}(W_R - \tilde W_R)\), so \[ \mathbb{E}(W_R - \tilde W_R)^2 = \mathbb{V}(\mathbb{E}[W_R - \tilde W_R | U_{(1)},\dots,U_{(N)}]) + \mathbb{E}(\mathbb{V}[W_R - \tilde W_R | U_{(1)},\dots,U_{(N)}]), \tag{30.5}\] where \[\begin{align} \mathbb{E}[W_R - \tilde W_R | U_{(1)},\dots,U_{(N)}] &= \mathbb{E}\Big[\sum_{i=1}^N \Big(i - \frac{N+1}{2}\Big)(J_i - K_i)\Big| U_{(1)},\dots,U_{(N)}\Big]\\ &= \mathbb{E}[J_1 - K_1| U_{(1)},\dots,U_{(N)}]\sum_{i=1}^N \Big(i - \frac{N+1}{2}\Big)\\ &= 0, \end{align}\] since, conditionally on the order statistics, \(\mathbb{E}[J_i - K_i| U_{(1)},\dots,U_{(N)}]\) is the same for every \(i\) by exchangeability, and \(\sum_{i=1}^N (i - (N+1)/2) = 0\). In addition, \[ \mathbb{V}[W_R - \tilde W_R | U_{(1)},\dots,U_{(N)}] = \mathbb{V}\Big[ \sum_{i=1}^N \Big(i - \frac{N+1}{2}\Big)(J_i - K_i) | U_{(1)},\dots,U_{(N)}\Big]. \] To find the above variance, it helps to note that after conditioning on \(U_{(1)},\dots,U_{(N)}\) the values \(U_1,\dots,U_N\) on which \(K_1,\dots,K_N\) and \(J_1,\dots,J_N\) are based are a random permutation of \(U_{(1)},\dots,U_{(N)}\). We can therefore use Lemma 32.2 to write \[\begin{align} \mathbb{V}[W_R - \tilde W_R | U_{(1)},\dots,U_{(N)}] &= \frac{1}{N-1} \sum_{i=1}^N \Big(i - \frac{N+1}{2}\Big)^2\sum_{i=1}^N((J_i - K_i) - (\bar J - \bar K))^2 \\ &\leq \frac{1}{N-1} \sum_{i=1}^N \Big(i - \frac{N+1}{2}\Big)^2\sum_{i=1}^N(J_i - K_i)^2, \end{align}\] where \(\bar J = N^{-1}\sum_{i=1}^N J_i\) and \(\bar K =N^{-1}\sum_{i=1}^N K_i\), since the mean minimizes the least-squares criterion. Plugging the above result into Equation 30.5, we have \[ \mathbb{E}(W_R - \tilde W_R)^2 \leq \frac{1}{N-1} \sum_{i=1}^N \Big(i - \frac{N+1}{2}\Big)^2 \mathbb{E}\sum_{i=1}^N (J_i - K_i)^2. \] In order to take the expectation, let \(K_{(1)},\dots,K_{(N)}\) and \(J_{(1)},\dots,J_{(N)}\) denote \(K_1,\dots,K_N\) and \(J_1,\dots,J_N\) when re-ordered according to the order of \(U_1,\dots,U_N\), so that \(K_{(i)} = \mathbf{1}(U_{(i)}\leq m/N)\) and \(J_{(i)} = \mathbf{1}(U_{(i)}\leq U_{(m)})\). Now, letting \(D = \sum_{i=1}^N\mathbf{1}(U_i \leq m/N)\), we may write \[ K_{(i)} = \left\{\begin{array}{ll} 1,& i = 1,\dots,D\\ 0,& i = D+1,\dots,N \end{array}\right. \quad \text{ and } \quad J_{(i)} = \left\{\begin{array}{ll} 1,& i = 1,\dots,m\\ 0,& i = m+1,\dots,N, \end{array}\right. \] so if \(D < m\) we have \[ \sum_{i=1}^N(J_{(i)} - K_{(i)})^2 =\sum_{i=1}^D(J_{(i)} - K_{(i)})^2+\sum_{i=D+1}^m(J_{(i)} - K_{(i)})^2+\sum_{i=m+1}^N(J_{(i)} - K_{(i)})^2 = m - D \] and if \(D > m\) we have \[ \sum_{i=1}^N(J_{(i)} - K_{(i)})^2 =\sum_{i=1}^m(J_{(i)} - K_{(i)})^2+\sum_{i=m+1}^D(J_{(i)} - K_{(i)})^2+\sum_{i=D+1}^N(J_{(i)} - K_{(i)})^2 = D - m, \] and if \(D = m\) we have \(\sum_{i=1}^N(J_{(i)} - K_{(i)})^2 = 0\). Putting these cases together and using the fact \(\sum_{i=1}^N (J_i - K_i)^2 = \sum_{i=1}^N (J_{(i)} - K_{(i)})^2\) gives \[ \sum_{i=1}^N (J_i - K_i)^2 = |D - m|, \] where \(D \sim \mathcal{B}(N,m/N)\), so, using Jensen’s inequality, we have \[ \mathbb{E}\sum_{i=1}^N (J_i - K_i)^2 = \mathbb{E}|D - m| \leq \sqrt{\mathbb{E}(D - m)^2}=\sqrt{N(m/N)(1-m/N)}. \] From here we write \[\begin{align} \frac{\mathbb{E}(W_R - \tilde W_R)^2}{\mathbb{V}\tilde W_R} &\leq \frac{(N-1)^{-1} \sum_{i=1}^N \Big(i - \frac{N+1}{2}\Big)^2 \sqrt{N(m/N)(1-m/N)}}{\sum_{i=1}^N \Big(i - \frac{N+1}{2}\Big)^2 (m/N)\Big(1 - \frac{m}{N}\Big) }\\ &= \frac{N}{N-1}\sqrt{\frac{N}{m(N-m)}}\\ &\leq \left\{\begin{array}{ll} N(N-1)^{-1}\sqrt{2/m},& m \leq N/2\\ N(N-1)^{-1}\sqrt{2/n},& m > N/2. \end{array}\right.\\ &\to 0 \end{align}\] as \(n,m\to \infty\). This completes the proof.
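
The identity \(\sum_{i=1}^N (J_i - K_i)^2 = |D - m|\) at the heart of this argument is easy to check by simulation (arbitrary \(N\) and \(m\)):

# Verify sum (J_i - K_i)^2 = |D - m|, where D = #{i: U_i <= m/N}
set.seed(3)
N <- 30; m <- 10
same <- replicate(1000, {
  U <- runif(N)
  J <- as.numeric(U <= sort(U)[m])   # dependent indicators
  K <- as.numeric(U <= m/N)          # independent approximations
  sum((J - K)^2) == abs(sum(K) - m)
})
all(same)                            # TRUE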

According to Theorem 30.1, the rule that rejects \(H_0\): \(F = G\) when \[ W_R \geq \mathbb{E}W_R + z_\alpha \sqrt{\mathbb{V}W_R}, \] or equivalently when \[ W_{XY} \geq \frac{1}{2}nm + z_\alpha \sqrt{\frac{1}{12}m(N - m)(N+1)}, \tag{30.6}\] will have Type I error rate converging to \(\alpha\) as \(n,m\to \infty\). (Recall from the previous section that \(W_{XY} = W_R - m(m+1)/2\), so \(\mathbb{E}W_{XY} = nm/2\) and \(\mathbb{V}W_{XY} = \mathbb{V}W_R\).) A “continuity correction” is often employed, whereby one adds \(1/2\) to the right-hand side of Equation 30.6.
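
A short sketch of the rejection rule in Equation 30.6 with the continuity correction is below; the function name and interface are illustrative rather than from this text.

# Illustrative implementation of the asymptotic level-alpha rejection rule
reject_rank_sum <- function(x, y, alpha = 0.05, correct = TRUE) {
  n <- length(x); m <- length(y); N <- n + m
  W_R <- sum(rank(c(x, y))[(n + 1):N])   # rank sum of the y's
  W_XY <- W_R - m*(m + 1)/2
  cc <- if (correct) 1/2 else 0
  W_XY >= n*m/2 + cc + qnorm(1 - alpha)*sqrt(m*(N - m)*(N + 1)/12)
}
set.seed(4)
reject_rank_sum(rnorm(30), rnorm(40) + 1)  # typically TRUE under this shift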

The corresponding p-value is given by \[ \text{p-value }\approx 1 - \Phi\Big(\frac{W_{XY} - nm/2}{\sqrt{m(N-m)(N+1)/12}}\Big), \] to which one can apply the continuity correction by subtracting \(1/2\) from the numerator of the argument of \(\Phi(\cdot)\).
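
In the absence of ties, this approximate p-value should coincide with what wilcox.test() reports when asked for its normal approximation; a quick comparison on illustrative data:

# Compare the hand-computed approximate p-value with wilcox.test();
# with no ties in the data the two should agree
set.seed(5)
x <- rnorm(25); y <- rnorm(35) + 0.5
n <- length(x); m <- length(y); N <- n + m
W_XY <- sum(rank(c(x, y))[(n + 1):N]) - m*(m + 1)/2
p_hand <- 1 - pnorm((W_XY - n*m/2 - 1/2)/sqrt(m*(N - m)*(N + 1)/12))
p_wt <- wilcox.test(y, x, alternative = "greater",
                    exact = FALSE, correct = TRUE)$p.value
c(p_hand, p_wt)                       # equal up to rounding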

Figure 30.1 gives an idea of how quickly the convergence in Theorem 30.1 takes place as \(n,m \to \infty\). It plots the exact Type I error rate of the rejection rule in Equation 30.6 over an increasing sequence of \(N\) values, with \(m = \lfloor N/3 \rfloor\) for each \(N\), at \(\alpha = 0.01,0.05,0.10\), with and without the continuity correction. We see that as \(N\) grows large, the rejection rule based on the asymptotic distribution has Type I error rates approaching the nominal levels. The continuity correction brings the Type I error rate noticeably closer to the nominal level for smaller values of \(N\). Note that this plot is not the result of a simulation; each curve is computed from the exact distribution of the statistic \(W_{XY}\) under \(H_0\): \(F = G\) using the function pwilcox().

# Exact Type I error rate of the asymptotic rejection rule in Equation 30.6,
# with and without the continuity correction, over a grid of values of N
NN <- 6:50
typ1_01_exact <- numeric(length(NN))
typ1_05_exact <- numeric(length(NN))
typ1_10_exact <- numeric(length(NN))
typ1_corr_01_exact <- numeric(length(NN))
typ1_corr_05_exact <- numeric(length(NN))
typ1_corr_10_exact <- numeric(length(NN))
for(i in 1:length(NN)){

  N <- NN[i]
  m <- floor(N/3)                  # size of the Y sample
  n <- N - m                       # size of the X sample
  mu <- n*m/2                      # null mean of W_XY
  sigma <- sqrt(m*(N-m)*(N+1)/12)  # null sd of W_XY

  # asymptotic critical values: reject H0 when W_XY >= c
  c01_asymp <- mu + qnorm(0.99)*sigma
  c05_asymp <- mu + qnorm(0.95)*sigma
  c10_asymp <- mu + qnorm(0.90)*sigma
  # since W_XY is integer-valued, P(W_XY >= c) = 1 - P(W_XY <= ceiling(c) - 1)
  typ1_01_exact[i] <- 1 - pwilcox(ceiling(c01_asymp) - 1, m = m, n = n)
  typ1_05_exact[i] <- 1 - pwilcox(ceiling(c05_asymp) - 1, m = m, n = n)
  typ1_10_exact[i] <- 1 - pwilcox(ceiling(c10_asymp) - 1, m = m, n = n)

  # critical values with the continuity correction (add 1/2)
  c01_asymp_corr <- mu + 1/2 + qnorm(0.99)*sigma
  c05_asymp_corr <- mu + 1/2 + qnorm(0.95)*sigma
  c10_asymp_corr <- mu + 1/2 + qnorm(0.90)*sigma
  typ1_corr_01_exact[i] <- 1 - pwilcox(ceiling(c01_asymp_corr) - 1, m = m, n = n)
  typ1_corr_05_exact[i] <- 1 - pwilcox(ceiling(c05_asymp_corr) - 1, m = m, n = n)
  typ1_corr_10_exact[i] <- 1 - pwilcox(ceiling(c10_asymp_corr) - 1, m = m, n = n)

}

# solid: no correction; dashed: with correction; dotted: nominal level
plot(NA,
     xlim = range(NN),
     ylim = range(typ1_01_exact, typ1_05_exact, typ1_10_exact,
                  typ1_corr_01_exact, typ1_corr_05_exact, typ1_corr_10_exact),
     ylab = "Type I error rate",
     xlab = "N")

lines(typ1_01_exact ~ NN)
lines(typ1_corr_01_exact ~ NN, lty = 2)
abline(h = 0.01, lty = 3)

lines(typ1_05_exact ~ NN, col = "blue")
lines(typ1_corr_05_exact ~ NN, col = "blue", lty = 2)
abline(h = 0.05, lty = 3, col = "blue")

lines(typ1_10_exact ~ NN, col = "red")
lines(typ1_corr_10_exact ~ NN, col = "red", lty = 2)
abline(h = 0.10, lty = 3, col = "red")

legend("topright",
       legend = c("0.10","0.10 with correction","0.05","0.05 with correction",
                  "0.01","0.01 with correction"),
       col = c("red","red","blue","blue","black","black"),
       lty = c(1,2,1,2,1,2),
       bty = "n")
Figure 30.1: The exact Type I error rate of the rejection rule in Equation 30.6 over increasing \(N\) with \(m = \lfloor N / 3 \rfloor\) at each \(\alpha = 0.01,0.05,0.10\). Dashed lines trace the Type I error rates when the continuity correction is used.