16  Consistency of the bootstrap for the sample mean

As before, let \(X_1,\dots,X_n\) be independent, identically distributed random variables with mean \(\mu\) and variance \(\sigma^2 < \infty\), and consider the quantity \[ Y_n \equiv \sqrt{n}(\bar X_n - \mu), \] which satisfies \(Y_n \overset{\text{d}}{\longrightarrow}\mathcal{N}(0,\sigma^2)\) as \(n \to \infty\). Denote by \(G_{Y_n}\) the cdf of \(Y_n\), so that \[ G_{Y_n}(x) = \mathbb{P}(Y_n \leq x) \] for all \(x \in \mathbb{R}\), and recall from Definition 14.1 that the bootstrap estimator of \(G_{Y_n}(x)\) is given by \[ \hat G_{Y_n}(x) = \mathbb{P}(Y_n^* \leq x | X_1,\dots,X_n) \] for all \(x \in \mathbb{R}\), where \[ Y_n^* = \sqrt{n}(\bar X_n^* - \bar X_n). \] For studying the properties of the estimator \(\hat G_{Y_n}(x)\), it will be convenient to define some concise notation for probabilities concerning \(X_1^*,\dots,X_n^*\), as well as expectations and variances thereof, conditional on the observed data \(X_1,\dots,X_n\).

Definition 16.1 (Bootstrap probability, expectation, and variance) Let \(\mathbb{P}_*\), \(\mathbb{E}_*\), and \(\mathbb{V}_*\) be operators such that \[\begin{align} \mathbb{P}_*(\cdot ) &= \mathbb{P}(\cdot | X_1,\dots,X_n)\\ \mathbb{E}_*(\cdot ) &= \mathbb{E}(\cdot | X_1,\dots,X_n)\\ \mathbb{V}_*(\cdot ) &= \mathbb{V}(\cdot | X_1,\dots,X_n), \end{align}\] so that they represent conditional probability, expectation, and variance given the observed data \(X_1,\dots,X_n\).

With this notation we can write the bootstrap estimator \(\hat G_{Y_n}(x)\) of \(G_{Y_n}(x)\) as \[ \hat G_{Y_n}(x) = \mathbb{P}_*(Y_n^* \leq x). \]
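In practice \(\hat G_{Y_n}(x) = \mathbb{P}_*(Y_n^* \leq x)\) is approximated by Monte Carlo: draw many bootstrap resamples from the observed data and count how often \(Y_n^* \leq x\). A minimal sketch, where the data-generating distribution and the function name are illustrative choices, not part of the text:

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_cdf_estimate(x_obs, x, n_boot=2000, rng=rng):
    """Monte Carlo approximation of the bootstrap estimator
    G_hat(x) = P_*(Y_n^* <= x), where Y_n^* = sqrt(n) (xbar^* - xbar)."""
    n = len(x_obs)
    xbar = x_obs.mean()
    # Resample n points with replacement from the observed data, n_boot times.
    samples = rng.choice(x_obs, size=(n_boot, n), replace=True)
    y_star = np.sqrt(n) * (samples.mean(axis=1) - xbar)
    return np.mean(y_star <= x)

x_obs = rng.exponential(scale=1.0, size=100)  # one illustrative observed data set
print(bootstrap_cdf_estimate(x_obs, 0.0))     # close to 1/2 for large n
```

The Monte Carlo error of this approximation is of order \(n_{\text{boot}}^{-1/2}\) and is separate from the statistical error studied in this section, which concerns \(\hat G_{Y_n}\) itself.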

Now we make a couple of observations about the conditional distribution of \(X_1^*,\dots,X_n^*\) given \(X_1,\dots,X_n\).

Proposition 16.1 (Mean and variance of bootstrap random variables) In the setup of Definition 14.1 we have \[\begin{align} \mathbb{E}[X_1^* | X_1,\dots,X_n] &= \bar X_n\\ \mathbb{V}[X_1^*| X_1,\dots,X_n] &= \hat \sigma^2_n, \end{align}\] where \(\hat \sigma^2_n = S_n^2(n-1)/n\).

The random variable \(X_1^*\), conditional on \(X_1,\dots,X_n\), has probability mass function \[ p_n(x) = \frac{1}{n}\mathbf{1}(x \in \{X_1,\dots,X_n\}) \] (written here assuming the observed values are distinct; with ties the point masses add and the conclusions below are unchanged), according to which \[ \mathbb{E}[X_1^* | X_1,\dots,X_n] = \sum_{x \in \{X_1,\dots,X_n\}}x \cdot \frac{1}{n} = \frac{1}{n}\sum_{i=1}^n X_i = \bar X_n \] and \[ \mathbb{E}[(X_1^*)^2 | X_1,\dots,X_n] = \sum_{x \in \{X_1,\dots,X_n\}}x^2 \cdot \frac{1}{n} = \frac{1}{n}\sum_{i=1}^n X_i^2. \] Thus \[ \mathbb{V}[X_1^*| X_1,\dots,X_n] = \frac{1}{n}\sum_{i=1}^n X_i^2 - (\bar X_n)^2 = \frac{1}{n}\sum_{i=1}^n(X_i - \bar X_n)^2 = \hat \sigma_n^2. \]
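Proposition 16.1 can be checked numerically: since \(X_1^*\) places mass \(1/n\) on each observed value, \(\mathbb{E}_*[X_1^*]\) and \(\mathbb{V}_*[X_1^*]\) reduce to sums over the empirical distribution. A quick sketch, with an illustrative data set:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=25)  # one illustrative observed data set
n = len(x)

# X_1^* puts mass 1/n on each observed value, so the conditional
# mean and variance are sums over the empirical distribution.
e_star = np.sum(x * (1 / n))                  # E_*[X_1^*]
v_star = np.sum(x**2 * (1 / n)) - e_star**2   # V_*[X_1^*]

s_sq = np.var(x, ddof=1)  # unbiased sample variance S_n^2
assert np.isclose(e_star, x.mean())            # E_*[X_1^*] = xbar_n
assert np.isclose(v_star, s_sq * (n - 1) / n)  # V_*[X_1^*] = S_n^2 (n-1)/n
```

Note that `np.var` with its default `ddof=0` uses the \(1/n\) convention, so it computes \(\hat\sigma_n^2\) directly rather than \(S_n^2\).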

We are now prepared to present a consistency result for the bootstrap estimator of the sampling distribution of \(Y_n\):

Theorem 16.1 (Consistency of the bootstrap for the mean) Let \(X_1,\dots,X_n\) be independent, identically distributed random variables with mean \(\mu\) and variance \(\sigma^2 \in (0,\infty)\) with \(\mathbb{E}|X_1|^3 < \infty\). Then for the bootstrap in Definition 14.1 we have \[ \sup_{x \in \mathbb{R}}|\mathbb{P}_*(Y_n^* \leq x) - \mathbb{P}(Y_n\leq x)| \overset{\text{a.s.}}{\longrightarrow}0 \] as \(n \to \infty\).

The result states that the difference between the bootstrap estimator \(\hat G_{Y_n}(x)\) and its target \(G_{Y_n}(x)\) converges almost surely to zero as \(n \to \infty\), uniformly in \(x \in \mathbb{R}\). In other words, the bootstrap works.
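Theorem 16.1 can be illustrated by simulation: as \(n\) grows, the sup-distance between the (Monte Carlo approximated) bootstrap cdf and the limiting normal cdf \(\Phi(x/\sigma)\) shrinks. A rough sketch using Exp(1) data, for which \(\sigma = 1\); the distribution, grid, and function name are all illustrative choices:

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(2)

def phi(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def sup_distance(n, n_boot=4000):
    """Approximate sup over a grid of |P_*(Y_n^* <= x) - Phi(x/sigma)|
    for one Exp(1) data set of size n (sigma = 1 for Exp(1))."""
    x_obs = rng.exponential(size=n)
    xbar = x_obs.mean()
    samples = rng.choice(x_obs, size=(n_boot, n), replace=True)
    y_star = np.sqrt(n) * (samples.mean(axis=1) - xbar)
    grid = np.linspace(-3, 3, 121)
    g_hat = np.array([(y_star <= x).mean() for x in grid])
    phi_grid = np.array([phi(x) for x in grid])
    return np.max(np.abs(g_hat - phi_grid))

for n in (10, 100, 1000):
    print(n, sup_distance(n))
```

The printed distances are contaminated by Monte Carlo noise of order \(n_{\text{boot}}^{-1/2}\), so the decrease is apparent rather than exactly monotone.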

The key tool we will use in our proof of Theorem 16.1 is the Berry-Esseen Theorem.

Theorem 16.2 (Berry-Esseen Theorem) For \(X_1,\dots,X_n\) independent and identically distributed with \(\mathbb{E}X_1 = \mu\), \(\mathbb{V}X_1 = \sigma^2 \in (0,\infty)\), and with \(\mathbb{E}|X_1|^3 <\infty\), we have \[ \sup_{x \in \mathbb{R}}|P(\sqrt{n}(\bar X_n - \mu)/\sigma \leq x) - \Phi(x)| \leq C \frac{\mathbb{E}|X_1 - \mu|^3}{\sigma^3\sqrt{n}} \] for each \(n \geq 1\), where \(C \in [0,\sqrt{2/\pi}(5/2 + 12/\pi)]\).

Note that \(\Phi(\cdot)\) denotes the cdf of the \(\mathcal{N}(0,1)\) distribution.
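The Berry-Esseen bound can be checked exactly in a small discrete case. For Bernoulli(\(p\)) data the cdf of \(\sqrt{n}(\bar X_n - \mu)/\sigma\) is available in closed form from the Binomial distribution, so both sides of the inequality can be computed directly; the choice of \(p\) and \(n\), and the use of the theorem's upper limit for \(C\), are illustrative:

```python
import numpy as np
from math import erf, sqrt, comb, pi

def phi(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

p, n = 0.3, 50
mu, sigma = p, sqrt(p * (1 - p))

# Exact cdf of sqrt(n)(xbar - mu)/sigma at its jump points xbar = k/n.
pmf = np.array([comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)])
cdf = np.cumsum(pmf)
z = (np.arange(n + 1) / n - mu) * sqrt(n) / sigma
phi_z = np.array([phi(v) for v in z])

# The sup over x is attained at the jumps; check both sides of each jump.
lhs = max(np.abs(cdf - phi_z).max(),
          np.abs(np.concatenate(([0.0], cdf[:-1])) - phi_z).max())

e_abs3 = p * (1 - p) * ((1 - p)**2 + p**2)   # E|X_1 - mu|^3 for Bernoulli(p)
C_upper = sqrt(2 / pi) * (5 / 2 + 12 / pi)   # upper limit for C in the theorem
rhs = C_upper * e_abs3 / (sigma**3 * sqrt(n))
print(lhs, rhs)  # lhs <= rhs
```

Here \(\mathbb{E}|X_1 - \mu|^3 = p(1-p)\bigl[(1-p)^2 + p^2\bigr]\) follows from the two-point distribution of \(X_1 - \mu\).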

Noting that \(\mathbb{P}(Y_n \leq x) = \mathbb{P}(\sqrt{n}(\bar X_n - \mu)/\sigma\leq x/\sigma)\), the Berry-Esseen Theorem gives \[ \sup_{x\in \mathbb{R}}|\mathbb{P}(Y_n \leq x) - \Phi(x/\sigma)|\leq C \frac{\mathbb{E}|X_1-\mu|^3}{\sigma^3\sqrt{n}}, \] where the right hand side goes to zero as \(n \to \infty\). So it is sufficient to show \[ \sup_{x \in \mathbb{R}}|\mathbb{P}_*(Y_n^* \leq x) - \Phi(x/\sigma)| \overset{\text{a.s.}}{\longrightarrow}0 \] as \(n \to \infty\), that is, that \(Y_n^*\) and \(Y_n\) have the same limiting Normal distribution. For each \(x\) we may write \[ |\mathbb{P}_*(Y_n^* \leq x) - \Phi(x/\sigma)| \leq |\Delta_{n1}(x)| +|\Delta_{n2}(x)|, \] where \[ \Delta_{n1}(x) = \mathbb{P}_*(Y_n^* \leq x) - \Phi(x/\hat \sigma_n) \] and \[ \Delta_{n2}(x) = \Phi(x/\hat \sigma_n) - \Phi(x/\sigma). \] By the Kolmogorov SLLN and the continuous mapping theorem we have \(\hat \sigma_n \overset{\text{a.s.}}{\longrightarrow}\sigma\), and since pointwise convergence of cdfs to a continuous limiting cdf is uniform, \(\sup_{x\in \mathbb{R}}|\Delta_{n2}(x) | \to 0\) almost surely. By the Berry-Esseen theorem, applied conditionally to \(X_1^*,\dots,X_n^*\), we can bound the first term by \[\begin{align} \sup_{x \in \mathbb{R}} |\Delta_{n1}(x)| &= \sup_{x \in \mathbb{R}}|\mathbb{P}_*(\sqrt{n}(\bar X_n^* - \bar X_n)/\hat \sigma_n \leq x/\hat \sigma_n) - \Phi(x/\hat \sigma_n)| \\ &\leq C \frac{\mathbb{E}_*|X_1^* - \bar X_n|^3}{\hat \sigma_n^3\sqrt{n}}. \end{align}\] Furthermore, by Proposition 32.1 (Minkowski’s inequality) we have \[\begin{align} (\mathbb{E}_*|X_1^* - \bar X_n|^3)^{1/3} &\leq (\mathbb{E}_*|X_1^*|^3)^{1/3} + (|\bar X_n|^3)^{1/3} \\ & \leq 2 (\mathbb{E}_*|X_1^*|^3)^{1/3}, \end{align}\] where the second inequality comes from the application of Proposition 32.2 (Jensen’s inequality) \[ |\bar X_n|^3 = |\mathbb{E}_*X_1^*|^3 \leq \mathbb{E}_*| X_1^*|^3, \] which implies \(|\bar X_n| \leq (\mathbb{E}_*|X_1^*|^3)^{1/3}\). 
Noting that \(\mathbb{E}_*|X_1^*|^3 = n^{-1}\sum_{i=1}^n |X_i|^3\), we can replace the Berry-Esseen bound by \[ \sup_{x \in \mathbb{R}} |\Delta_{n1}(x)| \leq C \frac{2^3}{\hat \sigma_n^3 \sqrt{n}} \frac{1}{n}\sum_{i=1}^n |X_i|^3. \] Now, by the Kolmogorov SLLN we have \(n^{-1}\sum_{i=1}^n |X_i|^3 \to \mathbb{E}|X_1|^3 < \infty\) almost surely as \(n \to \infty\). Since \(\hat \sigma_n \to \sigma > 0\) almost surely, the right hand side converges to zero almost surely, and the result is proved.

A similar proof is given on page 535 of Athreya and Lahiri (2006).