This section draws from Fernholz (2012) as well as from Wasserman (2006).
We consider using the bootstrap to estimate the sampling distribution of an unstudentized pivot \[
Y_n = \sqrt{n}(T(\hat F_n) - T(F)),
\tag{24.1}\] based on which we would like to make inferences about \(T(F)\). If the distribution of \(Y_n\), say \(G_{Y_n}\), were known, then a \((1-\alpha)100\%\) confidence interval for \(T(F)\) could be constructed as \[
\big[T(\hat F_n) - G_{Y_n}^{-1}(1-\alpha/2)/\sqrt{n},\ T(\hat F_n) - G_{Y_n}^{-1}(\alpha/2)/\sqrt{n}\big].
\] We consider replacing the unknown quantiles \(G_{Y_n}^{-1}(\alpha/2)\) and \(G_{Y_n}^{-1}(1 - \alpha/2)\) of the distribution of \(Y_n\) with bootstrap estimates.
Definition 24.1 (Bootstrap for a statistical functional) Conditional on \(X_1,\dots,X_n\), introduce random variables \(X_1^*,\dots,X_n^*\) such that \(X_1^*,\dots,X_n^*|X_1,\dots,X_n \overset{\text{ind}}{\sim}\hat F_n\), where \(\hat F_n\) is the empirical distribution of \(X_1,\dots,X_n\), and define the bootstrap version \(Y_n^*\) of \(Y_n = \sqrt{n}(T(\hat F_n) - T(F))\) as
\[
Y_n^* \equiv \sqrt{n}(T(\hat F_n^*) - T(\hat F_n)),
\] where \(\hat F^*_n\) is the empirical distribution of \(X_1^*,\dots,X_n^*\). Then the bootstrap estimator of \(G_{Y_n}\) is given by \[
\hat G_{Y_n}(x) = \mathbb{P}_*(Y_n^* \leq x)
\] for all \(x \in \mathbb{R}\).
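As a small illustration of Definition 24.1 (the median and the simulated data below are illustrative choices, not part of the text), a single bootstrap draw of \(Y_n^*\) can be generated as follows.

```r
# A minimal sketch of Definition 24.1 with the median as T (illustrative choice):
# draw one bootstrap sample from F_hat_n and form Y_n^* = sqrt(n)(T(F_hat_n^*) - T(F_hat_n))
set.seed(1)
n <- 50
X <- rnorm(n)                               # observed sample X_1, ..., X_n
T_hat <- median(X)                          # T(F_hat_n)
Xstar <- sample(X, n, replace = TRUE)       # X_1^*, ..., X_n^* drawn from F_hat_n
Ystar <- sqrt(n) * (median(Xstar) - T_hat)  # bootstrap version Y_n^*
```

Repeating the last two lines many times and taking the empirical distribution of the resulting \(Y_n^*\) values gives an approximation to \(\hat G_{Y_n}\); this is formalized in Definition 24.3 below.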
In order to establish that the bootstrap works for a given statistical functional, that is, that we can use the bootstrap to estimate its sampling distribution, it is sufficient to show that the functional has a property called Hadamard differentiability.
Recall the Taylor-like expansion of a statistical functional \(T:\mathcal{D}\to\mathbb{R}\) at the distribution \(F\) evaluated at \(\hat F_n\), which allowed us to write \[
\sqrt{n}(T(\hat F_n) - T(F)) = \sqrt{n}T^{(1)}_F(\hat F_n - F) + \sqrt{n}R_F(\hat F_n - F),
\tag{24.2}\] where \(T^{(1)}_F(\hat F_n - F)\) was the von Mises derivative of \(T\) at \(F\) in the direction \(\hat F_n - F\). If the functional \(T\) is Hadamard differentiable, which we define shortly, the second term on the right-hand side of Equation 24.2 will vanish in probability and the first term will converge to a Normal limit; moreover, the bootstrap in Definition 24.1 will work.
Before presenting these results, we give the definition of Hadamard differentiability.
Definition 24.2 (Hadamard differentiability of a statistical functional) A functional \(T:\mathcal{D}\to \mathbb{R}\) is Hadamard differentiable at \(F \in \mathcal{D}\) in the direction of \(G\in\mathcal{D}\) if there exists a linear function \(T^{(1)}_F:\mathcal{D}\to \mathbb{R}\) such that \[
\lim_{n\to \infty} \Big|\frac{T(F + \varepsilon_n(G_n - F)) - T(F)}{\varepsilon_n} - T_F^{(1)}(G - F)\Big| = 0
\] for every sequence \(G_n\in \mathcal{D}\) such that \(\sup_{x \in \mathbb{R}}|G_n(x) - G(x)| \to 0\) as \(n \to \infty\) and every sequence \(\varepsilon_n \downarrow 0\) as \(n \to \infty\).
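For instance (an illustration, not from the text), the functional \(T(F) = F(a)\) for a fixed point \(a\) is Hadamard differentiable at every \(F\): \[
\frac{T(F + \varepsilon_n(G_n - F)) - T(F)}{\varepsilon_n} = G_n(a) - F(a) \longrightarrow G(a) - F(a) \equiv T^{(1)}_F(G - F),
\] where the convergence follows from \(\sup_{x \in \mathbb{R}}|G_n(x) - G(x)| \to 0\), and the limit is linear in \(G - F\).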
Now we give a central limit theorem for statistical functionals:
Theorem 24.1 (A central limit theorem for Hadamard differentiable functionals) Let \(T\) be a Hadamard differentiable functional with influence curve \(\varphi_F\) such that \(\sigma_T^2 = \int \varphi^2_F(x) dF(x)< \infty\). Then we have \[
\sqrt{n}(T(\hat F_n) - T(F)) \overset{\text{d}}{\longrightarrow} \text{Normal}(0,\sigma_T^2)
\quad \text{and} \quad
\frac{\sqrt{n}(T(\hat F_n) - T(F))}{\hat \sigma_T} \overset{\text{d}}{\longrightarrow} \text{Normal}(0,1)
\] as \(n \to \infty\), where \(\hat \sigma_T^2 = \int \varphi^2_{\hat F_n}(x) d\hat F_n(x)\).
We have not yet discussed estimation of the variance \(\sigma_T^2\). The estimator \(\hat \sigma_T^2\) defined in Theorem 24.1 can be computed as the sample mean of the squared values of what is called the empirical influence curve: the empirical influence curve \(\varphi_{\hat F_n}(x)\) is given by \[
\varphi_{\hat F_n}(x) = T_{\hat F_n}^{(1)}(\delta_x - \hat F_n) = \frac{d}{d\epsilon} T(\hat F_n + \epsilon(\delta_x - \hat F_n))\Big|_{\epsilon = 0},
\tag{24.3}\] with which we may compute \(\hat \sigma_T^2\) as \[
\hat \sigma_T^2 = \frac{1}{n}\sum_{i=1}^n \varphi_{\hat F_n}^2(X_i).
\tag{24.4}\]
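As a quick numerical sanity check (a sketch, not from the text), the derivative in Equation 24.3 can be approximated by a finite difference in \(\epsilon\) and plugged into Equation 24.4. The code below does this for the mean functional, for which the closed forms \(x - \bar X_n\) and \(\hat \sigma_n^2\) are available; the helper `T_wtd`, which evaluates the functional at a discrete distribution given by values and weights, is a hypothetical device for the illustration.

```r
# Finite-difference approximation to the empirical influence curve (Equation 24.3)
# and the plug-in variance estimate (Equation 24.4), using the mean functional.
T_wtd <- function(x, w) sum(w * x)   # mean functional evaluated at a discrete distribution

set.seed(1)
n <- 100
X <- rexp(n)
eps <- 1e-6

emp_influence <- function(x0){
  # F_hat_n + eps*(delta_{x0} - F_hat_n) puts weight (1 - eps)/n on each X_i and eps on x0
  T_eps <- T_wtd(c(X, x0), c(rep((1 - eps)/n, n), eps))
  T_0   <- T_wtd(X, rep(1/n, n))     # T(F_hat_n)
  (T_eps - T_0)/eps
}

phi <- sapply(X, emp_influence)         # empirical influence curve at the data points
c(numerical   = mean(phi^2),            # Equation 24.4
  closed_form = mean((X - mean(X))^2))  # sigma_hat_n^2
```

The two values agree up to the finite-difference error, in line with the derivations collected below.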
Table 24.1: Empirical influence functions and plug-in estimators for \(\sigma_T^2\) of some Hadamard-differentiable statistical functionals
| Functional | \(\theta\) | \(\varphi_{\hat F_n}(x)\) | \(\hat \sigma_T^2\) |
|---|---|---|---|
| Mean | \(\mu\) | \(x - \bar X_n\) | \(\hat \sigma_n^2\) |
| Variance | \(\sigma^2\) | \((x - \bar X_n)^2 - \hat \sigma_n^2\) | \(\hat \mu_{n4} - \hat \sigma_n^4\) |
| Probability | \(p_A\) | \(\mathbf{1}(x \in A) - \hat p_A\) | \(\hat p_A(1-\hat p_A)\) |
| Smooth function of mean | \(g(\mu)\) | \(g'(\bar X_n)(x - \bar X_n)\) | \([g'(\bar X_n)]^2\hat \sigma_n^2\) |
Derivations of empirical influence functions and variance estimators in Table 24.1
For each functional we compute the empirical influence curve \(\varphi_{\hat F_n}\) and the variance estimator \(\hat \sigma^2_T\) according to Equation 24.3 and Equation 24.4, respectively.
The mean
For the mean functional \(T(F) = \int x dF(x)\) we have \[\begin{align}
\varphi_{\hat F_n}(x) &= \frac{d}{d\epsilon} \int t\, d(\hat F_n + \epsilon(\delta_x - \hat F_n))(t)\Big|_{\epsilon = 0} \\
&= \frac{d}{d\epsilon} (\bar X_n + \epsilon(x - \bar X_n))\Big|_{\epsilon = 0} \\
&= x - \bar X_n.
\end{align}\] From here we obtain \[
\hat \sigma_T^2 = \frac{1}{n}\sum_{i=1}^n(X_i - \bar X_n )^2 = \hat \sigma_n^2.
\]
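The variance
The variance entry in Table 24.1 follows from a similar calculation; a sketch, writing \(m_\epsilon = \bar X_n + \epsilon(x - \bar X_n)\) for the mean of the perturbed distribution and \(\hat \mu_{n4} = n^{-1}\sum_{i=1}^n (X_i - \bar X_n)^4\): \[\begin{align}
T(\hat F_n + \epsilon(\delta_x - \hat F_n)) &= \int (t - m_\epsilon)^2 d(\hat F_n + \epsilon(\delta_x - \hat F_n))(t) \\
&= (1-\epsilon)\int (t - m_\epsilon)^2 d\hat F_n(t) + \epsilon(x - m_\epsilon)^2,
\end{align}\] and differentiating at \(\epsilon = 0\), where the cross term vanishes because \(\int (t - \bar X_n) d\hat F_n(t) = 0\), gives \[
\varphi_{\hat F_n}(x) = (x - \bar X_n)^2 - \hat \sigma_n^2
\qquad \text{and} \qquad
\hat \sigma_T^2 = \frac{1}{n}\sum_{i=1}^n \big[(X_i - \bar X_n)^2 - \hat \sigma_n^2\big]^2 = \hat \mu_{n4} - \hat \sigma_n^4.
\]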
Probability
For the probability functional \(T(F) = \int_A dF(x) = p_A\) we have \[\begin{align}
T(\hat F_n + \epsilon(\delta_x - \hat F_n)) &= \int_Ad(\hat F_n + \epsilon(\delta_x - \hat F_n))(t)\\
&=(1-\epsilon)\int_A d \hat F_n(t) + \epsilon \int_A d\delta_x(t) \\
&=(1-\epsilon)\hat p_A + \epsilon \mathbf{1}(x \in A),
\end{align}\] so \[
\varphi_{\hat F_n}(x) = \frac{d}{d\epsilon}T(\hat F_n + \epsilon(\delta_x - \hat F_n))\Big|_{\epsilon = 0} = \mathbf{1}(x \in A) - \hat p_A.
\] Then \[
\hat \sigma^2_T = \int (\mathbf{1}(x \in A) - \hat p_A)^2 d\hat F_n(x) = \hat p_A(1-\hat p_A).
\]
Smooth function of mean
For the functional \(T(F) = g(\int xdF(x))\) we have \[\begin{align}
\varphi_{\hat F_n}(x) &= \frac{d}{d\epsilon} g\Big(\int t\, d(\hat F_n + \epsilon(\delta_x - \hat F_n))(t)\Big)\Big|_{\epsilon = 0} \\
&= \frac{d}{d\epsilon} g(\bar X_n + \epsilon(x - \bar X_n))\Big|_{\epsilon = 0} \\
&= g'(\bar X_n + \epsilon(x - \bar X_n))(x - \bar X_n) \Big|_{\epsilon = 0}\\
&= g'(\bar X_n)(x - \bar X_n).
\end{align}\] From here we obtain \[
\hat \sigma^2_T = \frac{1}{n}\sum_{i=1}^n[g'(\bar X_n)(X_i - \bar X_n)]^2 = [g'(\bar X_n)]^2\hat \sigma_n^2.
\]
Note that the second result in Theorem 24.1 gives that the interval \[
T(\hat F_n) \pm z_{\alpha/2} \hat \sigma_T /\sqrt{n}
\] will contain the target \(T(F)\) with probability approaching \(1-\alpha\) as \(n \to \infty\).
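For concreteness, here is a minimal R sketch of this Wald-type interval using the probability functional from Table 24.1; the set \(A = [1,\infty)\) and the Exponential data are illustrative choices, not from the text.

```r
# Wald-type interval T(F_hat_n) +/- z_{alpha/2} * sigma_hat_T / sqrt(n) for p_A,
# with A = [1, Inf) and Exp(1) data (illustrative choices).
set.seed(1)
n <- 200
alpha <- 0.05
X <- rexp(n)
p_hat <- mean(X >= 1)                   # T(F_hat_n) = p_hat_A
sigma_hat <- sqrt(p_hat * (1 - p_hat))  # sigma_hat_T from Table 24.1
z <- qnorm(1 - alpha/2)
c(lower = p_hat - z * sigma_hat / sqrt(n),
  upper = p_hat + z * sigma_hat / sqrt(n))
```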
The following result is adapted from Theorem 3.21 on page 35 of Wasserman (2006):
Theorem 24.2 (Bootstrap works for Hadamard differentiable functionals) If \(T\) is Hadamard differentiable and \(\sigma_T^2 = \int \varphi_F^2(x)dF(x) < \infty\), then \[
\sup_{x \in \mathbb{R}} \Big| \mathbb{P}_*(Y_n^* \leq x) - \mathbb{P}(Y_n \leq x)\Big| \overset{\text{p}}{\longrightarrow}0,
\] as \(n \to \infty\), where \(Y_n\) is as in Equation 24.1 and \(Y_n^*\) is as in Definition 24.1.
Just as in the case of the sample mean, it is typically too computationally expensive to compute the bootstrap estimate \(\hat G_{Y_n}(x) = \mathbb{P}_*(Y_n^* \leq x)\) of \(G_{Y_n}(x)\) exactly. Instead we obtain a Monte Carlo approximation as follows:
Definition 24.3 (Monte Carlo approximation to bootstrap estimator \(\hat G_{Y_n}\)) Choose a large \(B\). Then for \(b=1,\dots,B\) do:
Draw \(X_1^{*(b)},\dots,X_n^{*(b)}\) with replacement from \(X_1,\dots,X_n\) and compute \(Y_n^{*(b)} = \sqrt{n}(T(\hat F_n^{*(b)}) - T(\hat F_n))\), where \(\hat F_n^{*(b)}\) is the empirical distribution of \(X_1^{*(b)},\dots,X_n^{*(b)}\).
Then set \(\hat G_{Y_n}(x) = B^{-1}\sum_{b=1}^B \mathbf{1}(Y_n^{*(b)} \leq x)\) for all \(x \in \mathbb{R}\).
The Monte Carlo approximation to the bootstrap confidence interval \[
\big[T(\hat F_n) - \hat G_{Y_n}^{-1}(1-\alpha/2)/\sqrt{n},\ T(\hat F_n) - \hat G_{Y_n}^{-1}(\alpha/2)/\sqrt{n}\big]
\] may be obtained as \[
\big[2T(\hat F_n) - T^{*(\lceil (1-\alpha/2)B\rceil)},\ 2T(\hat F_n) - T^{*(\lceil (\alpha/2)B\rceil)}\big],
\tag{24.5}\] where \(T^{*(1)} \leq \dots \leq T^{*(B)}\) are the values \(T(\hat F^{*(1)}_n),\dots,T(\hat F^{*(B)}_n)\) sorted in increasing order.
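A compact R sketch of the interval in Equation 24.5 for a generic plug-in functional follows; the helper name `boot_ci` and the median example are illustrative choices, not from the text.

```r
# Monte Carlo approximation to the bootstrap interval in Equation 24.5.
# Tfun is the plug-in functional applied to a sample.
boot_ci <- function(X, Tfun, B = 500, alpha = 0.05){
  n <- length(X)
  T_hat <- Tfun(X)                       # T(F_hat_n)
  T_boot <- numeric(B)
  for(b in 1:B){
    T_boot[b] <- Tfun(X[sample(1:n, n, replace = TRUE)])  # T(F_hat_n^{*(b)})
  }
  T_boot <- sort(T_boot)                 # T^{*(1)} <= ... <= T^{*(B)}
  c(lower = 2*T_hat - T_boot[ceiling((1 - alpha/2)*B)],
    upper = 2*T_hat - T_boot[ceiling((alpha/2)*B)])
}

# example: 95% bootstrap interval for the median of simulated data
set.seed(1)
boot_ci(rexp(50), median)
```

The same construction appears in the coverage simulation of Example 24.1 below, specialized to the trimmed mean.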
Example 24.1 (Bootstrap for the trimmed mean–not Hadamard) Consider the coverage of the \((1-\alpha)100\%\) bootstrap confidence interval in Equation 24.5 for the \(\xi\)-trimmed mean when \(X_1,\dots,X_n\) are independent realizations of the random variable \[
X = D(G - ab) + (1-D)\upsilon |T|
where \(D\), \(G\), and \(T\) are independent random variables such that \(D\) is a Bernoulli random variable with \(P(D = 1) = \delta\), \(G\) has the Gamma distribution with mean \(ab\) and variance \(ab^2\), and \(T\) has the \(t\) distribution with 2 degrees of freedom. The simulation is run with \(a = 2\), \(b = 3\), \(\delta = 0.9\), \(\upsilon = 10\), \(\xi = 0.10\), \(B = 500\), and \(\alpha = 0.05\) at increasing sample sizes \(n\).
```r
# simulation settings
a <- 2
b <- 3
delta <- 0.9
v <- 2
u <- 10
xi <- 0.10
alpha <- 0.05

# xi-trimmed mean of a sample X
trimmed <- function(X, xi){
  Xsrt <- sort(X)
  n <- length(X)
  i1 <- ceiling(xi*n)
  i2 <- ceiling((1 - xi)*n)
  val <- mean(Xsrt[i1:i2])
  return(val)
}

# get MC approximation to population xi-trimmed mean
N <- 1000000
G <- rgamma(N, shape = a, scale = b)
Tv <- rt(N, v)
D <- sample(0:1, N, replace = TRUE, prob = c(1 - delta, delta))
X <- D * (G - a*b) + (1 - D) * u * abs(Tv)
mu <- trimmed(X, xi)

# run simulation
nn <- c(20, 40, 80, 160, 320)
S <- 500   # number of simulated data sets per sample size
M <- 500   # number of bootstrap samples per data set
covn <- numeric(length(nn))
for(i in 1:length(nn)){
  n <- nn[i]
  cov <- numeric(S)
  mu_hat <- numeric(S)
  for(s in 1:S){
    G <- rgamma(n, shape = a, scale = b)
    Tv <- rt(n, v)
    D <- sample(0:1, n, replace = TRUE, prob = c(1 - delta, delta))
    X <- D*(G - a*b) + (1 - D) * u * abs(Tv)
    mu_hat[s] <- trimmed(X, xi)
    mu_hat_boot <- numeric(M)
    for(m in 1:M){ # M Monte Carlo samples
      Xboot <- sort(X[sample(1:n, n, replace = TRUE)])
      mu_hat_boot[m] <- trimmed(Xboot, xi)
    }
    mu_hat_boot <- sort(mu_hat_boot)
    # bootstrap confidence interval as in Equation 24.5
    lo <- 2*mu_hat[s] - mu_hat_boot[ceiling((1 - alpha/2)*M)]
    up <- 2*mu_hat[s] - mu_hat_boot[ceiling((alpha/2)*M)]
    cov[s] <- (lo < mu) & (up > mu)
  }
  covn[i] <- mean(cov)
}
library(knitr)
```