24  Bootstrap for statistical functionals

This section draws from Fernholz (2012) as well as from Wasserman (2006).

We consider using the bootstrap to estimate the sampling distribution of an unstudentized pivot \[ Y_n = \sqrt{n}(T(\hat F_n) - T(F)), \tag{24.1}\] based on which we would like to make inferences on \(T(F)\). If the distribution, say \(G_{Y_n}\), of \(Y_n\) were known, then inverting the statement \(\mathbb{P}(G_{Y_n}^{-1}(\alpha/2) \leq Y_n \leq G_{Y_n}^{-1}(1-\alpha/2)) = 1 - \alpha\) would give the \((1-\alpha)100\%\) confidence interval \[ \big[T(\hat F_n) - G_{Y_n}^{-1}(1-\alpha/2)/\sqrt{n},\; T(\hat F_n) - G_{Y_n}^{-1}(\alpha/2)/\sqrt{n}\big] \] for \(T(F)\). We consider replacing the unknown quantiles \(G_{Y_n}^{-1}(\alpha/2)\) and \(G_{Y_n}^{-1}(1 - \alpha/2)\) of the distribution of \(Y_n\) with bootstrap estimates.

Definition 24.1 (Bootstrap for a statistical functional) Conditional on \(X_1,\dots,X_n\), introduce random variables \(X_1^*,\dots,X_n^*\) such that \(X_1^*,\dots,X_n^*|X_1,\dots,X_n \overset{\text{ind}}{\sim}\hat F_n\), where \(\hat F_n\) is the empirical distribution of \(X_1,\dots,X_n\), and define the bootstrap version \(Y_n^*\) of \(Y_n = \sqrt{n}(T(\hat F_n) - T(F))\) as

\[ Y_n^* \equiv \sqrt{n}(T(\hat F_n^*) - T(\hat F_n)), \] where \(\hat F^*_n\) is the empirical distribution of \(X_1^*,\dots,X_n^*\). Then the bootstrap estimator of \(G_{Y_n}\) is given by \[ \hat G_{Y_n}(x) = \mathbb{P}_*(Y_n^* \leq x) \] for all \(x \in \mathbb{R}\).
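As a concrete illustration of Definition 24.1, here is a minimal sketch in R for the median functional \(T(F) = F^{-1}(1/2)\); the sample, sample size, and number of bootstrap draws are illustrative choices, not part of the definition.

# bootstrap draws of Y_n* for the median functional (illustrative sketch)
set.seed(1)
n <- 50
X <- rexp(n)            # observed sample, here from F = Exp(1)
T_hat <- median(X)      # T(F_n_hat)

nboot <- 2000
Y_star <- numeric(nboot)
for(b in 1:nboot){

  X_star <- sample(X,n,replace=TRUE)            # X_1*,...,X_n* drawn from F_n_hat
  Y_star[b] <- sqrt(n)*(median(X_star) - T_hat) # Y_n* = sqrt(n)(T(F_n_hat*) - T(F_n_hat))

}

mean(Y_star <= 0)   # G_hat_{Y_n}(0) = P_*(Y_n* <= 0)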

In order to establish that the bootstrap works for a given statistical functional (that is, that we can use the bootstrap to estimate its sampling distribution), it is sufficient to show that the functional has a property called Hadamard differentiability.

Recall the Taylor-like expansion of a statistical functional \(T:\mathcal{D}\to\mathbb{R}\) at the distribution \(F\) evaluated at \(\hat F_n\), which allowed us to write \[ \sqrt{n}(T(\hat F_n) - T(F)) = \sqrt{n}T^{(1)}_F(\hat F_n - F) + \sqrt{n}R_F(\hat F_n - F), \tag{24.2}\] where \(T^{(1)}_F(\hat F_n - F)\) is the von Mises derivative of \(T\) at \(F\) in the direction of \(\hat F_n\). If the functional \(T\) is Hadamard differentiable, which we define shortly, the remainder term \(\sqrt{n}R_F(\hat F_n - F)\) on the right hand side of Equation 24.2 will vanish in probability and the first term will converge to a Normal limit; moreover, the bootstrap in Definition 24.1 will work.

Before presenting these results, we give the definition of Hadamard differentiability.

Definition 24.2 (Hadamard differentiability of a statistical functional) A functional \(T:\mathcal{D}\to \mathbb{R}\) is Hadamard differentiable at \(F \in \mathcal{D}\) in the direction of \(G\in\mathcal{D}\) if there exists a linear function \(T^{(1)}_F:\mathcal{D}\to \mathbb{R}\) such that \[ \lim_{n\to \infty} \Big|\frac{T(F + \varepsilon_n(G_n - F)) - T(F)}{\varepsilon_n} - T_F^{(1)}(G - F)\Big| = 0 \] for every sequence \(G_n\in \mathcal{D}\) such that \(\sup_{x \in \mathbb{R}}|G_n(x) - G(x)| \to 0\) as \(n \to \infty\) and every sequence \(\varepsilon_n \downarrow 0\) as \(n \to \infty\).
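For a quick check of Definition 24.2, consider the probability functional \(T(F) = F(c)\) for a fixed \(c \in \mathbb{R}\). For any sequences \(G_n\) and \(\varepsilon_n\) as in the definition, \[ \frac{T(F + \varepsilon_n(G_n - F)) - T(F)}{\varepsilon_n} = \frac{F(c) + \varepsilon_n(G_n(c) - F(c)) - F(c)}{\varepsilon_n} = G_n(c) - F(c) \longrightarrow G(c) - F(c), \] since \(\sup_{x\in\mathbb{R}}|G_n(x) - G(x)| \to 0\) implies \(G_n(c) \to G(c)\). Hence \(T\) is Hadamard differentiable at every \(F\) with linear derivative \(T^{(1)}_F(G - F) = G(c) - F(c)\).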

Now we give a central limit theorem for statistical functionals:

Theorem 24.1 (A central limit theorem for Hadamard differentiable functionals) Let \(T\) be a Hadamard differentiable functional with influence curve \(\varphi_F\) such that \(\sigma_T^2 = \int \varphi^2_F(x) dF(x)< \infty\). Then we have

  1. \(\sqrt{n}(T(\hat F_n) - T(F)) \overset{\text{d}}{\longrightarrow}\mathcal{N}(0,\sigma_T^2)\)
  2. \(\sqrt{n}(T(\hat F_n) - T(F))/\hat \sigma_T \overset{\text{d}}{\longrightarrow}\mathcal{N}(0,1)\),

as \(n \to \infty\), where \(\hat \sigma_T^2 = \int \varphi^2_{\hat F_n}(x) d\hat F_n(x)\).

We have not yet discussed estimation of the variance \(\sigma_T^2\). The estimator \(\hat \sigma_T^2\) defined in Theorem 24.1 can be computed as the sample mean of the squared empirical influence curve: The empirical influence curve \(\varphi_{\hat F_n}(x)\) is given by \[ \varphi_{\hat F_n}(x) = T_{\hat F_n}^{(1)}(\delta_x - \hat F_n) = \frac{d}{d\epsilon} T(\hat F_n + \epsilon(\delta_x - \hat F_n))\Big|_{\epsilon = 0}, \tag{24.3}\] with which we may compute \(\hat \sigma_T^2\) as \[ \hat \sigma_T^2 = \frac{1}{n}\sum_{i=1}^n \varphi_{\hat F_n}^2(X_i). \tag{24.4}\]
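Equation 24.3 can also be evaluated numerically when no closed form is handy: replace the derivative at \(\epsilon = 0\) with a small forward difference, representing \(\hat F_n + \epsilon(\delta_x - \hat F_n)\) by reweighting the sample. The following R sketch does this for a functional written as a function of support points and weights; the function eif_num, the weight-based convention, and the step size eps are our own illustrative choices.

# empirical influence curve by forward difference in epsilon (sketch)
eif_num <- function(T_w,X,x,eps=1e-6){

  n <- length(X)
  pts <- c(X,x)
  w0 <- c(rep(1/n,n),0)          # weights representing F_n_hat
  w1 <- c(rep((1-eps)/n,n),eps)  # weights of F_n_hat + eps*(delta_x - F_n_hat)
  (T_w(pts,w1) - T_w(pts,w0))/eps

}

T_mean <- function(pts,w) sum(w*pts)  # mean functional on a weighted sample

set.seed(1)
X <- rnorm(25)
phi <- sapply(X,function(x) eif_num(T_mean,X,x))
max(abs(phi - (X - mean(X))))  # matches x - Xbar_n up to rounding
mean(phi^2)                    # Equation 24.4; equals sigma_n_hat^2 here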

Table 24.1: Empirical influence functions and plug-in estimators for \(\sigma_T^2\) of some Hadamard-differentiable statistical functionals

| Functional | \(\theta\) | \(\varphi_{\hat F_n}(x)\) | \(\hat \sigma_T^2\) |
|---|---|---|---|
| Mean | \(\mu\) | \(x - \bar X_n\) | \(\hat \sigma_n^2\) |
| Variance | \(\sigma^2\) | \((x - \bar X_n)^2 - \hat \sigma_n^2\) | \(\hat \mu_{n4} - \hat \sigma_n^4\) |
| Probability | \(p_A\) | \(\mathbf{1}(x \in A) - \hat p_A\) | \(\hat p_A(1-\hat p_A)\) |
| Smooth function of mean | \(g(\mu)\) | \(g'(\bar X_n)(x - \bar X_n)\) | \([g'(\bar X_n)]^2\hat \sigma_n^2\) |

For each functional we compute the empirical influence curve \(\varphi_{\hat F_n}\) and the variance estimator \(\hat \sigma^2_T\) according to Equation 24.3 and Equation 24.4, respectively.

The mean

For the mean functional \(T(F) = \int x dF(x)\) we have \[\begin{align} \varphi_{\hat F_n}(x) &= \frac{d}{d\epsilon} \int t\, d(\hat F_n + \epsilon(\delta_x - \hat F_n))(t)\Big|_{\epsilon = 0} \\ &= \frac{d}{d\epsilon} (\bar X_n + \epsilon(x - \bar X_n))\Big|_{\epsilon = 0} \\ &= x - \bar X_n. \end{align}\] From here we obtain \[ \hat \sigma_T^2 = \frac{1}{n}\sum_{i=1}^n(X_i - \bar X_n )^2 = \hat \sigma_n^2. \]

The variance

For the functional \(T(F) = \int (x - \int t dF(t))^2 dF(x)\) we have \[\begin{align} T(\hat F_n + \epsilon(\delta_x - \hat F_n)) &= \int\Big(s - \int t d(\hat F_n + \epsilon(\delta_x - \hat F_n))(t)\Big)^2d(\hat F_n + \epsilon(\delta_x - \hat F_n))(s)\\ &= \int\Big((s - \bar X_n) - \epsilon(x - \bar X_n)\Big)^2 d(\hat F_n + \epsilon(\delta_x - \hat F_n))(s)\\ &= \int \Big((s - \bar X_n)^2 - 2 \epsilon(s - \bar X_n)(x-\bar X_n) + \epsilon^2(x - \bar X_n)^2\Big) d(\hat F_n + \epsilon(\delta_x - \hat F_n))(s)\\ &=(1-\epsilon)\hat \sigma_n^2 + \epsilon(x - \bar X_n)^2 - \epsilon^2(x - \bar X_n)^2, \end{align}\] so that \[ \varphi_{\hat F_n}(x) = \frac{d}{d\epsilon}T(\hat F_n + \epsilon(\delta_x - \hat F_n))\Big|_{\epsilon = 0} = (x - \bar X_n)^2 - \hat \sigma_n^2. \] Then we have \[\begin{align} \hat \sigma_T^2 &= \frac{1}{n}\sum_{i=1}^n [(X_i - \bar X_n)^2 - \hat \sigma_n^2]^2 \\ &= \frac{1}{n}\sum_{i=1}^n (X_i - \bar X_n)^4 - 2 \frac{1}{n}\sum_{i=1}^n (X_i - \bar X_n)^2\hat \sigma_n^2 + \hat \sigma_n^4 \\ &= \hat \mu_{n4} - \hat \sigma_n^4. \end{align}\]
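As a numerical check of this derivation, the finite-difference sketch given after Equation 24.4 (reusing eif_num and X from there) reproduces \((x - \bar X_n)^2 - \hat \sigma_n^2\) up to the finite-difference error:

# variance functional on a weighted sample
T_var <- function(pts,w){

  m <- sum(w*pts)
  sum(w*(pts - m)^2)

}

phi_var <- sapply(X,function(x) eif_num(T_var,X,x))
sig2n <- mean((X - mean(X))^2)
max(abs(phi_var - ((X - mean(X))^2 - sig2n)))  # small (finite-difference error)
mean(phi_var^2)                                # approx mu_n4_hat - sigma_n_hat^4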

A probability

For the functional \(T(F) = \int_AdF(x)\) we have \[\begin{align} T(\hat F_n + \epsilon(\delta_x - \hat F_n)) &= \int_Ad(\hat F_n + \epsilon(\delta_x - \hat F_n))(t)\\ &=(1-\epsilon)\int_A d \hat F_n(t) + \epsilon \int_A d\delta_x(t) \\ &=(1-\epsilon)\hat p_A + \epsilon \mathbf{1}(x \in A), \end{align}\] so \[ \varphi_{\hat F_n}(x) = \frac{d}{d\epsilon}T(\hat F_n + \epsilon(\delta_x - \hat F_n))\Big|_{\epsilon = 0} = \mathbf{1}(x \in A) - \hat p_A. \] Then \[ \hat \sigma^2_T = \int (\mathbf{1}(x \in A) - \hat p_A)^2 d\hat F_n(x) = \hat p_A(1-\hat p_A). \]

Smooth function of mean

For the functional \(T(F) = g(\int t\,dF(t))\) we have \[\begin{align} \varphi_{\hat F_n}(x) &= \frac{d}{d\epsilon} g\Big(\int t\, d(\hat F_n + \epsilon(\delta_x - \hat F_n))(t)\Big)\Big|_{\epsilon = 0} \\ &= \frac{d}{d\epsilon} g(\bar X_n + \epsilon(x - \bar X_n))\Big|_{\epsilon = 0} \\ &= g'(\bar X_n + \epsilon(x - \bar X_n))(x - \bar X_n) \Big|_{\epsilon = 0}\\ &= g'(\bar X_n)(x - \bar X_n). \end{align}\] From here we obtain \[ \hat \sigma^2_T = \frac{1}{n}\sum_{i=1}^n[g'(\bar X_n)(X_i - \bar X_n)]^2 = [g'(\bar X_n)]^2\hat \sigma_n^2. \]

Note that the second result in Theorem 24.1 gives that the interval \[ T(\hat F_n) \pm z_{\alpha/2} \hat \sigma_T /\sqrt{n} \] will contain the target \(T(F)\) with probability approaching \(1-\alpha\) as \(n \to \infty\).
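For concreteness, here is this interval in R for a smooth function of the mean, taking \(g(\mu) = e^{\mu}\) as an illustrative choice and using the plug-in \(\hat \sigma_T\) from Table 24.1:

# Wald-type interval from Theorem 24.1 for T(F) = g(mean), with g = exp (illustrative)
set.seed(1)
n <- 200
X <- rnorm(n,mean=1)
alpha <- 0.05

theta_hat <- exp(mean(X))                           # T(F_n_hat) = g(Xbar_n)
sig_hat <- exp(mean(X))*sqrt(mean((X - mean(X))^2)) # |g'(Xbar_n)|*sigma_n_hat
z <- qnorm(1 - alpha/2)
c(theta_hat - z*sig_hat/sqrt(n),theta_hat + z*sig_hat/sqrt(n)) # target g(1) = e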

The following result is adapted from Theorem 3.21 on page 35 of Wasserman (2006):

Theorem 24.2 (Bootstrap works for Hadamard differentiable functionals) If \(T\) is Hadamard differentiable and \(\sigma_T^2 = \int \varphi_F^2(x)dF(x) < \infty\), then \[ \sup_{x \in \mathbb{R}} \Big| \mathbb{P}_*(Y_n^* \leq x) - \mathbb{P}(Y_n \leq x)\Big| \overset{\text{p}}{\longrightarrow}0, \] as \(n \to \infty\), where \(Y_n\) is as in Equation 24.1 and \(Y_n^*\) is as in Definition 24.1.

Just as in the case of the sample mean, it is typically too computationally expensive to compute the bootstrap estimate \(\hat G_{Y_n}(x) = \mathbb{P}_*(Y_n^* \leq x)\) of \(G_{Y_n}(x)\) exactly. Instead we obtain a Monte Carlo approximation as follows:

Definition 24.3 (Monte Carlo approximation to bootstrap estimator \(\hat G_{Y_n}\)) Choose a large \(B\). Then for \(b=1,\dots,B\) do:

  • Draw \(X_1^{*(b)},\dots,X_n^{*(b)}\) with replacement from \(X_1,\dots,X_n\).
  • Compute \(Y_n^{*(b)} =\sqrt{n}(T(\hat F_n^{*(b)}) - T(\hat F_n))\), where \(\hat F_n^{*(b)} = n^{-1}\sum_{i=1}^n \delta_{X_i^{*(b)}}\).

Then set \(\hat G_{Y_n}(x) = B^{-1}\sum_{b=1}^B \mathbf{1}(Y_n^{*(b)} \leq x)\) for all \(x \in \mathbb{R}\).

The Monte Carlo approximation to the bootstrap confidence interval \[ \big[T(\hat F_n) - \hat G_{Y_n}^{-1}(1-\alpha/2)/\sqrt{n},\; T(\hat F_n) - \hat G_{Y_n}^{-1}(\alpha/2)/\sqrt{n}\big] \] may be obtained as \[ \big[2T(\hat F_n) - T^{*(\lceil (1-\alpha/2)B\rceil)},\; 2T(\hat F_n) - T^{*(\lceil (\alpha/2)B\rceil)}\big], \tag{24.5}\] where \(T^{*(1)} \leq \dots \leq T^{*(B)}\) are the values \(T(\hat F^{*(1)}_n),\dots,T(\hat F^{*(B)}_n)\) sorted in increasing order.
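A compact implementation of Definition 24.3 and Equation 24.5 might look as follows; the function name boot_ci and its arguments are our own choices for this sketch.

# Monte Carlo bootstrap confidence interval per Equation 24.5 (sketch)
boot_ci <- function(X,T_fun,B=1000,alpha=0.05){

  n <- length(X)
  T_hat <- T_fun(X)
  T_star <- numeric(B)
  for(b in 1:B){

    T_star[b] <- T_fun(sample(X,n,replace=TRUE))  # T(F_n_hat^{*(b)})

  }
  T_star <- sort(T_star)
  c(2*T_hat - T_star[ceiling((1-alpha/2)*B)],
    2*T_hat - T_star[ceiling((alpha/2)*B)])

}

# e.g. a 95% interval for the median of an exponential sample
set.seed(1)
boot_ci(rexp(100),median)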

Example 24.1 (Bootstrap for the trimmed mean, not Hadamard) Consider the coverage of the \((1-\alpha)100\%\) bootstrap confidence interval in Equation 24.5 for the \(\xi\)-trimmed mean when \(X_1,\dots,X_n\) are independent realizations of the random variable \[ X = D(G - ab) + (1-D)\upsilon |T|, \] where \(D\), \(G\), and \(T\) are independent random variables such that \(D\) is a Bernoulli random variable with \(P(D = 1) = \delta\), \(G\) has the Gamma distribution with mean \(ab\) and variance \(ab^2\), and \(T\) has the \(t\) distribution with \(2\) degrees of freedom. The simulation is run with \(a = 2\), \(b = 3\), \(\delta = 0.9\), \(\upsilon = 10\), \(\xi = 0.10\), \(B = 500\), and \(\alpha = 0.05\) at increasing sample sizes \(n\).

a <- 2        # Gamma shape
b <- 3        # Gamma scale
delta <- 0.9  # P(D = 1)
v <- 2        # degrees of freedom of the t distribution
u <- 10       # scale upsilon on |T|
xi <- 0.10    # trimming proportion
alpha <- 0.05 # nominal level

# compute the xi-trimmed mean: average the middle order statistics
trimmed <- function(X,xi){
  
  Xsrt <- sort(X)
  n <- length(X)
  i1 <- ceiling(xi*n)
  i2 <- ceiling((1-xi)*n)
  val <- mean(Xsrt[i1:i2])
  
  return(val)
  
}

# get MC approximation to population xi-trimmed mean
N <- 1000000
G <- rgamma(N,shape = a,scale = b)
Tv <- rt(N,v)
D <- sample(0:1,N,replace=TRUE,prob=c(1-delta,delta))
X <- D * (G - a*b) + (1-D) * u * abs(Tv)

mu <- trimmed(X,xi)

# run simulation
nn <- c(20,40,80,160,320) # sample sizes
S <- 500                  # simulation replications
M <- 500                  # bootstrap samples per replication (B in the text)
covn <- numeric(length(nn))
for(i in 1:length(nn)){

  n <- nn[i]
  cov <- numeric(S)
  mu_hat <- numeric(S)
  for(s in 1:S){

    G <- rgamma(n,shape = a,scale = b)
    Tv <- rt(n,v)
    D <- sample(0:1,n,replace=TRUE,prob=c(1-delta,delta))
    X <- D*(G - a*b) + (1-D) * u * abs(Tv)

    mu_hat[s] <- trimmed(X,xi)
    
    mu_hat_boot <- numeric(M)
    for(m in 1:M){ # M Monte Carlo samples

      Xboot <- sort(X[sample(1:n,n,replace=TRUE)])
      mu_hat_boot[m] <- trimmed(Xboot,xi)  
    
    }

    mu_hat_boot <- sort(mu_hat_boot)
    
    # interval endpoints from Equation 24.5
    lo <- 2*mu_hat[s] - mu_hat_boot[ceiling((1-alpha/2)*M)]
    up <- 2*mu_hat[s] - mu_hat_boot[ceiling((alpha/2)*M)]

    cov[s] <- (lo < mu ) & (up > mu)
      
  }

  covn[i] <- mean(cov)

}

library(knitr)
tab <- cbind(nn,covn)
colnames(tab) <- c("n","coverage")
kable(tab,digits = c(0,3))
| n | coverage |
|---:|---:|
| 20 | 0.902 |
| 40 | 0.922 |
| 80 | 0.926 |
| 160 | 0.926 |
| 320 | 0.934 |