25  Bootstrap for linear regression

In this section we consider applying the bootstrap to estimate the sampling distributions of least-squares estimators in the linear regression model. We first introduce the model and define some notation:

Definition 25.1 (Linear regression model) Let \((\mathbf{x}_1,Y_1),\dots,(\mathbf{x}_n,Y_n)\) be data pairs such that \[ Y_i = \mathbf{x}_i^T\boldsymbol{\beta}+ \varepsilon_i,\quad i = 1,\dots,n, \] where \(\mathbf{x}_1,\dots,\mathbf{x}_n \in \mathbb{R}^p\) are fixed (deterministic) with \(\sum_{i=1}^n \mathbf{x}_i \mathbf{x}_i^T\) positive definite and \(\varepsilon_1,\dots,\varepsilon_n\) independent and identically distributed with \(\mathbb{E}\varepsilon_1 = 0\), \(\mathbb{E}\varepsilon_1^2 = \sigma^2 \in(0,\infty)\), and \(\mathbb{E}\varepsilon_1^4 < \infty\).

Define the \(n \times p\) design matrix \(\mathbf{X}= [\mathbf{x}_1 \cdots \mathbf{x}_n]^T\) as well as the \(n \times 1\) response vector \(\mathbf{Y}= (Y_1,\dots,Y_n)^T\). Then the least-squares estimator of \(\boldsymbol{\beta}\) is given by \[ \hat{\boldsymbol{\beta}}_n = \underset{\mathbf{t}\in \mathbb{R}^p}{\operatorname{argmin}} \sum_{i=1}^n(Y_i - \mathbf{x}_i^T\mathbf{t})^2 = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}. \] Letting \(\mathbf{c}\) be any vector in \(\mathbb{R}^p\), we consider estimating the contrast \(\mathbf{c}^T\boldsymbol{\beta}\) with the estimator \(\mathbf{c}^T\hat{\boldsymbol{\beta}}_n\). We note that \(\mathbb{E}\mathbf{c}^T\hat{\boldsymbol{\beta}}_n = \mathbf{c}^T\boldsymbol{\beta}\) and \[ \mathbb{V}(\mathbf{c}^T\hat{\boldsymbol{\beta}}_n) = \mathbf{c}^T(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{c}\,\sigma^2. \] With a view to constructing confidence intervals for the contrast \(\mathbf{c}^T\boldsymbol{\beta}\), we consider the asymptotic distributions of the quantities \[\begin{align} Q_n &= \sqrt{n}\mathbf{c}^T(\hat{\boldsymbol{\beta}}_n - \boldsymbol{\beta})\quad \text{ and } \\ T_n &= \sqrt{n}\mathbf{c}^T(\hat{\boldsymbol{\beta}}_n - \boldsymbol{\beta}) / \hat \sigma_n, \end{align}\] where \(\hat \sigma_n^2\) is given by \[ \hat \sigma_n^2 = \frac{1}{n-p}\sum_{i=1}^n(Y_i - \mathbf{x}_i^T\hat{\boldsymbol{\beta}}_n)^2. \] So \(T_n\) is a studentized version of \(Q_n\).
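To make the notation concrete, here is a minimal numerical sketch in Python (the simulated design, parameter values, and variable names are our own illustration, not part of the model above): it simulates one data set from the model and computes \(\hat{\boldsymbol{\beta}}_n\) and \(\hat \sigma_n^2\).

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3
# Fixed design with an intercept column; X^T X is positive definite here
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta = np.array([1.0, 2.0, -1.0])
sigma = 0.5
# Responses Y_i = x_i^T beta + eps_i with iid mean-zero errors
Y = X @ beta + rng.normal(scale=sigma, size=n)

# Least-squares estimator betahat = (X^T X)^{-1} X^T Y (lstsq for numerical stability)
betahat, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Variance estimator sigmahat^2 = (n - p)^{-1} * sum of squared residuals
sigmahat2 = np.sum((Y - X @ betahat) ** 2) / (n - p)
print(betahat, sigmahat2)
```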

The Lindeberg central limit theorem gives the following results concerning the asymptotic distributions of \(Q_n\) and \(T_n\) as \(n \to \infty\).

Theorem 25.1 (Asymptotic distributions of linear contrast pivots) Under the linear regression setup of Definition 25.1, let \(\Omega_{\mathbf{c},n} = \mathbf{c}^T(n^{-1}\mathbf{X}^T\mathbf{X})^{-1}\mathbf{c}\). Then we have

  1. \(\Omega_{\mathbf{c},n}^{-1/2}Q_n \overset{\text{d}}{\longrightarrow}\mathcal{N}(0,\sigma^2)\)

  2. \(\Omega_{\mathbf{c},n}^{-1/2}T_n \overset{\text{d}}{\longrightarrow}\mathcal{N}(0,1)\)

as \(n \to \infty\), provided \[ \max_{1 \leq i \leq n}h_{ii} \to 0 \tag{25.1}\] as \(n \to \infty\), where \(h_{11},\dots,h_{nn}\) are the diagonal entries of the matrix \(\mathbf{H}= \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\).

We first make use of Corollary 29.1 to show that \[ Z_n = (\mathbf{c}^T(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{c})^{-1/2}\mathbf{c}^T(\hat{\boldsymbol{\beta}}_n - \boldsymbol{\beta})/\sigma \] converges in distribution to a standard Normal random variable. We may write \[\begin{align} Z_n &=(\mathbf{c}^T(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{c})^{-1/2}\mathbf{c}^T(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\varepsilon}/\sigma \\ &=\Big(\sum_{j=1}^n a_j^2\Big)^{-1/2} \sum_{i=1}^na_i (\varepsilon_i / \sigma), \end{align}\] where \(a_i = \mathbf{c}^T(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{x}_i\) for \(i=1,\dots,n\). Now we have \[\begin{align} \Big(\sum_{j=1}^n a_j ^2\Big)^{-1/2} \max_{1 \leq i \leq n} |a_i| &= (\mathbf{c}^T(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{c})^{-1/2}\max_{1 \leq i \leq n} |\mathbf{c}^T(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{x}_i| \\ &\leq (\mathbf{c}^T(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{c})^{-1/2}\max_{1 \leq i \leq n} \|(\mathbf{X}^T\mathbf{X})^{-1/2}\mathbf{c}\|\|(\mathbf{X}^T\mathbf{X})^{-1/2}\mathbf{x}_i\|\\ &= \max_{1 \leq i \leq n} \sqrt{\mathbf{x}_i^T(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{x}_i}\\ &= \max_{1 \leq i \leq n} \sqrt{h_{ii}}\\ & \to 0 \end{align}\] as \(n \to \infty\) by the assumption in Equation 25.1, where the inequality comes from Cauchy-Schwarz. So by Corollary 29.1 we have \[ Z_n \overset{\text{d}}{\longrightarrow}\mathcal{N}(0,1) \] as \(n \to \infty\).

From here, the first result follows from the identity \(\Omega_{\mathbf{c},n}^{-1/2}Q_n = Z_n \sigma\).

For the second result, we have \(\Omega_{\mathbf{c},n}^{-1/2}T_n = Z_n(\sigma/\hat \sigma_n)\), where \(\hat \sigma_n \overset{\text{p}}{\longrightarrow}\sigma\) as \(n \to \infty\); this consistency is guaranteed by the finiteness of the fourth moment of the error term, though we omit the proof. The second result then follows from Slutsky's theorem.
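The key condition in Equation 25.1 concerns the leverages \(h_{11},\dots,h_{nn}\), which can be computed without forming \(\mathbf{H}\) explicitly. Below is a small sketch (the function name is ours): with a thin QR factorization \(\mathbf{X}= \mathbf{Q}\mathbf{R}\), we have \(\mathbf{H}= \mathbf{Q}\mathbf{Q}^T\), so \(h_{ii}\) is the squared norm of the \(i\)th row of \(\mathbf{Q}\).

```python
import numpy as np

def max_leverage(X: np.ndarray) -> float:
    """Return max_i h_ii, the largest diagonal entry of H = X (X^T X)^{-1} X^T."""
    Q, _ = np.linalg.qr(X)          # thin QR: X = Q R with Q of shape (n, p)
    h = np.sum(Q ** 2, axis=1)      # leverages h_ii = ||row i of Q||^2
    return float(h.max())
```

For a well-behaved design this quantity is small when \(n\) is large relative to \(p\), consistent with Equation 25.1.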

The above result suggests that an asymptotic \((1-\alpha)100\%\) confidence interval for \(\mathbf{c}^T\boldsymbol{\beta}\) can be constructed as \[ \mathbf{c}^T\hat{\boldsymbol{\beta}}_n \pm z_{\alpha/2} n^{-1/2}\hat \sigma_n \Omega_{\mathbf{c},n}^{1/2} \tag{25.2}\] based on the asymptotic distribution of \(\Omega_{\mathbf{c},n}^{-1/2} T_n\) as \(n \to \infty\).
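A sketch of the interval in Equation 25.2 follows (the function name and signature are our own; `norm.ppf(1 - alpha/2)` supplies the quantile \(z_{\alpha/2}\)).

```python
import numpy as np
from scipy.stats import norm

def asymptotic_ci(X, Y, c, alpha=0.05):
    """Asymptotic (1 - alpha)100% CI for c^T beta, per Equation 25.2."""
    n, p = X.shape
    betahat, *_ = np.linalg.lstsq(X, Y, rcond=None)
    sigmahat = np.sqrt(np.sum((Y - X @ betahat) ** 2) / (n - p))
    Omega = c @ np.linalg.solve(X.T @ X / n, c)   # c^T (n^{-1} X^T X)^{-1} c
    half = norm.ppf(1 - alpha / 2) * sigmahat * np.sqrt(Omega / n)
    est = c @ betahat
    return est - half, est + half
```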

Next we describe how to construct bootstrap versions of the quantities \(Q_n\) and \(T_n\) using the residual bootstrap.

Definition 25.2 (Residual bootstrap for linear regression) Conditional on the residuals \(\hat \varepsilon_i = Y_i - \mathbf{x}_i^T\hat{\boldsymbol{\beta}}_n\), \(i=1,\dots,n\), introduce independent random variables \(\varepsilon_1^*,\dots,\varepsilon_n^*\) identically distributed according to the empirical distribution of \(\hat \varepsilon_1,\dots,\hat \varepsilon_n\). Then set \(Y_i^* = \mathbf{x}_i^T\hat{\boldsymbol{\beta}}_n + \varepsilon^*_i\) for \(i=1,\dots,n\) and let \[ \hat{\boldsymbol{\beta}}^*_n = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}^*, \] where \(\mathbf{Y}^*=(Y_1^*,\dots,Y_n^*)^T\). Now define bootstrap versions of \(Q_n\) and \(T_n\) as \[\begin{align} Q_n^* &= \sqrt{n}\mathbf{c}^T(\hat{\boldsymbol{\beta}}^*_n - \hat{\boldsymbol{\beta}}_n) \quad \text{ and } \\ T_n^* &= \sqrt{n}\mathbf{c}^T(\hat{\boldsymbol{\beta}}^*_n - \hat{\boldsymbol{\beta}}_n) / \hat \sigma^*_n, \end{align}\] respectively, where \[ (\hat \sigma_n^*)^2 = \frac{1}{n-p}\sum_{i=1}^n(Y_i^* - \mathbf{x}_i^T\hat{\boldsymbol{\beta}}_n^*)^2. \]
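The following sketch implements Definition 25.2 (the function name and defaults are ours). Each bootstrap replication resamples the residuals with replacement, rebuilds the responses, refits the model, and records \(Q_n^*\) and \(T_n^*\). We resample the raw residuals exactly as in the definition, though in practice the residuals are often centered first.

```python
import numpy as np

def residual_bootstrap(X, Y, c, B=2000, seed=None):
    """Return B Monte Carlo draws of (Q_n^*, T_n^*) under the residual bootstrap."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    betahat, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ betahat                       # hat eps_1, ..., hat eps_n
    Qs, Ts = np.empty(B), np.empty(B)
    for b in range(B):
        eps_star = rng.choice(resid, size=n, replace=True)  # iid draws from the EDF
        Ystar = X @ betahat + eps_star            # Y_i^* = x_i^T betahat + eps_i^*
        betastar, *_ = np.linalg.lstsq(X, Ystar, rcond=None)
        sigstar = np.sqrt(np.sum((Ystar - X @ betastar) ** 2) / (n - p))
        Qs[b] = np.sqrt(n) * (c @ (betastar - betahat))
        Ts[b] = Qs[b] / sigstar
    return Qs, Ts
```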

The next result states that the residual bootstrap works for estimating the sampling distributions of \(Q_n\) and \(T_n\).

Theorem 25.2 (Residual bootstrap works) In the linear regression model of Definition 25.1, under the conditions of Theorem 25.1, we have

  1. \(\sup_{x \in \mathbb{R}}\Big|\mathbb{P}_*(Q_n^* \leq x) - \mathbb{P}(Q_n \leq x)\Big| \overset{\text{p}}{\longrightarrow}0\)

  2. \(\sup_{x \in \mathbb{R}}\Big|\mathbb{P}_*(T_n^* \leq x) - \mathbb{P}(T_n \leq x)\Big| \overset{\text{p}}{\longrightarrow}0\)

as \(n \to \infty\) with \(Q_n^*\) and \(T_n^*\) as defined in Definition 25.2.

We omit the proof of this result.

One can show that the residual bootstrap estimator of the distribution of \(T_n\) is second-order correct, but we do not present this as a formal result.

The above result suggests that asymptotic \((1-\alpha)100\%\) confidence intervals for \(\mathbf{c}^T\boldsymbol{\beta}\) can be constructed from the bootstrap estimates of the sampling distributions of \(Q_n\) and \(T_n\): given sorted Monte Carlo realizations \(Q^{*(1)}_n\leq \dots \leq Q^{*(B)}_n\) of \(Q_n^*\) and \(T^{*(1)}_n\leq \dots \leq T^{*(B)}_n\) of \(T_n^*\) for some large \(B\), \((1-\alpha)100\%\) bootstrap confidence intervals for \(\mathbf{c}^T\boldsymbol{\beta}\) based on \(Q_n\) and \(T_n\) are given by \[\begin{align} &\big[\mathbf{c}^T\hat{\boldsymbol{\beta}}_n - Q_n^{*(\lceil (1-\alpha/2) B\rceil)} n^{-1/2},~ \mathbf{c}^T\hat{\boldsymbol{\beta}}_n - Q_n^{*(\lceil (\alpha/2) B\rceil)} n^{-1/2}\big] \quad \text{ and }\\ &\big[\mathbf{c}^T\hat{\boldsymbol{\beta}}_n - T_n^{*(\lceil (1-\alpha/2) B\rceil)} n^{-1/2}\hat \sigma_n,~ \mathbf{c}^T\hat{\boldsymbol{\beta}}_n - T_n^{*(\lceil (\alpha/2) B\rceil)} n^{-1/2}\hat \sigma_n\big], \end{align}\] respectively; note that the upper-tail order statistic is subtracted in the lower endpoint so that the endpoints appear in increasing order. A sketch of this construction appears below.
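Continuing from the `residual_bootstrap` sketch above (again with our own function names), the two intervals can be computed as follows; the `- 1` converts the one-based order-statistic indices \(\lceil (\alpha/2)B \rceil\) and \(\lceil (1-\alpha/2)B \rceil\) to zero-based array indices.

```python
import numpy as np

def bootstrap_cis(X, Y, c, alpha=0.05, B=2000, seed=None):
    """(1 - alpha)100% bootstrap CIs for c^T beta based on Q_n and T_n."""
    n, p = X.shape
    betahat, *_ = np.linalg.lstsq(X, Y, rcond=None)
    sigmahat = np.sqrt(np.sum((Y - X @ betahat) ** 2) / (n - p))
    est = c @ betahat
    Qs, Ts = residual_bootstrap(X, Y, c, B=B, seed=seed)
    Qs.sort(); Ts.sort()
    lo = int(np.ceil((alpha / 2) * B)) - 1        # index of the (alpha/2) order statistic
    hi = int(np.ceil((1 - alpha / 2) * B)) - 1    # index of the (1 - alpha/2) order statistic
    ci_Q = (est - Qs[hi] / np.sqrt(n), est - Qs[lo] / np.sqrt(n))
    ci_T = (est - Ts[hi] * sigmahat / np.sqrt(n), est - Ts[lo] * sigmahat / np.sqrt(n))
    return ci_Q, ci_T
```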