25  Bootstrap for linear regression

In this section we consider applying the bootstrap to estimate the sampling distributions of least-squares estimators in the linear regression model. We first introduce the model and define some notation:

Definition 25.1 (Linear regression model) Let \((\mathbf{x}_1,Y_1),\dots,(\mathbf{x}_n,Y_n)\) be data pairs such that \[ Y_i = \mathbf{x}_i^T\boldsymbol{\beta}+ \varepsilon_i,\quad i = 1,\dots,n, \] where \(\mathbf{x}_1,\dots,\mathbf{x}_n \in \mathbb{R}^p\) are fixed (deterministic) with \(\sum_{i=1}^n \mathbf{x}_i \mathbf{x}_i^T\) positive definite and \(\varepsilon_1,\dots,\varepsilon_n\) independent and identically distributed with \(\mathbb{E}\varepsilon_1 = 0\), \(\mathbb{E}\varepsilon_1^2 = \sigma^2 \in(0,\infty)\), and \(\mathbb{E}\varepsilon_1^4 < \infty\).

Define the \(n \times p\) design matrix \(\mathbf{X}= [\mathbf{x}_1 \cdots \mathbf{x}_n]^T\) as well as the \(n \times 1\) response vector \(\mathbf{Y}= (Y_1,\dots,Y_n)^T\). Then the least-squares estimator of \(\boldsymbol{\beta}\) is given by \[ \hat{\boldsymbol{\beta}}_n = \underset{\mathbf{t}\in \mathbb{R}^p}{\operatorname{argmin}} \sum_{i=1}^n(Y_i - \mathbf{x}_i^T\mathbf{t})^2 = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}. \] Letting \(\mathbf{c}\) be any vector in \(\mathbb{R}^p\), we consider estimating the contrast \(\mathbf{c}^T\boldsymbol{\beta}\) with the estimator \(\mathbf{c}^T\hat{\boldsymbol{\beta}}_n\). We note that \(\mathbb{E}\mathbf{c}^T\hat{\boldsymbol{\beta}}_n = \mathbf{c}^T\boldsymbol{\beta}\) and \[ \mathbb{V}(\mathbf{c}^T\hat{\boldsymbol{\beta}}_n) = \mathbf{c}^T(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{c}\,\sigma^2. \] With a view to constructing confidence intervals for the contrast \(\mathbf{c}^T\boldsymbol{\beta}\), we consider the asymptotic distributions of the quantities \[\begin{align} Q_n &= \sqrt{n}\mathbf{c}^T(\hat{\boldsymbol{\beta}}_n - \boldsymbol{\beta})\quad \text{ and } \\ T_n &= \sqrt{n}\mathbf{c}^T(\hat{\boldsymbol{\beta}}_n - \boldsymbol{\beta}) / \hat \sigma_n, \end{align}\] where \(\hat \sigma_n^2\) is given by \[ \hat \sigma_n^2 = \frac{1}{n-p}\sum_{i=1}^n(Y_i - \mathbf{x}_i^T\hat{\boldsymbol{\beta}}_n)^2. \] So \(T_n\) is a studentized version of \(Q_n\).
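To make the notation concrete, here is a minimal numerical sketch in Python (the simulated design, parameter values, and variable names are our own illustration, not part of the model above): it simulates one data set from the model and computes \(\hat{\boldsymbol{\beta}}_n\) and \(\hat \sigma_n^2\).

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3
# Fixed design with an intercept column; X^T X is positive definite here
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta = np.array([1.0, 2.0, -1.0])
sigma = 0.5
# Responses Y_i = x_i^T beta + eps_i with iid mean-zero errors
Y = X @ beta + rng.normal(scale=sigma, size=n)

# Least-squares estimator betahat = (X^T X)^{-1} X^T Y (lstsq for numerical stability)
betahat, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Variance estimator sigmahat^2 = (n - p)^{-1} * sum of squared residuals
sigmahat2 = np.sum((Y - X @ betahat) ** 2) / (n - p)
print(betahat, sigmahat2)
```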

The Lindeberg central limit theorem gives the following results concerning the asymptotic distributions of \(Q_n\) and \(T_n\) as \(n \to \infty\).

Theorem 25.1 (Asymptotic distributions of linear contrast pivots) Under the linear regression setup of Definition 25.1, let \(\Omega_{\mathbf{c},n} = \mathbf{c}^T(n^{-1}\mathbf{X}^T\mathbf{X})^{-1}\mathbf{c}\). Then we have

  1. \(\Omega_{\mathbf{c},n}^{-1/2}Q_n \overset{\text{d}}{\longrightarrow}\mathcal{N}(0,\sigma^2)\)

  2. \(\Omega_{\mathbf{c},n}^{-1/2}T_n \overset{\text{d}}{\longrightarrow}\mathcal{N}(0,1)\)

as \(n \to \infty\), provided \[ \max_{1 \leq i \leq n}h_{ii} \to 0 \tag{25.1}\] as \(n \to \infty\), where \(h_{11},\dots,h_{nn}\) are the diagonal entries of the matrix \(\mathbf{H}= \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\).

We first make use of Corollary 29.1 to show that \[ Z_n = (\mathbf{c}^T(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{c})^{-1/2}\mathbf{c}^T(\hat{\boldsymbol{\beta}}_n - \boldsymbol{\beta})/\sigma \] converges in distribution to a standard Normal random variable. We may write \[\begin{align} Z_n &=(\mathbf{c}^T(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{c})^{-1/2}\mathbf{c}^T(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\varepsilon}/\sigma \\ &=\Big(\sum_{j=1}^n a_j^2\Big)^{-1/2} \sum_{i=1}^na_i (\varepsilon_i / \sigma), \end{align}\] where \(a_i = \mathbf{c}^T(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{x}_i\) for \(i=1,\dots,n\). Now we have \[\begin{align} \Big(\sum_{j=1}^n a_j ^2\Big)^{-1/2} \max_{1 \leq i \leq n} |a_i| &= (\mathbf{c}^T(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{c})^{-1/2}\max_{1 \leq i \leq n} |\mathbf{c}^T(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{x}_i| \\ &\leq (\mathbf{c}^T(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{c})^{-1/2}\max_{1 \leq i \leq n} \|(\mathbf{X}^T\mathbf{X})^{-1/2}\mathbf{c}\|\|(\mathbf{X}^T\mathbf{X})^{-1/2}\mathbf{x}_i\|\\ &= \max_{1 \leq i \leq n} \sqrt{\mathbf{x}_i^T(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{x}_i}\\ &= \max_{1 \leq i \leq n} \sqrt{h_{ii}}\\ & \to 0 \end{align}\] as \(n \to \infty\) by the assumption in Equation 25.1, where the inequality comes from Cauchy-Schwarz. So by Corollary 29.1 we have \[ Z_n \overset{\text{d}}{\longrightarrow}\mathcal{N}(0,1) \] as \(n \to \infty\).

From here, the first result follows from the identity \(\Omega_{\mathbf{c},n}^{-1/2}Q_n = Z_n \sigma\).

For the second result, we have \(\Omega_{\mathbf{c},n}^{-1/2}T_n = Z_n(\sigma/\hat \sigma_n)\), where \(\hat \sigma_n \overset{\text{p}}{\longrightarrow}\sigma\) as \(n \to \infty\); this consistency is guaranteed by the finiteness of the fourth moment of the error term, though we omit the proof. The second result then follows from Slutsky's theorem.
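The key condition in Equation 25.1 concerns the leverages \(h_{11},\dots,h_{nn}\), which can be computed without forming \(\mathbf{H}\) explicitly. Below is a small sketch (the function name is ours): with a thin QR factorization \(\mathbf{X}= \mathbf{Q}\mathbf{R}\), we have \(\mathbf{H}= \mathbf{Q}\mathbf{Q}^T\), so \(h_{ii}\) is the squared norm of the \(i\)th row of \(\mathbf{Q}\).

```python
import numpy as np

def max_leverage(X: np.ndarray) -> float:
    """Return max_i h_ii, the largest diagonal entry of H = X (X^T X)^{-1} X^T."""
    Q, _ = np.linalg.qr(X)          # thin QR: X = Q R with Q of shape (n, p)
    h = np.sum(Q ** 2, axis=1)      # leverages h_ii = ||row i of Q||^2
    return float(h.max())
```

For a well-behaved design this quantity is small when \(n\) is large relative to \(p\), consistent with Equation 25.1.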

The above result suggests that an asymptotic \((1-\alpha)100\%\) confidence interval for \(\mathbf{c}^T\boldsymbol{\beta}\) can be constructed as \[ \mathbf{c}^T\hat{\boldsymbol{\beta}}_n \pm z_{\alpha/2} n^{-1/2}\hat \sigma_n \Omega_{\mathbf{c},n}^{1/2} \tag{25.2}\] based on the asymptotic distribution of \(\Omega_{\mathbf{c},n}^{-1/2} T_n\) as \(n \to \infty\).
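A sketch of the interval in Equation 25.2 follows (the function name and signature are our own; `norm.ppf(1 - alpha/2)` supplies the quantile \(z_{\alpha/2}\)).

```python
import numpy as np
from scipy.stats import norm

def asymptotic_ci(X, Y, c, alpha=0.05):
    """Asymptotic (1 - alpha)100% CI for c^T beta, per Equation 25.2."""
    n, p = X.shape
    betahat, *_ = np.linalg.lstsq(X, Y, rcond=None)
    sigmahat = np.sqrt(np.sum((Y - X @ betahat) ** 2) / (n - p))
    Omega = c @ np.linalg.solve(X.T @ X / n, c)   # c^T (n^{-1} X^T X)^{-1} c
    half = norm.ppf(1 - alpha / 2) * sigmahat * np.sqrt(Omega / n)
    est = c @ betahat
    return est - half, est + half
```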

Next we describe how to construct bootstrap versions of the quantities \(Q_n\) and \(T_n\) using the residual bootstrap.

Definition 25.2 (Residual bootstrap for linear regression) Conditional on the residuals \(\hat \varepsilon_i = Y_i - \mathbf{x}_i^T\hat{\boldsymbol{\beta}}_n\), \(i=1,\dots,n\), introduce independent random variables \(\varepsilon_1^*,\dots,\varepsilon_n^*\) identically distributed according to the empirical distribution of \(\hat \varepsilon_1,\dots,\hat \varepsilon_n\). Then set \(Y_i^* = \mathbf{x}_i^T\hat{\boldsymbol{\beta}}_n + \varepsilon^*_i\) for \(i=1,\dots,n\) and let \[ \hat{\boldsymbol{\beta}}^*_n = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}^*, \] where \(\mathbf{Y}^*=(Y_1^*,\dots,Y_n^*)^T\). Now define bootstrap versions of \(Q_n\) and \(T_n\) as \[\begin{align} Q_n^* &= \sqrt{n}\mathbf{c}^T(\hat{\boldsymbol{\beta}}^*_n - \hat{\boldsymbol{\beta}}_n) \quad \text{ and } \\ T_n^* &= \sqrt{n}\mathbf{c}^T(\hat{\boldsymbol{\beta}}^*_n - \hat{\boldsymbol{\beta}}_n) / \hat \sigma^*_n, \end{align}\] respectively, where \[ (\hat \sigma_n^*)^2 = \frac{1}{n-p}\sum_{i=1}^n(Y_i^* - \mathbf{x}_i^T\hat{\boldsymbol{\beta}}_n^*)^2. \]
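The following sketch implements Definition 25.2 (the function name and defaults are ours). Each bootstrap replication resamples the residuals with replacement, rebuilds the responses, refits the model, and records \(Q_n^*\) and \(T_n^*\). We resample the raw residuals exactly as in the definition, though in practice the residuals are often centered first.

```python
import numpy as np

def residual_bootstrap(X, Y, c, B=2000, seed=None):
    """Return B Monte Carlo draws of (Q_n^*, T_n^*) under the residual bootstrap."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    betahat, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ betahat                       # hat eps_1, ..., hat eps_n
    Qs, Ts = np.empty(B), np.empty(B)
    for b in range(B):
        eps_star = rng.choice(resid, size=n, replace=True)  # iid draws from the EDF
        Ystar = X @ betahat + eps_star            # Y_i^* = x_i^T betahat + eps_i^*
        betastar, *_ = np.linalg.lstsq(X, Ystar, rcond=None)
        sigstar = np.sqrt(np.sum((Ystar - X @ betastar) ** 2) / (n - p))
        Qs[b] = np.sqrt(n) * (c @ (betastar - betahat))
        Ts[b] = Qs[b] / sigstar
    return Qs, Ts
```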

The next result states that the residual bootstrap works for estimating the sampling distributions of \(Q_n\) and \(T_n\).

Theorem 25.2 (Residual bootstrap works) In the linear regression model of Definition 25.1, under the conditions of Theorem 25.1, we have

  1. \(\sup_{x \in \mathbb{R}}\Big|\mathbb{P}_*(Q_n^* \leq x) - \mathbb{P}(Q_n \leq x)\Big| \overset{\text{p}}{\longrightarrow}0\)

  2. \(\sup_{x \in \mathbb{R}}\Big|\mathbb{P}_*(T_n^* \leq x) - \mathbb{P}(T_n \leq x)\Big| \overset{\text{p}}{\longrightarrow}0\)

as \(n \to \infty\) with \(Q_n^*\) and \(T_n^*\) as defined in Definition 25.2.

We omit the proof of this result.

One can show that the residual bootstrap estimator of the distribution of \(T_n\) is second-order correct, but we do not present this as a formal result.

The above result suggests that asymptotic \((1-\alpha)100\%\) confidence intervals for \(\mathbf{c}^T\boldsymbol{\beta}\) can be constructed from the bootstrap estimates of the sampling distributions of \(Q_n\) and \(T_n\): given sorted Monte Carlo realizations \(Q^{*(1)}_n\leq \dots \leq Q^{*(B)}_n\) of \(Q_n^*\) and \(T^{*(1)}_n\leq \dots \leq T^{*(B)}_n\) of \(T_n^*\) for some large \(B\), \((1-\alpha)100\%\) bootstrap confidence intervals for \(\mathbf{c}^T\boldsymbol{\beta}\) based on \(Q_n\) and \(T_n\) are given by \[\begin{align} &\big[\mathbf{c}^T\hat{\boldsymbol{\beta}}_n - Q_n^{*(\lceil (1-\alpha/2) B\rceil)} n^{-1/2},~ \mathbf{c}^T\hat{\boldsymbol{\beta}}_n - Q_n^{*(\lceil (\alpha/2) B\rceil)} n^{-1/2}\big] \quad \text{ and }\\ &\big[\mathbf{c}^T\hat{\boldsymbol{\beta}}_n - T_n^{*(\lceil (1-\alpha/2) B\rceil)} n^{-1/2}\hat \sigma_n,~ \mathbf{c}^T\hat{\boldsymbol{\beta}}_n - T_n^{*(\lceil (\alpha/2) B\rceil)} n^{-1/2}\hat \sigma_n\big], \end{align}\] respectively; note that the upper-tail order statistic is subtracted in the lower endpoint so that the endpoints appear in increasing order. A sketch of this construction appears below.
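Continuing from the `residual_bootstrap` sketch above (again with our own function names), the two intervals can be computed as follows; the `- 1` converts the one-based order-statistic indices \(\lceil (\alpha/2)B \rceil\) and \(\lceil (1-\alpha/2)B \rceil\) to zero-based array indices.

```python
import numpy as np

def bootstrap_cis(X, Y, c, alpha=0.05, B=2000, seed=None):
    """(1 - alpha)100% bootstrap CIs for c^T beta based on Q_n and T_n."""
    n, p = X.shape
    betahat, *_ = np.linalg.lstsq(X, Y, rcond=None)
    sigmahat = np.sqrt(np.sum((Y - X @ betahat) ** 2) / (n - p))
    est = c @ betahat
    Qs, Ts = residual_bootstrap(X, Y, c, B=B, seed=seed)
    Qs.sort(); Ts.sort()
    lo = int(np.ceil((alpha / 2) * B)) - 1        # index of the (alpha/2) order statistic
    hi = int(np.ceil((1 - alpha / 2) * B)) - 1    # index of the (1 - alpha/2) order statistic
    ci_Q = (est - Qs[hi] / np.sqrt(n), est - Qs[lo] / np.sqrt(n))
    ci_T = (est - Ts[hi] * sigmahat / np.sqrt(n), est - Ts[lo] * sigmahat / np.sqrt(n))
    return ci_Q, ci_T
```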