18  Edgeworth expansions

Continue to assume \(X_1,\dots,X_n\) are independent, identically distributed random variables with mean \(\mu\) and variance \(\sigma^2 < \infty\). Then the central limit theorem gives \[ Z_n \equiv \sqrt{n}(\bar X_n - \mu) / \sigma \overset{\text{d}}{\longrightarrow}\mathcal{N}(0,1) \] as \(n \to \infty\). Regarding this convergence in distribution, Edgeworth expansions can provide answers to such questions as:

  1. How fast is the convergence?
  2. What features of the distribution of \(X_1,\dots,X_n\) affect the rate of convergence and how?

Our purpose in studying Edgeworth expansions is to understand why bootstrapping the pivotal quantity \(T_n = \sqrt{n}(\bar X_n - \mu) / S_n\) leads to better performance than bootstrapping its un-studentized counterpart \(Y_n = \sqrt{n}(\bar X_n - \mu)\), as we may suspect it does from the simulation results plotted in Figure 17.1. Moreover, Edgeworth expansions will enable us to show that the bootstrap applied to the pivotal quantity \(T_n\) can provide a better approximation to its sampling distribution than its limiting \(\mathcal{N}(0,1)\) distribution. Thus we will be able to establish the vaunted second-order correctness property of the bootstrap!

We begin by presenting the Edgeworth expansion, out to order 2, taken from Hall (2013):

Theorem 18.1 (Second-order Edgeworth expansion) Let \(X_1,\dots,X_n\) be independent, identically distributed random variables such that \(\mathbb{E}X_1 = \mu\), \(\mathbb{V}X_1 = \sigma^2 \in (0,\infty)\), and \(\mathbb{E}|X_1|^{4} < \infty\), and set \(Z_n = \sqrt{n}(\bar X_n - \mu) / \sigma\). Then, provided \(\limsup_{|t| \to \infty }|\mathbb{E}\exp(\iota t X_1)| < 1\) (Cramér's condition), we have \[ \mathbb{P}( Z_n \leq x) = \Phi(x) + n^{-1/2}p_1(x)\phi(x) + n^{-1}p_2(x)\phi(x) + o(n^{-1}) \] as \(n \to \infty\), where \(p_1(x)\) and \(p_2(x)\) are given by \[\begin{align} p_1(x)&= -\frac{1}{6}\frac{\mu_3}{\sigma^3}(x^2 - 1)\\ p_2(x) &= -\frac{1}{24}\Big(\frac{\mu_4}{\sigma^4} - 3\Big)(x^3 - 3x) - \frac{1}{72}\frac{\mu_3^2}{\sigma^6}(x^5 - 10 x^3 + 15 x). \end{align}\] In the above \(\mu_3 = \mathbb{E}(X_1 - \mu)^3\) and \(\mu_4 = \mathbb{E}(X_1 - \mu)^4\), and \(\Phi\) and \(\phi\) denote the cdf and pdf, respectively, of the \(\mathcal{N}(0,1)\) distribution.

From Theorem 18.1 we see that the probability \(\mathbb{P}(Z_n \leq x)\) indeed approaches \(\Phi(x)\) as \(n \to \infty\), as the central limit theorem tells us, but we can see much more than this. Studying the functions \(p_1(x)\) and \(p_2(x)\), we discover that the rate of the convergence in distribution of \(Z_n\) to the \(\mathcal{N}(0,1)\) distribution depends on certain moments of the distribution of \(X_1,\dots,X_n\). In particular, the first term beyond \(\Phi(x)\) depends on the skewness \(\mu_3/\sigma^3\). If the distribution of \(X_1,\dots,X_n\) is symmetric, then \(\mu_3 = 0\), so this \(n^{-1/2}\) term vanishes and \(Z_n\) converges in distribution to the \(\mathcal{N}(0,1)\) distribution at a faster rate. Moreover, the first part of the \(n^{-1}\) term depends on the kurtosis \(\mu_4/\sigma^4\) (a measure of tail-heaviness) of the distribution of \(X_1,\dots,X_n\): if this is equal to \(3\), the kurtosis of any Normal distribution, then that part of \(p_2(x)\) vanishes as well. We will therefore observe the fastest convergence in distribution of \(Z_n\) to the \(\mathcal{N}(0,1)\) distribution when the distribution of \(X_1,\dots,X_n\) is symmetric and has the same kurtosis as a Normal distribution, since then both \(p_1\) and \(p_2\) vanish.
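To make the expansion concrete, here is a minimal numerical sketch (our addition, not part of the original notes) that implements the second-order approximation of Theorem 18.1 for Exponential(1) data, for which \(\mu = \sigma = 1\), \(\mu_3/\sigma^3 = 2\), and \(\mu_4/\sigma^4 = 9\). The exact distribution of \(Z_n\) is available here because \(\sum_{i=1}^n X_i \sim \text{Gamma}(n, 1)\); the function name `edgeworth_cdf` and the choices of \(n\) and \(x\) are ours.

```python
# Second-order Edgeworth approximation to P(Z_n <= x) from Theorem 18.1,
# checked against the exact CDF for Exponential(1) data.
import numpy as np
from scipy.stats import norm, gamma

def edgeworth_cdf(x, n, skew, exkurt):
    """Phi(x) + n^{-1/2} p_1(x) phi(x) + n^{-1} p_2(x) phi(x).

    skew = mu_3 / sigma^3, exkurt = mu_4 / sigma^4 - 3.
    """
    p1 = -(skew / 6) * (x**2 - 1)
    p2 = -((exkurt / 24) * (x**3 - 3*x) + (skew**2 / 72) * (x**5 - 10*x**3 + 15*x))
    return norm.cdf(x) + norm.pdf(x) * (p1 / np.sqrt(n) + p2 / n)

n, x = 10, 1.0
exact = gamma.cdf(n + x * np.sqrt(n), a=n)   # P(Z_n <= x) = P(sum X_i <= n + x sqrt(n))
print("exact      ", exact)
print("CLT, Phi(x)", norm.cdf(x))
print("Edgeworth  ", edgeworth_cdf(x, n, skew=2.0, exkurt=6.0))
```

For these choices the second-order approximation should land closer to the exact probability than \(\Phi(x)\) does.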

Before offering a proof of Theorem 18.1, we must introduce the Hermite polynomials:

Definition 18.1 (Hermite polynomials) The Hermite polynomials \(H_1,H_2,\dots\) are defined by the relation \[ (-1)^k \frac{d^k}{dx^k} \phi(x) = H_k(x)\phi(x), \quad k=1,2,\dots \]

Table 18.1 gives the first six Hermite polynomials.

Table 18.1: The first six Hermite polynomials
| \(k\) | \(H_k(x)\) |
|-------|------------|
| 1 | \(x\) |
| 2 | \(x^2 - 1\) |
| 3 | \(x^3 - 3x\) |
| 4 | \(x^4 - 6x^2 + 3\) |
| 5 | \(x^5 - 10x^3 + 15x\) |
| 6 | \(x^6 - 15x^4 + 45x^2 - 15\) |

Note that some of the Hermite polynomials appear in the second-order Edgeworth expansion.
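As a quick check of Definition 18.1, the following sympy sketch (our addition) reproduces Table 18.1 by differentiating \(\phi\) repeatedly and dividing out \(\phi\).

```python
# Recover H_1, ..., H_6 directly from the defining relation
# (-1)^k d^k/dx^k phi(x) = H_k(x) phi(x).
import sympy as sp

x = sp.symbols("x")
phi = sp.exp(-x**2 / 2) / sp.sqrt(2 * sp.pi)

for k in range(1, 7):
    Hk = sp.expand(sp.simplify((-1)**k * sp.diff(phi, x, k) / phi))
    print(k, Hk)
```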

We will also need the following result and helpful identity:

Lemma 18.1 (Inversion formula) If \(X\) is a random variable with characteristic function \(\psi_X\) such that \(\int_{-\infty}^\infty |\psi_X(t)|dt < \infty\), then \(X\) has pdf given by \[ f_X(x) = \frac{1}{2\pi} \int_{-\infty}^\infty \exp(-\iota t x) \psi_X(t)dt \] for all \(x \in \mathbb{R}\).

Lemma 18.2 (Helpful identity) We have \[ \frac{1}{2\pi} \int_{-\infty}^\infty (\iota t)^k \exp(-\iota t x) e^{-t^2/2}dt = H_k(x)\phi(x) \] for all \(x \in \mathbb{R}\).

Write \[\begin{align} \frac{1}{2\pi}\int_{-\infty}^\infty \exp(-\iota t x) e^{-t^2/2}(\iota t)^k dt &= \frac{1}{2\pi} \int_{-\infty}^\infty (-1)^k\frac{d^k}{dx^k}\exp(-\iota t x) e^{-t^2/2} dt\\ &= (-1)^k \frac{d^k}{dx^k}\frac{1}{2\pi}\int_{-\infty}^\infty \exp(-\iota t x) e^{-t^2/2} dt\\ &= (-1)^k \frac{d^k}{dx^k}\phi(x)\\ &=H_k(x)\phi(x), \end{align}\] where the third equality comes from applying the inversion formula in Lemma 18.1, since \(e^{-t^2/2}\) is the characteristic function of the \(\mathcal{N}(0,1)\) distribution, and the last equality comes from the relation defining the Hermite polynomials given in Definition 18.1.
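The identity can also be checked numerically. The sketch below (our addition) computes the left-hand side by quadrature for a couple of values of \(k\) and compares it with \(H_k(x)\phi(x)\); since the imaginary part of the integrand is odd in \(t\), it suffices to integrate the real part.

```python
# Numerical check of Lemma 18.2: quadrature vs H_k(x) phi(x).
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def lhs(k, x):
    # (1/2pi) * integral over R of (it)^k exp(-itx) exp(-t^2/2) dt
    integrand = lambda t: ((1j * t)**k * np.exp(-1j * t * x) * np.exp(-t**2 / 2)).real
    val, _ = quad(integrand, -np.inf, np.inf)
    return val / (2 * np.pi)

hermite = {3: lambda x: x**3 - 3*x, 4: lambda x: x**4 - 6*x**2 + 3}
x = 0.7
for k, Hk in hermite.items():
    print(k, lhs(k, x), Hk(x) * norm.pdf(x))   # the two columns should agree
```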

Now we can prove Theorem 18.1.

The sketch presented here follows closely the sketch in the notes of Dr. David Hunter of Penn State, which he has posted online. It is only a sketch because, as the attentive reader will notice, Cramér's condition is never explicitly invoked; one needs this to make some remainder terms vanish. Refer to Hall (2013) for complete details.

Without loss of generality assume \(X_1,\dots,X_n\) are independent, identically distributed random variables with \(\mathbb{E}X_1 = 0\), \(\mathbb{V}X_1 = 1\), \(\mathbb{E}X_1^3 = \gamma\), and \(\mathbb{E}X_1^4 = \tau < \infty\). We study the distribution of \(Z_n = \sqrt{n}\bar X_n\), beginning by writing its characteristic function \(\psi_{Z_n}\) as \[\begin{align} \psi_{Z_n}(t) &= \mathbb{E}\exp(\iota t Z_n)\\ &= \textstyle \mathbb{E}\exp(\iota t n^{-1/2}\sum_{i=1}^n X_i)\\ &= \textstyle\mathbb{E}\prod_{i=1}^n\exp(\iota t n^{-1/2} X_i)\\ &= \textstyle \prod_{i=1}^n\mathbb{E}\exp(\iota t n^{-1/2} X_i)\quad \text{(by independence)}\\ &= [\mathbb{E}\exp(\iota t n^{-1/2} X_1)]^n \quad \text{(since $X_1,\dots,X_n$ are iid).} \end{align}\] Now we make a Taylor expansion of \(\mathbb{E}\exp(\iota t n^{-1/2} X_1)\) around \(t = 0\). We have \[\begin{align} \mathbb{E}\exp(\iota t n^{-1/2} X_1) &= \mathbb{E}\Big[ \sum_{k=0}^4\frac{1}{k!}\Big(\frac{\iota X_1}{\sqrt{n}}\Big)^k t^k +o(n^{-2})\Big] \\ &= 1 - \frac{t^2}{2n} + \frac{\gamma(\iota t)^3}{6n^{3/2}} + \frac{\tau(\iota t)^4}{24n^2} + o(n^{-2}), \end{align}\] where the \(k=1\) term vanishes because \(\mathbb{E}X_1 = 0\). Then Theorem 29.1 (the multinomial theorem) gives \[\begin{align} \psi_{Z_n}(t) &=[\mathbb{E}\exp(\iota t n^{-1/2} X_1)]^n \\ &=\Big[1 - \frac{t^2}{2n} + \frac{\gamma(\iota t)^3}{6n^{3/2}} + \frac{\tau(\iota t)^4}{24n^2} + o(n^{-2})\Big]^n \\ &=\Big(1 - \frac{t^2}{2n}\Big)^n + n \Big(1 - \frac{t^2}{2n}\Big)^{n-1} \frac{\gamma(\iota t)^3}{6n^{3/2}} \\ & \quad + ~ n \Big(1 - \frac{t^2}{2n}\Big)^{n-1}\frac{\tau(\iota t)^4}{24n^2} \\ &\quad + ~\frac{n(n-1)}{2}\Big(1 - \frac{t^2}{2n}\Big)^{n-2}\Big( \frac{\gamma(\iota t)^3}{6n^{3/2}}\Big)^2 + o(n^{-1}) \\ &= \Big(1 - \frac{t^2}{2n}\Big)^n \\ & \quad + ~\Big(1 - \frac{t^2}{2n}\Big)^{n-1}\Big[ \frac{\gamma(\iota t)^3}{6n^{1/2}} + \frac{\tau(\iota t)^4}{24n}\Big] \\ & \quad + ~\Big(1 - \frac{t^2}{2n}\Big)^{n-2}\frac{n-1}{n^2}\frac{\gamma^2(\iota t)^6}{72} + o(n^{-1}). \end{align}\] Next we use the fact that for each nonnegative integer \(k\) \[ \Big(1 + \frac{a}{n}\Big)^{n-k} = e^a \Big(1 - \frac{a(a+2k)}{2n}\Big) + o(n^{-1}) \] as \(n\to\infty\) to write the expressions \[\begin{align} \Big(1 - \frac{t^2}{2n}\Big)^n &= e^{-t^2/2}\Big(1 - \frac{t^4}{8n}\Big) + o(n^{-1}) \\ \Big(1 - \frac{t^2}{2n}\Big)^{n-1} &= e^{-t^2/2}\Big(1 + \frac{(t^2/2)(2-t^2/2)}{2n}\Big) + o(n^{-1}) \\ \Big(1 - \frac{t^2}{2n}\Big)^{n-2} &= e^{-t^2/2}\Big(1 + \frac{(t^2/2)(4-t^2/2)}{2n}\Big) + o(n^{-1}). \end{align}\] Plugging the above into our expression for \(\psi_{Z_n}(t)\) and collecting terms with a power of \(n\) greater than 1 in the denominator into the term \(o(n^{-1})\) gives \[\begin{align} \psi_{Z_n}(t) & = e^{-t^2/2}\Big[ 1 - \frac{t^4}{8n} + \frac{\gamma(\iota t)^3}{6n^{1/2}} + \frac{\tau(\iota t)^4}{24n} + \frac{\gamma^2(\iota t)^6}{72n}\Big] + o(n^{-1})\\ & = e^{-t^2/2}\Big[ 1 + \frac{\gamma(\iota t)^3}{6n^{1/2}} + \frac{(\tau - 3)(\iota t)^4}{24n} + \frac{\gamma^2(\iota t)^6}{72n}\Big] + o(n^{-1}), \end{align}\] where the second equality uses \((\iota t)^4 = t^4\). Now let \(\tilde \psi_{Z_n}(t)\) be the approximation to \(\psi_{Z_n}(t)\) given by \[ \tilde \psi_{Z_n}(t) = e^{-t^2/2}\Big[ 1 + \frac{\gamma(\iota t)^3}{6n^{1/2}} + \frac{(\tau - 3)(\iota t)^4}{24n} + \frac{\gamma^2(\iota t)^6}{72n}\Big]. \] From here, we invert \(\tilde \psi_{Z_n}\) by the inversion formula in Lemma 18.1 to obtain the corresponding approximation \(\tilde f_{Z_n}\) to the pdf \(f_{Z_n}\) of \(Z_n\).
We have \[\begin{align} \tilde f_{Z_n}(x) &= \frac{1}{2\pi}\int_{-\infty}^\infty \exp(-\iota t x) \tilde \psi_{Z_n}(t)dt \\ &= \frac{1}{2\pi}\int_{-\infty}^\infty \exp(-\iota t x) e^{-t^2/2}\Big[ 1 + \frac{\gamma(\iota t)^3}{6n^{1/2}} + \frac{(\tau - 3)(\iota t)^4}{24n} + \frac{\gamma^2(\iota t)^6}{72n}\Big]dt \\ &= \frac{1}{2\pi}\int_{-\infty}^\infty \exp(-\iota t x) e^{-t^2/2}dt \\ & \quad + ~ \frac{\gamma}{6n^{1/2}}\frac{1}{2\pi}\int_{-\infty}^\infty \exp(-\iota t x) e^{-t^2/2} (\iota t)^3dt \\ & \quad + ~\frac{(\tau - 3)}{24n}\frac{1}{2\pi}\int_{-\infty}^\infty \exp(-\iota t x) e^{-t^2/2}(\iota t)^4dt\\ &\quad + ~ \frac{\gamma^2}{72n} \frac{1}{2\pi}\int_{-\infty}^\infty \exp(-\iota t x) e^{-t^2/2}(\iota t)^6dt\\ &=\phi(x) + \frac{\gamma}{6n^{1/2}}H_3(x)\phi(x) + \frac{\tau - 3}{24n}H_4(x)\phi(x) + \frac{\gamma^2}{72n}H_6(x)\phi(x), \end{align}\] where the last equality comes from the helpful identity in Lemma 18.2. To obtain the corresponding approximation \(\tilde F_{Z_n}\) to the cumulative distribution function of \(Z_n\) we take the anti-derivative of \(\tilde f_{Z_n}\), making use of the fact \[ \frac{d}{dx}H_k(x)\phi(x) = - H_{k+1}(x)\phi(x) \] for each \(k\), so that \(H_{k+1}(x)\phi(x)\) has anti-derivative \(-H_k(x)\phi(x)\). We obtain \[\begin{align} \tilde F_{Z_n}(x) = \Phi(x) - \frac{\gamma}{6n^{1/2}}H_2(x)\phi(x) - \frac{\tau - 3}{24n}H_3(x)\phi(x) - \frac{\gamma^2}{72n}H_5(x)\phi(x). \end{align}\] The result follows from substituting the expressions in Table 18.1 for the Hermite polynomials and noting that \(\gamma\) and \(\tau\) play the roles of \(\mu_3/\sigma^3\) and \(\mu_4/\sigma^4\).
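As a sanity check on the expansion of \((1 + a/n)^{n-k}\) used in the proof, the short snippet below (our addition) compares both sides numerically with \(a = -t^2/2\); the gap should shrink at roughly the rate \(n^{-2}\).

```python
# Check (1 + a/n)^(n-k) = e^a (1 - a(a + 2k)/(2n)) + o(1/n) numerically.
import numpy as np

t, k = 1.3, 2
a = -t**2 / 2
for n in [10**2, 10**3, 10**4]:
    exact = (1 + a / n)**(n - k)
    approx = np.exp(a) * (1 - a * (a + 2 * k) / (2 * n))
    print(n, exact, approx, abs(exact - approx))   # difference is O(n^{-2})
```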

The next result is also taken from Hall (2013):

Theorem 18.2 (Second-order Edgeworth expansion for studentized pivot) Under the same conditions as Theorem 18.1 and with \(T_n = \sqrt{n}(\bar X_n - \mu)/S_n\), we have \[ \mathbb{P}( T_n \leq x) = \Phi(x) +n^{-1/2}q_1(x)\phi(x) + n^{-1}q_2(x)\phi(x) + o(n^{-1}) \] as \(n \to \infty\), where \[\begin{align} q_1(x) &= \frac{1}{6}\frac{\mu_3}{\sigma^3}(2x^2 + 1)\\ q_2(x) & = \frac{1}{12}\Big(\frac{\mu_4}{\sigma^4} - 3\Big)(x^3 - 3x) - \frac{1}{18}\frac{\mu_3^2}{\sigma^6}(x^5 + 2 x^3 - 3 x) - \frac{1}{4}(x^3 + 3x). \end{align}\]

The functions \(q_1\) and \(q_2\) are given in equations (2.55) and (2.57) on pages 72 and 73 of Hall (2013).

We omit the proof of Theorem 18.2.
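The expansion in Theorem 18.2 can nonetheless be checked by simulation. Below is a minimal Monte Carlo sketch (our addition), again using Exponential(1) data; the number of replications and the choices of \(n\) and \(x\) are ours, and we compute \(S_n\) with denominator \(n\), an assumed convention that only affects agreement at order \(n^{-1}\).

```python
# Monte Carlo check of Theorem 18.2 for Exponential(1) data:
# skewness mu_3/sigma^3 = 2, excess kurtosis mu_4/sigma^4 - 3 = 6.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n, x, reps = 10, 1.0, 200_000
samples = rng.exponential(1.0, size=(reps, n))
xbar = samples.mean(axis=1)
s = samples.std(axis=1, ddof=0)            # S_n with denominator n (an assumption)
t_n = np.sqrt(n) * (xbar - 1.0) / s        # studentized pivot with mu = 1
mc = np.mean(t_n <= x)

skew, exkurt = 2.0, 6.0
q1 = (skew / 6) * (2 * x**2 + 1)
q2 = (exkurt / 12) * (x**3 - 3*x) - (skew**2 / 18) * (x**5 + 2*x**3 - 3*x) - (x**3 + 3*x) / 4
edgeworth = norm.cdf(x) + norm.pdf(x) * (q1 / np.sqrt(n) + q2 / n)

print("Monte Carlo", mc)
print("CLT, Phi(x)", norm.cdf(x))
print("Edgeworth  ", edgeworth)
```

The printout lets one compare the Monte Carlo estimate of \(\mathbb{P}(T_n \leq x)\) with the \(\mathcal{N}(0,1)\) approximation and with the second-order expansion.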

Next we play with Edgeworth expansions by checking, for a couple of settings, how close the first- and second-order expansions are to the true, finite-\(n\) sampling distributions of a pivotal quantity.