45 Miscellaneous results
Here we collect some miscellaneous definitions and results, some of which are adapted from Athreya and Lahiri (2006).
Proposition 45.1 (Taylor expansion of a multivariate function) Suppose \(f:\mathbb{R}^d \to \mathbb{R}\) has partial derivatives up to order \(k+1\) defined on a convex set \(S\). Then if \(\mathbf{x}\in S\) and \(\mathbf{x}+ \mathbf{t}\in S\) we have \[ f(\mathbf{x}+ \mathbf{t}) = \sum_{|\boldsymbol{\upsilon}| \leq k} \frac{\mathbf{D}^\boldsymbol{\upsilon}f(\mathbf{x})}{\boldsymbol{\upsilon}!} \mathbf{t}^\boldsymbol{\upsilon}+ R_{\mathbf{x},k}(\mathbf{t}), \] where the Lagrange form of the remainder is \[ R_{\mathbf{x},k}(\mathbf{t}) = \sum_{|\boldsymbol{\upsilon}| = k + 1} \frac{\mathbf{D}^\boldsymbol{\upsilon}f(\mathbf{x}+ \tau \mathbf{t})}{\boldsymbol{\upsilon}!} \mathbf{t}^\boldsymbol{\upsilon} \] for some \(\tau \in [0,1]\). Here \(\boldsymbol{\upsilon}= (\upsilon_1,\dots,\upsilon_d)\) is a multi-index, with \(|\boldsymbol{\upsilon}| = \upsilon_1 + \dots + \upsilon_d\), \(\boldsymbol{\upsilon}! = \upsilon_1!\cdots\upsilon_d!\), and \(\mathbf{t}^\boldsymbol{\upsilon}= t_1^{\upsilon_1}\cdots t_d^{\upsilon_d}\).
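For example, the \(k = 1\) case recovers the familiar gradient-plus-Hessian form, \[ f(\mathbf{x}+ \mathbf{t}) = f(\mathbf{x}) + \nabla f(\mathbf{x})^\top \mathbf{t}+ \frac{1}{2} \mathbf{t}^\top \nabla^2 f(\mathbf{x}+ \tau \mathbf{t}) \mathbf{t}, \] since the multi-indices \(2\mathbf{e}_i\) contribute \(\boldsymbol{\upsilon}! = 2\) while the mixed multi-indices \(\mathbf{e}_i + \mathbf{e}_j\), \(i \neq j\), contribute \(\boldsymbol{\upsilon}! = 1\).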
Lemma 45.1 For a function \(f:\mathcal{X}\times\mathcal{Y}\to \mathbb{R}\) we have \[ \sup_{y \in \mathcal{Y}} \inf_{x \in \mathcal{X}}f(x,y) \leq \inf_{x \in \mathcal{X}} \sup_{y \in \mathcal{Y}} f(x,y). \]
Imagine a matrix of real numbers and consider proving the assertion that the maximum of the row minima cannot exceed the minimum of the column maxima. Consider the minimum entry of any row; the column in which this entry lies has a maximum greater than or equal to this value. Moreover, each other column contains a value greater than or equal to this value (by virtue of its being the minimum in its row), so the maxima of these other columns are all greater than or equal to this value. Thus the maximum of the row minima is a lower bound for the minimum of the column maxima.
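For instance, for the matrix \[ \begin{pmatrix} 1 & 4 \\ 3 & 2 \end{pmatrix} \] the row minima are \(1\) and \(2\) and the column maxima are \(3\) and \(4\), so the maximum of the row minima, \(2\), is indeed no greater than the minimum of the column maxima, \(3\).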
Analogously, looking at \(\mathcal{Y}\) as “the rows” and \(\mathcal{X}\) as “the columns”, for any \(y^*\in \mathcal{Y}\) we have \[ \inf_{x \in \mathcal{X}} f(x,y^*) \leq \inf_{x \in \mathcal{X}} \sup_{y \in \mathcal{Y}}f(x,y). \] Now, by the definition of the supremum, for every \(\epsilon > 0\) there exists a \(y^*\) such that \[ \sup_{y \in \mathcal{Y}} \inf_{x \in \mathcal{X}} f(x,y) - \epsilon < \inf_{x \in \mathcal{X}} f(x,y^*), \] which, combined with the previous display, gives \[ \sup_{y \in \mathcal{Y}} \inf_{x \in \mathcal{X}} f(x,y) < \inf_{x \in \mathcal{X}} \sup_{y \in \mathcal{Y}}f(x,y) + \epsilon. \] Since this holds for every \(\epsilon > 0\), the result follows.
Definition 45.1 (\(\ell_p\)-norm of a random variable) The \(\ell_p\)-norm of a random variable \(X\) is defined as \[ |X|_p \equiv (\mathbb{E}|X|^p)^{1/p} \] for \(p \in(0,\infty)\).
Proposition 45.2 (Minkowski’s inequality) For any real-valued random variables \(X\) and \(Y\) and any \(p \in (1,\infty)\) we have \[ |X - Y|_p \leq |X|_p + |Y|_p. \]
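As a quick numerical sanity check, one can compare the two sides of the inequality by Monte Carlo; the following minimal sketch (with arbitrary illustrative distributions and NumPy assumed) does so for \(p = 3\):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 10**6, 3.0

# Two (possibly dependent) real-valued random samples; the choice is arbitrary.
x = rng.standard_normal(n)
y = 0.5 * x + rng.standard_exponential(n)

def lp(z):
    """Monte Carlo estimate of the l_p-norm (E|Z|^p)^(1/p)."""
    return (np.abs(z) ** p).mean() ** (1 / p)

# Minkowski: the left-hand side should not exceed the right-hand side.
print(lp(x - y), "<=", lp(x) + lp(y))
```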
Proposition 45.3 (Jensen’s inequality) If \(g: \mathbb{R}\to \mathbb{R}\) is a convex function, then for any random variable \(X\) we have \[ g(\mathbb{E}X) \leq \mathbb{E}g(X) \] provided \(\mathbb{E}|X| < \infty\) and \(\mathbb{E}|g(X)| < \infty\).
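For example, taking the convex function \(g(x) = x^2\) gives \[ (\mathbb{E}X)^2 \leq \mathbb{E}X^2, \] which is equivalent to the statement that \(\mathbb{V}X \geq 0\).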
Proposition 45.4 (Kolmogorov’s strong law of large numbers) For a sequence \(X_1, X_2, \dots\) of independent and identically distributed random variables we have \[ \bar X_n \stackrel{a.s.}{\rightarrow}c \] for some \(c \in \mathbb{R}\) if and only if \(\mathbb{E}|X_1| < \infty\), in which case \(c =\mathbb{E}X_1\).
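The almost-sure convergence is easy to see by simulation; here is a minimal sketch (distribution and seed chosen arbitrarily) tracking the running sample mean of i.i.d. Uniform\((0,1)\) draws, whose expectation is \(1/2\):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10**6

# i.i.d. draws with E[X_1] = 0.5.
x = rng.uniform(0.0, 1.0, size=n)

# Running sample means: \bar X_1, \bar X_2, ..., \bar X_n.
running_mean = np.cumsum(x) / np.arange(1, n + 1)

# The running mean settles near 0.5 as n grows.
print(running_mean[[99, 9_999, 999_999]])
```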
Theorem 45.1 (Multinomial theorem) For positive integers \(n\) and \(m\) we have \[ (a_1 + \dots + a_m)^n = \sum \Big(\frac{n!}{n_1!\cdots n_m!}\Big) a_1^{n_1}\cdots a_m^{n_m}, \] where the sum is taken over all \((n_1,\dots,n_m) \in \{0,\dots,n\}^m\) such that \(n_1 + \dots + n_m = n\).
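For example, with \(n = 2\) and \(m = 3\), \[ (a_1 + a_2 + a_3)^2 = a_1^2 + a_2^2 + a_3^2 + 2a_1a_2 + 2a_1a_3 + 2a_2a_3, \] where the squared terms have coefficient \(2!/(2!\,0!\,0!) = 1\) and the cross terms have coefficient \(2!/(1!\,1!\,0!) = 2\).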
The following statement of the Lindeberg central limit theorem is adapted from Theorem 11.1.1 of Athreya and Lahiri (2006).
Theorem 45.2 (Lindeberg central limit theorem) For each \(n \geq 1\) let \(U_1,\dots,U_n\) be a collection of independent random variables with zero mean and finite variances and define \(V_1,\dots,V_n\) such that \[ V_i = \Big(\sum_{j=1}^n \mathbb{V}U_j\Big)^{-1/2}U_{i} \] for \(i = 1,\dots,n\). Then \[ \sum_{i=1}^n V_i \stackrel{d}{\rightarrow}\mathcal{N}(0,1) \] as \(n \to \infty\) provided \[ \sum_{i=1}^n \mathbb{E}|V_i|^2 \mathbf{1}( | V_i| > \epsilon) \to 0 \tag{45.1}\] as \(n \to \infty\) for every \(\epsilon > 0\).
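As a sanity check, the classical central limit theorem for i.i.d. variables is recovered as a special case: if \(U_1,\dots,U_n\) are i.i.d. with variance \(\sigma^2 \in (0,\infty)\), then \(V_i = U_i/(\sigma \sqrt{n})\) and \[ \sum_{i=1}^n \mathbb{E}|V_i|^2 \mathbf{1}(|V_i| > \epsilon) = \frac{1}{\sigma^2}\mathbb{E}|U_1|^2 \mathbf{1}\big(|U_1| > \epsilon \sigma \sqrt{n}\big) \to 0 \] as \(n \to \infty\) by the dominated convergence theorem, so Equation 45.1 holds.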
The proof follows pages 345–347 of Athreya and Lahiri (2006), but drops the triangular array notation.
For each \(n \geq 1\), set \(\sigma^2_i = \mathbb{V}U_i\) for \(i=1,\dots,n\) and suppose (without loss of generality) that \(\sum_{i=1}^n \sigma^2_i =1\), so that \(V_i = U_i\). Given that the Lindeberg condition in Equation 45.1 is satisfied, we may choose a sequence \(\epsilon_n \to 0\) such that \[ \sum_{i=1}^n \mathbb{E}|V_i|^2 \mathbf{1}(|V_i|>\epsilon_n) \to 0. \tag{45.2}\] From here we will show that the characteristic function of \(\sum_{i=1}^n V_i\) converges to that of the standard Normal distribution, which is given by \(\psi_Z(t) = \exp(-t^2/2)\). Letting \(\psi_i\) represent the characteristic function of \(V_i\) for each \(i=1,\dots,n\), we have \[\begin{align} \Big|\mathbb{E}&\exp \left( \iota t \sum_{i=1}^n V_i \right) - \exp\left(-\frac{t^2}{2}\right)\Big| \\ & \leq \left| \prod_{i=1}^n \psi_i(t) - \prod_{i=1}^n \left(1 - \frac{t^2\sigma_i^2}{2}\right)\right| \\ & \quad \quad + \left|\prod_{i=1}^n \left(1 - \frac{t^2\sigma_i^2}{2}\right) - \prod_{i=1}^n\exp\left(-\frac{t^2\sigma_i^2}{2}\right) \right| \\ & \leq \sum_{i=1}^n\left| \psi_i(t) - \left(1 - \frac{t^2\sigma_i^2}{2}\right)\right| \\ & \quad \quad + \sum_{i=1}^n\left|\exp\left(-\frac{t^2\sigma_i^2}{2}\right) - \left(1 - \frac{t^2\sigma_i^2}{2} \right) \right|\\ & = A_n + B_n, \end{align}\] say, for any \(t \in \mathbb{R}\), where the first inequality is the triangle inequality (using independence and \(\sum_{i=1}^n \sigma_i^2 = 1\)) and the second comes from Lemma 11.1.3 of Athreya and Lahiri (2006). We show that \(A_n\) and \(B_n\) go to zero as \(n \to \infty\). Since \(|\exp(\iota x) - (1 + \iota x + (\iota x)^2/2)| \leq \min\{|x|^3/3!,|x|^2\}\) for all \(x\in\mathbb{R}\), for all \(t\in \mathbb{R}\), assuming \(\epsilon_n < 1\), we have \[\begin{align} A_n & := \sum_{i=1}^n\left| \psi_i(t) - \left(1 - \frac{t^2\sigma_i^2}{2}\right)\right| \\ & = \sum_{i=1}^n\left| \mathbb{E}\exp(\iota t V_i) - \left(1 + \mathbb{E}\iota t V_i + \frac{(\iota t)^2}{2!}\mathbb{E}V_i^2 \right)\right| \\ & \leq \sum_{i=1}^n \mathbb{E}\min \left\{ \frac{|t V_i|^3}{3!} , |t V_i|^2 \right\} \\ & \leq \sum_{i=1}^n \mathbb{E}|t V_i|^3\mathbf{1}(|V_i| \leq \epsilon_n) + \sum_{i=1}^n \mathbb{E}|t V_i|^2 \mathbf{1}(|V_i|>\epsilon_n) \\ & \leq |t|^3 \epsilon_n \sum_{i=1}^n \mathbb{E}V_i^2 + t^2\sum_{i=1}^n\mathbb{E}|V_i|^2\mathbf{1}(|V_i| > \epsilon_n)\\ & \to 0 \text{ as $n \to \infty$,} \end{align}\] since \(\sum_{i=1}^n \mathbb{E}V_i^2=1\), \(\epsilon_n \to 0\), and Equation 45.2 holds. Now, since \(|e^x - 1 - x | \leq x^2e^{|x|}\) for all \(x \in \mathbb{R}\), we may write \[\begin{align} B_n &:= \sum_{i=1}^n\left|1 - \frac{t^2\sigma_i^2}{2} - \exp\left(-\frac{t^2\sigma_i^2}{2}\right) \right| \\ &\leq \sum_{i=1}^n \left(\frac{t^2\sigma_i^2}{2}\right)^2 \exp\left( \frac{t^2\sigma_i^2}{2}\right) \\ & \leq \frac{t^4}{4}\left(\max_{1\leq i \leq n}\sigma_i^2\right)\exp\left[\frac{t^2}{2}\left(\max_{1\leq i \leq n}\sigma_i^2\right)\right]\sum_{i=1}^n \sigma_i^2 \\ & \leq t^4\left(\max_{1\leq i \leq n}\sigma_i^2\right)\exp\left[t^2\left(\max_{1\leq i \leq n}\sigma_i^2\right)\right]. \end{align}\] Lastly, we have \[\begin{align*} \max_{1 \leq i \leq n}\sigma_i^2 & = \max_{1 \leq i \leq n} \mathbb{E}V_i^2 \\ & = \max_{1\leq i \leq n} \mathbb{E}\left[ |V_i|^2\mathbf{1}(|V_i|\leq\epsilon_n) + |V_i|^2\mathbf{1}(|V_i| > \epsilon_n)\right] \\ & \leq \epsilon_n^2 + \sum_{i=1}^n \mathbb{E}|V_i|^2\mathbf{1}(|V_i| > \epsilon_n)\\ & \to 0 \text{ as $n \to \infty$,} \end{align*}\] by Equation 45.2, so that \(B_n \to 0\) as well. The characteristic function of \(\sum_{i=1}^n V_i\) therefore converges pointwise to \(\exp(-t^2/2)\), and the result follows from Lévy's continuity theorem. This completes the proof.
Corollary 45.1 (Corollary to the Lindeberg central limit theorem) For each \(n \geq 1\), let \(\xi_1,\dots,\xi_n\) be independent and identically distributed random variables with zero mean and unit variance and let \(a_1,\dots,a_n\) be real numbers. Then \[ \Big(\sum_{i=1}^n a_i ^2\Big)^{-1/2}\sum_{i = 1}^n a_i \xi_i \stackrel{d}{\rightarrow}\mathcal{N}(0,1) \] as \(n \to \infty\) provided \[ \Big(\sum_{j=1}^n a_j ^2\Big)^{-1/2}\max_{1 \leq i \leq n} |a_i| \to 0 \tag{45.3}\] as \(n \to \infty\).
For each \(n \geq 1\), let \(U_i = a_i\xi_i\), so that \(\mathbb{V}U_i = a_i^2\) for \(i =1,\dots,n\). Accordingly, set \(V_i = (\sum_{j=1}^n a_j^2)^{-1/2}a_i \xi_i\). Now we show that the collections of variables \(V_1,\dots,V_n\), \(n \geq 1\), satisfy the Lindeberg condition in Equation 45.1. We have \[\begin{align} \sum_{i=1}^n &\mathbb{E}|V_i|^2 \mathbf{1}(|V_i| > \epsilon) \\ &= \sum_{i=1}^n \frac{a_i^2 }{\sum_{j=1}^n a_j^2}\mathbb{E}|\xi_i|^2 \mathbf{1}\Big(|a_i \xi_i| > \epsilon (\textstyle \sum_{j=1}^na_j^2)^{1/2}\Big)\\ &\leq \sum_{i=1}^n \frac{a_i^2 }{\sum_{j=1}^na_j^2}\mathbb{E}|\xi_1|^2 \mathbf{1}\Big(|\xi_1|\max_{1\leq k \leq n}|a_k| > \epsilon \textstyle (\sum_{j=1}^n a_j^2)^{1/2}\Big)\\ &= \mathbb{E}|\xi_1|^2 \mathbf{1}\Big( |\xi_1| > \epsilon \frac{(\sum_{j=1}^na_j^2)^{1/2}}{\max_{1\leq k \leq n}|a_k|}\Big)\\ &\to 0 \end{align}\] as \(n \to \infty\) by the dominated convergence theorem, since \(\mathbb{E}|\xi_1|^2 < \infty\) and, by the condition in Equation 45.3, the indicator converges to zero almost surely.
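A small simulation sketch (with weights, distribution, and sample sizes chosen arbitrarily for illustration) suggests how the corollary plays out in practice:

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 500, 2000

# Weights a_i = sqrt(i): max|a_i| / (sum a_j^2)^(1/2) = sqrt(2/(n+1)) -> 0,
# so the condition in Equation 45.3 holds.
a = np.sqrt(np.arange(1, n + 1))

# i.i.d. draws with zero mean and unit variance (centered exponentials).
xi = rng.standard_exponential((reps, n)) - 1.0

# Normalized weighted sums; each should be approximately N(0, 1).
t = (xi * a).sum(axis=1) / np.linalg.norm(a)

print(t.mean(), t.var())  # close to 0 and 1, respectively
```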
The following result is taken from page 365 of Lehmann (1975).
Lemma 45.2 (Mean and variance of linear combination of randomly permuted constants) Let \(c_1,\dots,c_N\) and \(a(1),\dots,a(N)\) be constants and let \((T_1,\dots,T_N)\) be a permutation of the integers \(1,\dots,N\) drawn uniformly at random. Then \[\begin{align} \mathbb{E}\Big(\sum_{i=1}^N c_i a(T_i)\Big) &= \bar a \sum_{i=1}^N c_i\\ \mathbb{V}\Big(\sum_{i=1}^N c_i a(T_i)\Big) &= (N-1)^{-1}\sum_{i=1}^N(c_i - \bar c)^2\sum_{i=1}^N(a(i) - \bar a)^2, \end{align}\] where \(\bar a = N^{-1}\sum_{i=1}^N a(i)\) and \(\bar c = N^{-1}\sum_{i=1}^N c_i\).
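For small \(N\), both formulas can be checked exactly by enumerating all \(N!\) permutations; the following sketch (with arbitrary constants) does so:

```python
import itertools
import numpy as np

# Arbitrary constants c_i and a(i) with N = 4.
c = np.array([1.0, 2.0, 4.0, 7.0])
a = np.array([0.0, 1.0, 1.0, 3.0])
N = len(c)

# Values of sum_i c_i a(T_i) over all N! equally likely permutations.
sums = np.array([c @ a[list(p)] for p in itertools.permutations(range(N))])

a_bar, c_bar = a.mean(), c.mean()
mean_formula = a_bar * c.sum()
var_formula = ((c - c_bar) ** 2).sum() * ((a - a_bar) ** 2).sum() / (N - 1)

print(sums.mean(), mean_formula)  # should agree exactly
print(sums.var(), var_formula)    # population variance over all permutations
```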