10  Linear minimax risk over general ellipsoids

Consider the Normal means model, this time with infinite dimension, in which we observe

\[ Z_j = \theta_j + \sigma \xi_j, \quad j = 1,2,\dots, \] where \(\boldsymbol{\theta}= (\theta_1,\theta_2,\dots)\) is unknown, \(\xi_1,\xi_2,\dots,\) are independent \(\mathcal{N}(0,1)\) random variables, and \(\sigma > 0\). The goal of this section will be to state the minimax risk for estimating \(\boldsymbol{\theta}\) in the infinite-dimensional Normal means model when \(\boldsymbol{\theta}\) belongs to the Sobolev ellipsoid \(\Theta_{\text{Sob}}(\beta,L)\).
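Although the model is infinite-dimensional, it can be simulated after truncating to finitely many coordinates. The following minimal sketch (assuming numpy is available; the truncation level, the noise level \(\sigma\), and the choice \(\theta_j = j^{-2}\) are illustrative and not taken from the text) draws one realization of the sequence \(Z_1,Z_2,\dots\).

```python
import numpy as np

rng = np.random.default_rng(0)

# Truncate the infinite sequence model at n coordinates for simulation.
n, sigma = 500, 0.1                           # illustrative truncation and noise levels
j = np.arange(1, n + 1)
theta = 1.0 / j**2                            # an illustrative square-summable mean sequence
Z = theta + sigma * rng.standard_normal(n)    # Z_j = theta_j + sigma * xi_j
```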

Before coming to the Sobolev ellipsoid, we will find it more convenient to first consider the minimax risk over a general ellipsoid, \(\Theta(c,a_1,a_2,\dots)\), defined as \[ \Theta(c,a_1,a_2,\dots) = \Big\{(\theta_1,\theta_2,\dots) \in \mathbb{R}^\infty:~ \sum_{j=1}^\infty \theta^2_j < \infty \text{ and } \sum_{j=1}^\infty a_j^2 \theta_j^2 \leq c^2\Big\}. \] Moreover, rather than considering all possible estimators of \(\boldsymbol{\theta}\), we will consider only linear estimators, that is, estimators of the form \(\hat{\boldsymbol{\theta}}_{\boldsymbol{\lambda}} = (\lambda_1Z_1,\lambda_2Z_2,\dots)\) for \(\boldsymbol{\lambda} = (\lambda_1,\lambda_2,\dots)\). Let \[ M_{\text{lin}}(\Theta(c,a_1,a_2,\dots)) = \inf_{\boldsymbol{\lambda}} \sup_{\boldsymbol{\theta}\in \Theta(c,a_1,a_2,\dots)} R(\hat{\boldsymbol{\theta}}_\boldsymbol{\lambda},\boldsymbol{\theta}) \] denote the linear minimax risk over the general ellipsoid \(\Theta(c,a_1,a_2,\dots)\), where \[ R(\hat{\boldsymbol{\theta}}_\boldsymbol{\lambda},\boldsymbol{\theta}) = \sum_{j=1}^\infty \mathbb{E}(\lambda_j Z_j - \theta_j)^2 \] is the risk under squared error loss.
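The risk \(R(\hat{\boldsymbol{\theta}}_\boldsymbol{\lambda},\boldsymbol{\theta})\) of a linear estimator can be approximated by Monte Carlo under the same truncation. The sketch below (again assuming numpy; the coefficients \(a_j = j\), the radius \(c=1\), the mean sequence, and the shrinkage weights are illustrative choices, not from the text) checks membership in \(\Theta(c,a_1,a_2,\dots)\) and estimates the risk by averaging \(\sum_j(\lambda_j Z_j - \theta_j)^2\) over repeated draws.

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma, c = 500, 0.1, 1.0
j = np.arange(1, n + 1)
a = j.astype(float)                     # illustrative ellipsoid coefficients a_j = j
theta = 0.5 / j**2                      # a candidate mean sequence
lam = 1.0 / (1.0 + 0.05 * j)            # an arbitrary sequence of shrinkage weights

# Membership in Theta(c, a_1, a_2, ...): sum_j a_j^2 theta_j^2 <= c^2.
assert np.sum(a**2 * theta**2) <= c**2

# Monte Carlo approximation of R(theta_hat_lambda, theta) = sum_j E(lambda_j Z_j - theta_j)^2.
reps = 5000
Z = theta + sigma * rng.standard_normal((reps, n))
print(np.mean(np.sum((lam * Z - theta)**2, axis=1)))
```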

The following result is adapted from results in Section 3.2 of Tsybakov (2008).

Theorem 10.1 (Linear minimax risk on a general ellipsoid) Let \(a_1,a_2,\dots\) be an increasing sequence such that \(|\{j:a_j = 0\}|<\infty\) and \(a_j \to +\infty\). Then there exists a unique solution \(\eta > 0\) to \[ \eta^{-1}\sigma^2\sum_{j=1}^\infty a_j (1 - \eta a_j)_+ = c^2 \tag{10.1}\] and, setting \(\ell_j = (1 - \eta a_j)_+\) for \(j=1,2,\dots\) and \(\boldsymbol{\ell}= (\ell_1,\ell_2,\dots)\), we have \[ M_{\text{lin}}(\Theta(c,a_1,a_2,\dots)) = \sup_{\boldsymbol{\theta}\in \Theta(c,a_1,a_2,\dots)} R(\hat{\boldsymbol{\theta}}_\boldsymbol{\ell},\boldsymbol{\theta}) = \sigma^2\sum_{j=1}^\infty \ell_j, \tag{10.2}\] provided the sum is finite.

The values \(\ell_1,\ell_2,\dots\) defined in Theorem 10.1 are called the Pinsker weights. Here and in what follows, \((x)_+ = x\,\mathbf{1}(x > 0) = \max\{x,0\}\) for all \(x\in \mathbb{R}\).
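To make the theorem concrete, \(\eta\) can be computed numerically: the left-hand side of Equation 10.1 is continuous and decreasing in \(\eta\), so a bisection search suffices. The sketch below (assuming numpy; the coefficients \(a_j = j\), \(\sigma = 0.1\), \(c = 1\), the truncation length, and the helper function names are illustrative choices made here, not part of the text) solves Equation 10.1, forms the Pinsker weights, and evaluates the candidate risk \(\sigma^2\sum_j \ell_j\). The truncation is harmless because \((1-\eta a_j)_+ = 0\) once \(a_j \geq 1/\eta\).

```python
import numpy as np

def pinsker_eta(a, sigma, c, lo=1e-12, tol=1e-12):
    """Solve (1/eta) sigma^2 sum_j a_j (1 - eta a_j)_+ = c^2 for eta by bisection."""
    def lhs(eta):
        return sigma**2 / eta * np.sum(a * np.clip(1 - eta * a, 0.0, None))
    hi = 1.0 / a[a > 0].min()           # lhs(hi) = 0 <= c^2, so the root lies in (lo, hi]
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if lhs(mid) > c**2:
            lo = mid                    # lhs is decreasing, so the root is to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)

def pinsker_weights(a, sigma, c):
    eta = pinsker_eta(a, sigma, c)
    ell = np.clip(1 - eta * a, 0.0, None)        # ell_j = (1 - eta a_j)_+
    return eta, ell, sigma**2 * ell.sum()        # last entry: sigma^2 sum_j ell_j

# Illustrative example: a_j = j, sigma = 0.1, c = 1, truncated at 200 coordinates.
a = np.arange(1, 201, dtype=float)
eta, ell, linear_minimax_risk = pinsker_weights(a, sigma=0.1, c=1.0)
print(eta, np.sum(ell > 0), linear_minimax_risk)
```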

Theorem 10.1 does not appear particularly illuminating, but when we choose \(a_1,a_2,\dots\) and \(c\) such that \(\Theta(c,a_1,a_2,\dots)\) becomes the Sobolev ellipsoid \(\Theta_{\text{Sob}}(\beta,L)\), the result will show us something interesting.

Before stating the proof of Theorem 10.1, we give some insight into how one arrives at the Pinsker weights, in particular how Equation 10.1 arises. Note that for any \(\boldsymbol{\theta}\in \Theta(c,a_1,a_2,\dots)\) we have \[ R(\hat{\boldsymbol{\theta}}_\boldsymbol{\lambda},\boldsymbol{\theta}) = \sum_{j=1}^\infty(\lambda_j^2\sigma^2 + (1-\lambda_j)^2\theta_j^2). \] This is minimized when \(\lambda_j = \theta_j^2/(\sigma^2 + \theta_j^2)\) for each \(j\) (simple calculus will show this). Plugging these values in for \(\lambda_1,\lambda_2,\dots\) gives \[ \inf_\boldsymbol{\lambda}R(\hat{\boldsymbol{\theta}}_\boldsymbol{\lambda},\boldsymbol{\theta}) = \sum_{j=1}^\infty\frac{\sigma^2\theta_j^2}{\sigma^2 + \theta_j^2}. \] Now, an equation like Equation 10.1 arises when we solve the optimization problem \[ \text{ maximize }\sum_{j=1}^\infty\frac{\sigma^2\theta_j^2}{\sigma^2 + \theta_j^2} \quad \text{ subject to } \sum_{j=1}^\infty a_j^2\theta_j^2 = c^2. \] Introducing the Lagrangian function \[ \mathcal{L}(\kappa,\theta_1,\theta_2,\dots) = \sum_{j=1}^\infty\frac{\sigma^2\theta_j^2}{\sigma^2 + \theta_j^2} - \kappa \Big(\sum_{j=1}^\infty a_j^2\theta_j^2 - c^2\Big) \] and setting \[\begin{align} \frac{\partial}{\partial \theta_j}\mathcal{L}(\kappa,\theta_1,\theta_2,\dots) &= 2\frac{\theta_j\sigma^4}{(\sigma^2 + \theta_j^2)^2} - 2\kappa a_j^2\theta_j = 0, \quad j=1,2,\dots\\ \frac{\partial}{\partial \kappa}\mathcal{L}(\kappa,\theta_1,\theta_2,\dots) &= \sum_{j=1}^\infty a_j^2\theta_j^2 - c^2 = 0 \end{align}\] gives, after dividing the first set of equations by \(2\theta_j\) (for \(\theta_j \neq 0\)), solving for \(\theta_j^2\), and taking the positive part, \[ \theta_j^2 = \frac{\sigma^2}{\sqrt{\kappa}a_j}(1 - \sqrt{\kappa}a_j)_+, \quad j=1,2,\dots, \tag{10.3}\] and, substituting Equation 10.3 into the constraint, \[ \frac{\sigma^2}{\sqrt{\kappa}}\sum_{j=1}^\infty a_j (1 - \sqrt{\kappa}a_j)_+ = c^2. \tag{10.4}\] Lemma 3.3 of Tsybakov (2008) guarantees that a solution to Equation 10.4 exists when the sequence \(a_j\) is increasing with \(|\{j: a_j = 0\}| <\infty\) and \(a_j \to +\infty\). Now the optimal weights \(\lambda_j = \theta_j^2/(\sigma^2 + \theta_j^2)\) with \(\theta_j^2\) as obtained in Equation 10.3 become \(\lambda_j = (1 - \sqrt{\kappa}a_j)_+\) for \(j=1,2,\dots\). Under this choice of \(\lambda_1, \lambda_2,\dots\) we obtain \(R(\hat{\boldsymbol{\theta}}_\boldsymbol{\lambda},\boldsymbol{\theta}) = \sigma^2 \sum_{j=1}^\infty(1 - \sqrt{\kappa}a_j)_+\). This gives us a clue that we should propose \(\sigma^2\sum_{j=1}^\infty \ell_j\) as a candidate for the linear minimax risk in Theorem 10.1.
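The algebra above can be checked numerically. The sketch below (assuming numpy; the value of \(\sqrt{\kappa}\) and the coefficients \(a_j = j\) are arbitrary illustrative choices) verifies that the coordinatewise optimum \(\lambda_j = \theta_j^2/(\sigma^2+\theta_j^2)\), evaluated at the \(\theta_j^2\) of Equation 10.3, reduces to \((1-\sqrt{\kappa}a_j)_+\), and that the resulting risk is \(\sigma^2\sum_j(1-\sqrt{\kappa}a_j)_+\).

```python
import numpy as np

sigma = 0.1
a = np.arange(1, 51, dtype=float)        # illustrative coefficients a_j = j
sqrt_kappa = 0.119                       # an arbitrary illustrative value of sqrt(kappa)

# Sanity check for one coordinate: lambda = theta^2 / (sigma^2 + theta^2) minimizes
# lambda^2 sigma^2 + (1 - lambda)^2 theta^2 over lambda.
theta_sq_one = 0.04
grid = np.linspace(-1.0, 2.0, 100001)
objective = grid**2 * sigma**2 + (1 - grid)**2 * theta_sq_one
assert abs(grid[objective.argmin()] - theta_sq_one / (sigma**2 + theta_sq_one)) < 1e-3

# theta_j^2 from Equation 10.3 (zero once sqrt(kappa) a_j >= 1).
theta_sq = sigma**2 / (sqrt_kappa * a) * np.clip(1 - sqrt_kappa * a, 0.0, None)

# The optimal weights reduce to (1 - sqrt(kappa) a_j)_+ ...
lam = theta_sq / (sigma**2 + theta_sq)
assert np.allclose(lam, np.clip(1 - sqrt_kappa * a, 0.0, None))

# ... and the corresponding risk is sigma^2 sum_j (1 - sqrt(kappa) a_j)_+.
risk = np.sum(sigma**2 * theta_sq / (sigma**2 + theta_sq))
assert np.isclose(risk, sigma**2 * np.sum(np.clip(1 - sqrt_kappa * a, 0.0, None)))
print(risk)
```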

Note that Equation 10.1 in Theorem 10.1 is the same as Equation 10.4 but with \(\sqrt{\kappa}\) replaced by \(\eta\).

Rather than using the two-step strategy from Chapter 8, we can find the linear minimax risk over a general ellipsoid directly. We follow the arguments given in Section 3.2 of Tsybakov (2008).

First, Lemma 3.3 of Tsybakov (2008) guarantees that a solution to Equation 10.1 exists when the sequence \(a_j\) is increasing with \(|\{j: a_j = 0\}| <\infty\) and \(a_j \to +\infty\).

Now we begin by noting \[\begin{align} \sup_{\boldsymbol{\theta}\in \Theta(c,a_1,a_2,\dots)} \inf_{\boldsymbol{\lambda}} R(\hat{\boldsymbol{\theta}}_\boldsymbol{\lambda},\boldsymbol{\theta}) &\leq \inf_{\boldsymbol{\lambda}} \sup_{\boldsymbol{\theta}\in \Theta(c,a_1,a_2,\dots)} R(\hat{\boldsymbol{\theta}}_\boldsymbol{\lambda},\boldsymbol{\theta}) \\ & \leq \sup_{\boldsymbol{\theta}\in \Theta(c,a_1,a_2,\dots)} R(\hat{\boldsymbol{\theta}}_\boldsymbol{\ell},\boldsymbol{\theta}), \end{align}\] where the first inequality always holds from changing the order of the supremum and the infimum (see Lemma 32.1) and the second holds because \(\boldsymbol{\ell}\) is one particular choice of \(\boldsymbol{\lambda}\). From here, we see that it is sufficient to show \[ \sup_{\boldsymbol{\theta}\in \Theta(c,a_1,a_2,\dots)} R(\hat{\boldsymbol{\theta}}_\boldsymbol{\ell},\boldsymbol{\theta}) \leq \sigma^2\sum_{j=1}^\infty \ell_j \tag{10.5}\] as well as \[ \sup_{\boldsymbol{\theta}\in \Theta(c,a_1,a_2,\dots)} \inf_{\boldsymbol{\lambda}} R(\hat{\boldsymbol{\theta}}_\boldsymbol{\lambda},\boldsymbol{\theta}) \geq \sigma^2 \sum_{j=1}^\infty \ell_j, \tag{10.6}\] since together with the display above these force all three quantities, in particular \(M_{\text{lin}}(\Theta(c,a_1,a_2,\dots))\), to equal \(\sigma^2\sum_{j=1}^\infty \ell_j\). For any \(\boldsymbol{\theta}\in \Theta(c,a_1,a_2,\dots)\) we have \[\begin{align} R(\hat{\boldsymbol{\theta}}_\boldsymbol{\ell},\boldsymbol{\theta}) &= \sum_{j=1}^\infty(\ell_j^2\sigma^2 + (1-\ell_j)^2\theta_j^2) \\ &=\sigma^2 \sum_{j=1}^\infty \ell_j^2 + \sum_{j :~ a_j > 0}(1 - \ell_j)^2a_j^{-2}a_j^2\theta_j^2 \quad \text{($\ell_j = 1$ if $a_j = 0$)} \\ & \leq \sigma^2 \sum_{j=1}^\infty \ell_j^2 + \sup_{j : ~a_j > 0}\big\{(1 - \ell_j)^2a_j^{-2}\big\}\sum_{j = 1}^\infty a_j^2\theta_j^2\\ & \leq \sigma^2 \sum_{j=1}^\infty \ell_j^2 + \eta^2 c^2 \quad \text{($1-\eta a_j \leq \ell_j \implies (1-\ell_j)^2a_j^{-2} \leq \eta^2$)}\\ & = \sigma^2 \sum_{j=1}^\infty \ell_j^2 + \eta \sigma^2 \sum_{j=1}^\infty a_j \ell_j \quad \text{(by Equation 10.1)}\\ & = \sigma^2 \sum_{j=1}^\infty \ell_j(\ell_j + \eta a_j) \\ & = \sigma^2 \sum_{j :~\ell_j > 0} \ell_j( 1 - \eta a_j + \eta a_j) \\ & = \sigma^2 \sum_{j=1}^\infty \ell_j, \end{align}\] which establishes Equation 10.5.
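The bound in Equation 10.5 can be spot-checked numerically by evaluating \(R(\hat{\boldsymbol{\theta}}_\boldsymbol{\ell},\boldsymbol{\theta})\) at random points on the boundary of the ellipsoid. The sketch below (assuming numpy; the coefficients \(a_j = j\), \(\sigma\), \(c\), and the truncation are the same illustrative choices as before, with Equation 10.1 solved by the same bisection) confirms that none of the sampled risks exceed \(\sigma^2\sum_j\ell_j\).

```python
import numpy as np

rng = np.random.default_rng(2)
sigma, c = 0.1, 1.0
a = np.arange(1, 201, dtype=float)           # illustrative coefficients a_j = j

# Solve Equation 10.1 for eta by bisection, as in the earlier sketch.
lo, hi = 1e-12, 1.0 / a.min()
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if sigma**2 / mid * np.sum(a * np.clip(1 - mid * a, 0.0, None)) > c**2:
        lo = mid
    else:
        hi = mid
eta = 0.5 * (lo + hi)
ell = np.clip(1 - eta * a, 0.0, None)
bound = sigma**2 * ell.sum()                 # candidate value sigma^2 sum_j ell_j

# Random points on the boundary of the ellipsoid: sum_j a_j^2 theta_j^2 = c^2.
for _ in range(1000):
    direction = rng.standard_normal(a.size)
    theta = direction * c / np.sqrt(np.sum(a**2 * direction**2))
    risk = np.sum(ell**2 * sigma**2 + (1 - ell)**2 * theta**2)
    assert risk <= bound + 1e-10             # Equation 10.5 holds at every sampled theta
```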

Now to establish Equation 10.6, let \[ V = \Big\{(v_1,v_2,\dots) : v_j^2 = \frac{\sigma^2(1 - \eta a_j)_+}{\eta a_j} \text{ if } a_j > 0, v_j \in \mathbb{R}\text{ if } a_j = 0\Big\}. \] Note that \(V \subset \Theta(c,a_1,a_2,\dots)\): the coordinates with \(a_j > 0\) satisfy \(\sum_{j:~a_j>0} a_j^2 v_j^2 = c^2\) by Equation 10.1, only finitely many of them are nonzero since \(a_j \to +\infty\), and there are only finitely many coordinates with \(a_j = 0\). We may therefore write \[\begin{align} \sup_{\boldsymbol{\theta}\in \Theta(c,a_1,a_2,\dots)}\inf_{\boldsymbol{\lambda}}R(\hat{\boldsymbol{\theta}}_\boldsymbol{\lambda},\boldsymbol{\theta}) &\geq \sup_{\mathbf{v}\in V}\inf_{\boldsymbol{\lambda}}R(\hat{\boldsymbol{\theta}}_\boldsymbol{\lambda},\mathbf{v}) \\ &= \sup_{\mathbf{v}\in V}\Big\{ \sum_{j=1}^\infty\frac{\sigma^2 v_j^2}{\sigma^2 + v_j^2}\Big\} \\ &= \sup_{\mathbf{v}\in V}\Big\{ \sum_{j:~ a_j = 0} \frac{\sigma^2 v_j^2}{\sigma^2 + v_j^2} + \sum_{j:~ a_j > 0} \frac{\sigma^4 (1 - \eta a_j)_+}{\sigma^2\eta a_j + \sigma^2(1 - \eta a_j)_+}\Big\} \\ &= \sigma^2|\{j: a_j = 0\}| + \sigma^2 \sum_{j:~ a_j > 0} (1 - \eta a_j)_+ \\ &= \sigma^2\sum_{j=1}^\infty \ell_j, \end{align}\] where the last equality uses \(\ell_j = 1\) whenever \(a_j = 0\). This establishes Equation 10.6 and completes the proof.
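The lower-bound construction can also be checked numerically: when every \(a_j > 0\), the sequences in \(V\) are determined up to the signs of their coordinates, and their common best linear risk should equal \(\sigma^2\sum_j\ell_j\). The sketch below (same illustrative \(a_j = j\), \(\sigma\), \(c\), and bisection as before, assuming numpy) verifies that such a sequence lies on the boundary of the ellipsoid and attains the claimed value.

```python
import numpy as np

sigma, c = 0.1, 1.0
a = np.arange(1, 201, dtype=float)           # all a_j > 0 in this illustrative example

# Solve Equation 10.1 for eta by bisection, as in the earlier sketches.
lo, hi = 1e-12, 1.0 / a.min()
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if sigma**2 / mid * np.sum(a * np.clip(1 - mid * a, 0.0, None)) > c**2:
        lo = mid
    else:
        hi = mid
eta = 0.5 * (lo + hi)
ell = np.clip(1 - eta * a, 0.0, None)

# The "hardest" sequence in V and its best achievable linear risk.
v_sq = sigma**2 * ell / (eta * a)            # v_j^2 = sigma^2 (1 - eta a_j)_+ / (eta a_j)
best_linear_risk = np.sum(sigma**2 * v_sq / (sigma**2 + v_sq))

# v lies on the boundary of the ellipsoid, and its risk matches sigma^2 sum_j ell_j.
assert np.isclose(np.sum(a**2 * v_sq), c**2)
assert np.isclose(best_linear_risk, sigma**2 * ell.sum())
```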