10  Linear minimax risk over general ellipsoids

Consider the Normal means model, this time with infinite dimension, in which we observe

\[ Z_j = \theta_j + \sigma \xi_j, \quad j = 1,2,\dots, \] where \(\boldsymbol{\theta}= (\theta_1,\theta_2,\dots)\) is unknown, \(\xi_1,\xi_2,\dots,\) are independent \(\mathcal{N}(0,1)\) random variables, and \(\sigma > 0\). The goal of this section will be to state the minimax risk for estimating \(\boldsymbol{\theta}\) in the infinite-dimensional Normal means model when \(\boldsymbol{\theta}\) belongs to the Sobolev ellipsoid \(\Theta_{\text{Sob}}(\beta,L)\).
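Although the model is infinite-dimensional, it can be simulated after truncating to finitely many coordinates. The following minimal sketch (assuming numpy is available; the truncation level, the noise level \(\sigma\), and the choice \(\theta_j = j^{-2}\) are illustrative and not taken from the text) draws one realization of the sequence \(Z_1,Z_2,\dots\).

```python
import numpy as np

rng = np.random.default_rng(0)

# Truncate the infinite sequence model at n coordinates for simulation.
n, sigma = 500, 0.1                           # illustrative truncation and noise levels
j = np.arange(1, n + 1)
theta = 1.0 / j**2                            # an illustrative square-summable mean sequence
Z = theta + sigma * rng.standard_normal(n)    # Z_j = theta_j + sigma * xi_j
```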

Before coming to the Sobolev ellipsoid, we will find it more convenient to first consider the minimax risk over a general ellipsoid, \(\Theta(c,a_1,a_2,\dots)\), defined as \[ \Theta(c,a_1,a_2,\dots) = \Big\{(\theta_1,\theta_2,\dots) \in \mathbb{R}^\infty:~ \sum_{j=1}^\infty \theta^2_j < \infty \text{ and } \sum_{j=1}^\infty a_j^2 \theta_j^2 \leq c^2\Big\}. \] Moreover, rather than considering all possible estimators of \(\boldsymbol{\theta}\), we will consider only linear estimators, that is, estimators of the form \(\hat{\boldsymbol{\theta}}_{\boldsymbol{\lambda}} = (\lambda_1Z_1,\lambda_2Z_2,\dots)\) for \(\boldsymbol{\lambda} = (\lambda_1,\lambda_2,\dots)\). Let \[ M_{\text{lin}}(\Theta(c,a_1,a_2,\dots)) = \inf_{\boldsymbol{\lambda}} \sup_{\boldsymbol{\theta}\in \Theta(c,a_1,a_2,\dots)} R(\hat{\boldsymbol{\theta}}_\boldsymbol{\lambda},\boldsymbol{\theta}) \] denote the linear minimax risk over the general ellipsoid \(\Theta(c,a_1,a_2,\dots)\), where \[ R(\hat{\boldsymbol{\theta}}_\boldsymbol{\lambda},\boldsymbol{\theta}) = \sum_{j=1}^\infty \mathbb{E}(\lambda_j Z_j - \theta_j)^2 \] is the risk under squared error loss.
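The risk \(R(\hat{\boldsymbol{\theta}}_\boldsymbol{\lambda},\boldsymbol{\theta})\) of a linear estimator can be approximated by Monte Carlo under the same truncation. The sketch below (again assuming numpy; the coefficients \(a_j = j\), the radius \(c=1\), the mean sequence, and the shrinkage weights are illustrative choices, not from the text) checks membership in \(\Theta(c,a_1,a_2,\dots)\) and estimates the risk by averaging \(\sum_j(\lambda_j Z_j - \theta_j)^2\) over repeated draws.

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma, c = 500, 0.1, 1.0
j = np.arange(1, n + 1)
a = j.astype(float)                     # illustrative ellipsoid coefficients a_j = j
theta = 0.5 / j**2                      # a candidate mean sequence
lam = 1.0 / (1.0 + 0.05 * j)            # an arbitrary sequence of shrinkage weights

# Membership in Theta(c, a_1, a_2, ...): sum_j a_j^2 theta_j^2 <= c^2.
assert np.sum(a**2 * theta**2) <= c**2

# Monte Carlo approximation of R(theta_hat_lambda, theta) = sum_j E(lambda_j Z_j - theta_j)^2.
reps = 5000
Z = theta + sigma * rng.standard_normal((reps, n))
print(np.mean(np.sum((lam * Z - theta)**2, axis=1)))
```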

The following result is adapted from results in Section 3.2 of Tsybakov (2008).

Theorem 10.1 (Linear minimax risk on a general ellipsoid) Let \(a_1,a_2,\dots\) be an increasing sequence such that \(|\{j:a_j = 0\}|<\infty\) and \(a_j \to +\infty\). Then there exists a unique solution \(\eta > 0\) to \[ \eta^{-1}\sigma^2\sum_{j=1}^\infty a_j (1 - \eta a_j)_+ = c^2 \tag{10.1}\] and, setting \(\ell_j = (1 - \eta a_j)_+\) for \(j=1,2,\dots\) and \(\boldsymbol{\ell}= (\ell_1,\ell_2,\dots)\), we have \[ M_{\text{lin}}(\Theta(c,a_1,a_2,\dots)) = \sup_{\boldsymbol{\theta}\in \Theta(c,a_1,a_2,\dots)} R(\hat{\boldsymbol{\theta}}_\boldsymbol{\ell},\boldsymbol{\theta}) = \sigma^2\sum_{j=1}^\infty \ell_j, \tag{10.2}\] provided the sum is finite.

The values \(\ell_1,\ell_2,\dots\) defined in Theorem 10.1 are called the Pinsker weights. Here and in what follows, \((x)_+ = x\,\mathbf{1}(x > 0) = \max\{x,0\}\) for all \(x\in \mathbb{R}\).
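To make the theorem concrete, \(\eta\) can be computed numerically: the left-hand side of Equation 10.1 is continuous and decreasing in \(\eta\), so a bisection search suffices. The sketch below (assuming numpy; the coefficients \(a_j = j\), \(\sigma = 0.1\), \(c = 1\), the truncation length, and the helper function names are illustrative choices made here, not part of the text) solves Equation 10.1, forms the Pinsker weights, and evaluates the candidate risk \(\sigma^2\sum_j \ell_j\). The truncation is harmless because \((1-\eta a_j)_+ = 0\) once \(a_j \geq 1/\eta\).

```python
import numpy as np

def pinsker_eta(a, sigma, c, lo=1e-12, tol=1e-12):
    """Solve (1/eta) sigma^2 sum_j a_j (1 - eta a_j)_+ = c^2 for eta by bisection."""
    def lhs(eta):
        return sigma**2 / eta * np.sum(a * np.clip(1 - eta * a, 0.0, None))
    hi = 1.0 / a[a > 0].min()           # lhs(hi) = 0 <= c^2, so the root lies in (lo, hi]
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if lhs(mid) > c**2:
            lo = mid                    # lhs is decreasing, so the root is to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)

def pinsker_weights(a, sigma, c):
    eta = pinsker_eta(a, sigma, c)
    ell = np.clip(1 - eta * a, 0.0, None)        # ell_j = (1 - eta a_j)_+
    return eta, ell, sigma**2 * ell.sum()        # last entry: sigma^2 sum_j ell_j

# Illustrative example: a_j = j, sigma = 0.1, c = 1, truncated at 200 coordinates.
a = np.arange(1, 201, dtype=float)
eta, ell, linear_minimax_risk = pinsker_weights(a, sigma=0.1, c=1.0)
print(eta, np.sum(ell > 0), linear_minimax_risk)
```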

Theorem 10.1 does not appear particularly illuminating, but when we choose \(a_1,a_2,\dots\) and \(c\) such that \(\Theta(c,a_1,a_2,\dots)\) becomes the Sobolev ellipsoid \(\Theta_{\text{Sob}}(\beta,L)\), the result will show us something interesting.

Before stating the proof of Theorem 10.1, we give some insight into how one arrives at the Pinsker weights, in particular how Equation 10.1 arises. Note that for any \(\boldsymbol{\theta}\in \Theta(c,a_1,a_2,\dots)\) we have \[ R(\hat{\boldsymbol{\theta}}_\boldsymbol{\lambda},\boldsymbol{\theta}) = \sum_{j=1}^\infty(\lambda_j^2\sigma^2 + (1-\lambda_j)^2\theta_j^2). \] This is minimized when \(\lambda_j = \theta_j^2/(\sigma^2 + \theta_j^2)\) for each \(j\) (simple calculus will show this). Plugging these values in for \(\lambda_1,\lambda_2,\dots\) gives \[ \inf_\boldsymbol{\lambda}R(\hat{\boldsymbol{\theta}}_\boldsymbol{\lambda},\boldsymbol{\theta}) = \sum_{j=1}^\infty\frac{\sigma^2\theta_j^2}{\sigma^2 + \theta_j^2}. \] Now, an equation like Equation 10.1 arises when we solve the optimization problem \[ \text{ maximize }\sum_{j=1}^\infty\frac{\sigma^2\theta_j^2}{\sigma^2 + \theta_j^2} \quad \text{ subject to } \sum_{j=1}^\infty a_j^2\theta_j^2 = c^2. \] Introducing the Lagrangian function \[ \mathcal{L}(\kappa,\theta_1,\theta_2,\dots) = \sum_{j=1}^\infty\frac{\sigma^2\theta_j^2}{\sigma^2 + \theta_j^2} - \kappa \Big(\sum_{j=1}^\infty a_j^2\theta_j^2 - c^2\Big) \] and setting \[\begin{align} \frac{\partial}{\partial \theta_j}\mathcal{L}(\kappa,\theta_1,\theta_2,\dots) &= 2\frac{\theta_j\sigma^4}{(\sigma^2 + \theta_j^2)^2} - 2\kappa a_j^2\theta_j = 0, \quad j=1,2,\dots\\ \frac{\partial}{\partial \kappa}\mathcal{L}(\kappa,\theta_1,\theta_2,\dots) &= \sum_{j=1}^\infty a_j^2\theta_j^2 - c^2 = 0 \end{align}\] gives, after dividing the first set of equations by \(2\theta_j\) (for \(\theta_j \neq 0\)), solving for \(\theta_j^2\), and taking the positive part, \[ \theta_j^2 = \frac{\sigma^2}{\sqrt{\kappa}a_j}(1 - \sqrt{\kappa}a_j)_+, \quad j=1,2,\dots, \tag{10.3}\] and, substituting Equation 10.3 into the constraint, \[ \frac{\sigma^2}{\sqrt{\kappa}}\sum_{j=1}^\infty a_j (1 - \sqrt{\kappa}a_j)_+ = c^2. \tag{10.4}\] Lemma 3.3 of Tsybakov (2008) guarantees that a solution to Equation 10.4 exists when the sequence \(a_j\) is increasing with \(|\{j: a_j = 0\}| <\infty\) and \(a_j \to +\infty\). Now the optimal weights \(\lambda_j = \theta_j^2/(\sigma^2 + \theta_j^2)\) with \(\theta_j^2\) as obtained in Equation 10.3 become \(\lambda_j = (1 - \sqrt{\kappa}a_j)_+\) for \(j=1,2,\dots\). Under this choice of \(\lambda_1, \lambda_2,\dots\) we obtain \(R(\hat{\boldsymbol{\theta}}_\boldsymbol{\lambda},\boldsymbol{\theta}) = \sigma^2 \sum_{j=1}^\infty(1 - \sqrt{\kappa}a_j)_+\). This gives us a clue that we should propose \(\sigma^2\sum_{j=1}^\infty \ell_j\) as a candidate for the linear minimax risk in Theorem 10.1.
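The algebra above can be checked numerically. The sketch below (assuming numpy; the value of \(\sqrt{\kappa}\) and the coefficients \(a_j = j\) are arbitrary illustrative choices) verifies that the coordinatewise optimum \(\lambda_j = \theta_j^2/(\sigma^2+\theta_j^2)\), evaluated at the \(\theta_j^2\) of Equation 10.3, reduces to \((1-\sqrt{\kappa}a_j)_+\), and that the resulting risk is \(\sigma^2\sum_j(1-\sqrt{\kappa}a_j)_+\).

```python
import numpy as np

sigma = 0.1
a = np.arange(1, 51, dtype=float)        # illustrative coefficients a_j = j
sqrt_kappa = 0.119                       # an arbitrary illustrative value of sqrt(kappa)

# Sanity check for one coordinate: lambda = theta^2 / (sigma^2 + theta^2) minimizes
# lambda^2 sigma^2 + (1 - lambda)^2 theta^2 over lambda.
theta_sq_one = 0.04
grid = np.linspace(-1.0, 2.0, 100001)
objective = grid**2 * sigma**2 + (1 - grid)**2 * theta_sq_one
assert abs(grid[objective.argmin()] - theta_sq_one / (sigma**2 + theta_sq_one)) < 1e-3

# theta_j^2 from Equation 10.3 (zero once sqrt(kappa) a_j >= 1).
theta_sq = sigma**2 / (sqrt_kappa * a) * np.clip(1 - sqrt_kappa * a, 0.0, None)

# The optimal weights reduce to (1 - sqrt(kappa) a_j)_+ ...
lam = theta_sq / (sigma**2 + theta_sq)
assert np.allclose(lam, np.clip(1 - sqrt_kappa * a, 0.0, None))

# ... and the corresponding risk is sigma^2 sum_j (1 - sqrt(kappa) a_j)_+.
risk = np.sum(sigma**2 * theta_sq / (sigma**2 + theta_sq))
assert np.isclose(risk, sigma**2 * np.sum(np.clip(1 - sqrt_kappa * a, 0.0, None)))
print(risk)
```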

Note that Equation 10.1 in Theorem 10.1 is the same as Equation 10.4 but with \(\sqrt{\kappa}\) replaced by \(\eta\).

Rather than using the two-step strategy from Chapter 8, we can find the linear minimax risk over a general ellipsoid directly. We follow the arguments given in Section 3.2 of Tsybakov (2008).

First, Lemma 3.3 of Tsybakov (2008) guarantees that a solution to Equation 10.1 exists when the sequence \(a_j\) is increasing with \(|\{j: a_j = 0\}| <\infty\) and \(a_j \to +\infty\).

Now we begin by noting \[\begin{align} \sup_{\boldsymbol{\theta}\in \Theta(c,a_1,a_2,\dots)} \inf_{\boldsymbol{\lambda}} R(\hat{\boldsymbol{\theta}}_\boldsymbol{\lambda},\boldsymbol{\theta}) &\leq \inf_{\boldsymbol{\lambda}} \sup_{\boldsymbol{\theta}\in \Theta(c,a_1,a_2,\dots)} R(\hat{\boldsymbol{\theta}}_\boldsymbol{\lambda},\boldsymbol{\theta}) \\ & \leq \sup_{\boldsymbol{\theta}\in \Theta(c,a_1,a_2,\dots)} R(\hat{\boldsymbol{\theta}}_\boldsymbol{\ell},\boldsymbol{\theta}), \end{align}\] where the first inequality always holds from changing the order of the supremum and the infimum (see Lemma 32.1) and the second holds because \(\boldsymbol{\ell}\) is one particular choice of \(\boldsymbol{\lambda}\). From here, we see that it is sufficient to show \[ \sup_{\boldsymbol{\theta}\in \Theta(c,a_1,a_2,\dots)} R(\hat{\boldsymbol{\theta}}_\boldsymbol{\ell},\boldsymbol{\theta}) \leq \sigma^2\sum_{j=1}^\infty \ell_j \tag{10.5}\] as well as \[ \sup_{\boldsymbol{\theta}\in \Theta(c,a_1,a_2,\dots)} \inf_{\boldsymbol{\lambda}} R(\hat{\boldsymbol{\theta}}_\boldsymbol{\lambda},\boldsymbol{\theta}) \geq \sigma^2 \sum_{j=1}^\infty \ell_j, \tag{10.6}\] since together with the display above these force all three quantities, in particular \(M_{\text{lin}}(\Theta(c,a_1,a_2,\dots))\), to equal \(\sigma^2\sum_{j=1}^\infty \ell_j\). For any \(\boldsymbol{\theta}\in \Theta(c,a_1,a_2,\dots)\) we have \[\begin{align} R(\hat{\boldsymbol{\theta}}_\boldsymbol{\ell},\boldsymbol{\theta}) &= \sum_{j=1}^\infty(\ell_j^2\sigma^2 + (1-\ell_j)^2\theta_j^2) \\ &=\sigma^2 \sum_{j=1}^\infty \ell_j^2 + \sum_{j :~ a_j > 0}(1 - \ell_j)^2a_j^{-2}a_j^2\theta_j^2 \quad \text{($\ell_j = 1$ if $a_j = 0$)} \\ & \leq \sigma^2 \sum_{j=1}^\infty \ell_j^2 + \sup_{j : ~a_j > 0}\big\{(1 - \ell_j)^2a_j^{-2}\big\}\sum_{j = 1}^\infty a_j^2\theta_j^2\\ & \leq \sigma^2 \sum_{j=1}^\infty \ell_j^2 + \eta^2 c^2 \quad \text{($1-\eta a_j \leq \ell_j \implies (1-\ell_j)^2a_j^{-2} \leq \eta^2$)}\\ & = \sigma^2 \sum_{j=1}^\infty \ell_j^2 + \eta \sigma^2 \sum_{j=1}^\infty a_j \ell_j \quad \text{(by Equation 10.1)}\\ & = \sigma^2 \sum_{j=1}^\infty \ell_j(\ell_j + \eta a_j) \\ & = \sigma^2 \sum_{j :~\ell_j > 0} \ell_j( 1 - \eta a_j + \eta a_j) \\ & = \sigma^2 \sum_{j=1}^\infty \ell_j, \end{align}\] which establishes Equation 10.5.
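The bound in Equation 10.5 can be spot-checked numerically by evaluating \(R(\hat{\boldsymbol{\theta}}_\boldsymbol{\ell},\boldsymbol{\theta})\) at random points on the boundary of the ellipsoid. The sketch below (assuming numpy; the coefficients \(a_j = j\), \(\sigma\), \(c\), and the truncation are the same illustrative choices as before, with Equation 10.1 solved by the same bisection) confirms that none of the sampled risks exceed \(\sigma^2\sum_j\ell_j\).

```python
import numpy as np

rng = np.random.default_rng(2)
sigma, c = 0.1, 1.0
a = np.arange(1, 201, dtype=float)           # illustrative coefficients a_j = j

# Solve Equation 10.1 for eta by bisection, as in the earlier sketch.
lo, hi = 1e-12, 1.0 / a.min()
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if sigma**2 / mid * np.sum(a * np.clip(1 - mid * a, 0.0, None)) > c**2:
        lo = mid
    else:
        hi = mid
eta = 0.5 * (lo + hi)
ell = np.clip(1 - eta * a, 0.0, None)
bound = sigma**2 * ell.sum()                 # candidate value sigma^2 sum_j ell_j

# Random points on the boundary of the ellipsoid: sum_j a_j^2 theta_j^2 = c^2.
for _ in range(1000):
    direction = rng.standard_normal(a.size)
    theta = direction * c / np.sqrt(np.sum(a**2 * direction**2))
    risk = np.sum(ell**2 * sigma**2 + (1 - ell)**2 * theta**2)
    assert risk <= bound + 1e-10             # Equation 10.5 holds at every sampled theta
```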

Now to establish Equation 10.6, let \[ V = \Big\{(v_1,v_2,\dots) : v_j^2 = \frac{\sigma^2(1 - \eta a_j)_+}{\eta a_j} \text{ if } a_j > 0, v_j \in \mathbb{R}\text{ if } a_j = 0\Big\}. \] Note that \(V \subset \Theta(c,a_1,a_2,\dots)\): the coordinates with \(a_j > 0\) satisfy \(\sum_{j:~a_j>0} a_j^2 v_j^2 = c^2\) by Equation 10.1, only finitely many of them are nonzero since \(a_j \to +\infty\), and there are only finitely many coordinates with \(a_j = 0\). We may therefore write \[\begin{align} \sup_{\boldsymbol{\theta}\in \Theta(c,a_1,a_2,\dots)}\inf_{\boldsymbol{\lambda}}R(\hat{\boldsymbol{\theta}}_\boldsymbol{\lambda},\boldsymbol{\theta}) &\geq \sup_{\mathbf{v}\in V}\inf_{\boldsymbol{\lambda}}R(\hat{\boldsymbol{\theta}}_\boldsymbol{\lambda},\mathbf{v}) \\ &= \sup_{\mathbf{v}\in V}\Big\{ \sum_{j=1}^\infty\frac{\sigma^2 v_j^2}{\sigma^2 + v_j^2}\Big\} \\ &= \sup_{\mathbf{v}\in V}\Big\{ \sum_{j:~ a_j = 0} \frac{\sigma^2 v_j^2}{\sigma^2 + v_j^2} + \sum_{j:~ a_j > 0} \frac{\sigma^4 (1 - \eta a_j)_+}{\sigma^2\eta a_j + \sigma^2(1 - \eta a_j)_+}\Big\} \\ &= \sigma^2|\{j: a_j = 0\}| + \sigma^2 \sum_{j:~ a_j > 0} (1 - \eta a_j)_+ \\ &= \sigma^2\sum_{j=1}^\infty \ell_j, \end{align}\] where the last equality uses \(\ell_j = 1\) whenever \(a_j = 0\). This establishes Equation 10.6 and completes the proof.
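The lower-bound construction can also be checked numerically: when every \(a_j > 0\), the sequences in \(V\) are determined up to the signs of their coordinates, and their common best linear risk should equal \(\sigma^2\sum_j\ell_j\). The sketch below (same illustrative \(a_j = j\), \(\sigma\), \(c\), and bisection as before, assuming numpy) verifies that such a sequence lies on the boundary of the ellipsoid and attains the claimed value.

```python
import numpy as np

sigma, c = 0.1, 1.0
a = np.arange(1, 201, dtype=float)           # all a_j > 0 in this illustrative example

# Solve Equation 10.1 for eta by bisection, as in the earlier sketches.
lo, hi = 1e-12, 1.0 / a.min()
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if sigma**2 / mid * np.sum(a * np.clip(1 - mid * a, 0.0, None)) > c**2:
        lo = mid
    else:
        hi = mid
eta = 0.5 * (lo + hi)
ell = np.clip(1 - eta * a, 0.0, None)

# The "hardest" sequence in V and its best achievable linear risk.
v_sq = sigma**2 * ell / (eta * a)            # v_j^2 = sigma^2 (1 - eta a_j)_+ / (eta a_j)
best_linear_risk = np.sum(sigma**2 * v_sq / (sigma**2 + v_sq))

# v lies on the boundary of the ellipsoid, and its risk matches sigma^2 sum_j ell_j.
assert np.isclose(np.sum(a**2 * v_sq), c**2)
assert np.isclose(best_linear_risk, sigma**2 * ell.sum())
```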