
Nesterov's accelerated gradient method for nonlinear ill-posed problems with a locally convex residual functional

Simon Hubmer and Ronny Ramlau

Published 12 July 2018 © 2018 IOP Publishing Ltd
Citation: Simon Hubmer and Ronny Ramlau 2018 Inverse Problems 34 095003. DOI: 10.1088/1361-6420/aacebe


Abstract

In this paper, we consider Nesterov's accelerated gradient method for solving nonlinear inverse and ill-posed problems. Known to be a fast gradient-based iterative method for solving well-posed convex optimization problems, this method also leads to promising results for ill-posed problems. Here, we provide a convergence analysis of this method for ill-posed problems based on the assumption of a locally convex residual functional. Furthermore, we demonstrate the usefulness of the method on a number of numerical examples based on a nonlinear diagonal operator and on an inverse problem in auto-convolution.


Original content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

1. Introduction

In this paper, we consider nonlinear inverse problems of the form

Equation (1.1):  $F(x) = y$

where $F: \mathcal{D}(F) \subset \mathcal{X} \to \mathcal{Y}$ is a continuously Fréchet-differentiable, nonlinear operator between real Hilbert spaces $\mathcal{X}$ and $\mathcal{Y}$. Throughout this paper we assume that (1.1) has a solution $x_*$, which need not be unique. Furthermore, we assume that instead of $y$, we are only given noisy data $y^{\delta}$ satisfying

Equation (1.2):  $\|y - y^{\delta}\| \leqslant \delta$

Since we are interested in ill-posed problems, we need to use regularization methods in order to obtain stable approximations of solutions of (1.1). The two most prominent examples of such methods are Tikhonov regularization and Landweber iteration.

In Tikhonov regularization, one attempts to approximate an $x_0$-minimum-norm solution $x^\dagger$ of (1.1), i.e. a solution of $F(x)=y$ with minimal distance to a given initial guess $x_0$, by minimizing the functional

Equation (1.3)

where $\alpha$ is a suitably chosen regularization parameter. Under very mild assumptions on $F$, it can be shown that the minimizers of $\mathcal{T}_\alpha^\delta$, usually denoted by $x_\alpha^\delta$, converge subsequentially to a minimum-norm solution $x^\dagger$ as $\delta \to 0$, given that $\alpha$ and the noise level $\delta$ are coupled in an appropriate way [9]. While for linear operators $F$ the minimization of $\mathcal{T}_\alpha^\delta$ is straightforward, in the case of nonlinear operators $F$ the computation of $x_\alpha^\delta$ requires the global minimization of the then also nonlinear functional $\mathcal{T}_\alpha^\delta$, which is rather difficult and usually done using various iterative optimization algorithms.
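
For orientation, since the display (1.3) is not reproduced above: the Tikhonov functional presumably has the standard quadratic form shown below (the precise weighting used in the paper may differ), and its gradient is what the iterative optimization algorithms just mentioned work with:

$ \mathcal{T}_\alpha^\delta(x) = \bigl\|F(x)-y^\delta\bigr\|^2 + \alpha\,\bigl\|x-x_0\bigr\|^2, \qquad \nabla \mathcal{T}_\alpha^\delta(x) = 2\,F'(x)^*\bigl(F(x)-y^\delta\bigr) + 2\alpha\,(x-x_0). $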

This motivates the direct application of iterative algorithms for solving (1.1), the most popular of which being Landweber iteration, given by

Equation (1.4)

where ω is a scaling parameter and x0 is again a given initial guess. Seen in the context of classical optimization algorithms, Landweber iteration is nothing else than the gradient descent method applied to the functional

Equation (1.5)

and therefore, in order to arrive at a convergent regularization method, one has to use a suitable stopping rule. In [9] it was shown that if one uses the discrepancy principle, i.e. stops the iteration after k* steps, where k* is the smallest integer such that

Equation (1.6)

with a suitable constant $\tau > 1$ , then Landweber iteration gives rise to a convergent regularization method, as long as some additional assumptions, most notably the (strong) tangential cone condition

Equation (1.7)

where $\mathcal{B}_{2\rho}(x_0)$ denotes the closed ball of radius $2\rho$ around $x_0$, is satisfied. Since condition (1.7) poses strong restrictions on the nonlinearity of $F$ which are not always satisfied, attempts have been made to use weaker conditions instead [32]. For example, assuming only the weak tangential cone condition

Equation (1.8)

with $x_*$ denoting a solution of (1.1), one can show weak convergence of Landweber iteration [32]. Similarly, if the residual functional $\Phi^0(x)$ defined by (1.5) is (locally) convex, weak subsequential convergence of the iterates of Landweber iteration to a stationary point of $\Phi^0(x)$ can be proven. Even though both conditions lead to convergence in the weak topology, apart from some results presented in [32], the connections between the local convexity of the residual functional and the (weak) tangential cone condition remain largely unexplored. In his recent paper [24], Kindermann showed that both the local convexity of the residual functional and the weak tangential cone condition imply another condition, which he termed $NC(0, \beta>0)$, and which is sufficient to guarantee weak subsequential convergence of the iterates.
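
To fix ideas, here is a minimal Python/NumPy sketch of Landweber iteration (1.4) stopped by the discrepancy principle (1.6). The callables F and F_adj_deriv (returning $F'(x)^* w$) stand in for a concrete discretized operator and are assumptions of this sketch, not part of the paper:

import numpy as np

def landweber(F, F_adj_deriv, y_delta, x0, omega, delta, tau=2.0, max_iter=10000):
    # F(x): forward operator on a NumPy array; F_adj_deriv(x, w): F'(x)^* w.
    x = x0.copy()
    for k in range(max_iter):
        residual = F(x) - y_delta
        if np.linalg.norm(residual) <= tau * delta:   # discrepancy principle (1.6), tau > 1
            return x, k
        x = x - omega * F_adj_deriv(x, residual)      # Landweber / gradient step (1.4)
    return x, max_iter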

As is well known, Landweber iteration is quite slow [23]. Hence, acceleration strategies have to be used in order to speed it up and make it applicable in practice. Acceleration methods and their analysis for linear problems can be found, for example, in [9] and [13]. Unfortunately, since their convergence proofs are mainly based on spectral theory, their analysis cannot be generalized to nonlinear problems immediately. However, there are some acceleration strategies for Landweber iteration for nonlinear ill-posed problems, for example [26, 30].

As an alternative to (accelerated) Landweber-type methods, one could think of using second order iterative methods for solving (1.1), such as the Levenberg–Marquardt method [14, 20]

Equation (1.9)

or the iteratively regularized Gauss–Newton method [6, 22]

Equation (1.10)

The advantage of those methods [23] is that they require far fewer iterations to meet their respective stopping criteria than Landweber iteration or the steepest descent method. However, each update step of those iterations might take considerably longer than one step of Landweber iteration, due to the fact that in both cases a linear system involving the operator

has to be solved. In practical applications, this usually means that a huge linear system of equations has to be solved, which often proves to be costly, if not infeasible. Hence, accelerated Landweber-type methods avoiding this drawback are desirable in practice.

In case that the residual functional $\Phi^\delta(x)$ is locally convex, one could think of using methods from convex optimization to minimize $\Phi^\delta(x)$, instead of using the gradient method as in Landweber iteration. One of those methods, which works remarkably well for nonlinear, convex and well-posed optimization problems of the form

Equation (1.11)

was first introduced by Nesterov in [25] and is given by

Equation (1.12)

where again $\omega$ is a given scaling parameter and $\alpha \geqslant 3$ (with $\alpha = 3$ being common practice). This so-called Nesterov acceleration scheme is of particular interest, since not only is it extremely easy to implement, but Nesterov himself was also able to prove that it generates a sequence of iterates $x_k$ for which there holds

Equation (1.13)

where $x_*$ is any solution of (1.11). This is a significant improvement over the classical rate $\mathcal{O}(k^{-1})$. The even further improved rate $o(k^{-2})$ for $\alpha > 3$ was recently proven in [2].
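
As an illustration, a minimal Python/NumPy sketch of the scheme (1.12) for a smooth convex functional with gradient grad_Phi is given below. The momentum factor $(k-1)/(k+\alpha-1)$ is the standard choice from the literature; since the display (1.12) is not reproduced above, this exact indexing should be read as an assumption of the sketch:

import numpy as np

def nesterov(grad_Phi, x0, omega, alpha=3.0, n_iter=500):
    # grad_Phi(x): gradient of the convex functional Phi at x; omega < 1/L.
    x_prev, x = x0.copy(), x0.copy()
    for k in range(1, n_iter + 1):
        z = x + (k - 1.0) / (k + alpha - 1.0) * (x - x_prev)   # extrapolation step
        x_prev, x = x, z - omega * grad_Phi(z)                 # gradient step at z
    return x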

Furthermore, Nesterov's acceleration scheme can also be used to solve compound optimization problems of the form

Equation (1.14)

where both $\Phi(x)$ and $\Psi(x)$ are convex functionals, and is in this case given by

Equation (1.15)

where the proximal operator ${\rm prox}_{\omega \Psi}(\,\cdot\,)$ is defined by

Equation (1.16)

If, in addition to being convex, $\Psi$ is proper and lower semicontinuous and $\Phi$ is continuously Fréchet differentiable with a Lipschitz continuous gradient, then it was again shown in [2] that the sequence defined by (1.15) satisfies

Equation (1.17)

or even $o(k^{-2})$ if $\alpha > 3$, which is again much faster than ordinary first-order methods for minimizing (1.14). This accelerating property was exploited in the highly successful FISTA algorithm [4], designed for the fast solution of linear ill-posed problems with sparsity constraints. Since for linear operators the residual functional $\Phi^\delta$ is globally convex, minimizing the resulting Tikhonov functional (1.3) exactly fits into the category of minimization problems considered in (1.15).
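
For illustration, the following minimal sketch implements the proximal variant (1.15) for the particular choice $\Psi(x) = \beta \|x\|_1$, whose proximal operator is soft-thresholding; this concrete choice of $\Psi$ (the classical FISTA setting mentioned above) and the momentum indexing are assumptions of the sketch:

import numpy as np

def soft_threshold(v, t):
    # proximal operator of t*||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def accelerated_proximal_gradient(grad_Phi, x0, omega, beta, alpha=3.0, n_iter=500):
    # minimizes Phi(x) + beta*||x||_1 in the spirit of (1.15)
    x_prev, x = x0.copy(), x0.copy()
    for k in range(1, n_iter + 1):
        z = x + (k - 1.0) / (k + alpha - 1.0) * (x - x_prev)
        x_prev, x = x, soft_threshold(z - omega * grad_Phi(z), omega * beta)
    return x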

Motivated by the above considerations, one could think of applying Nesterov's acceleration scheme (1.12) to the residual functional $ \newcommand{\Phid}{\Phi^\delta} \Phid$ , which leads to the algorithm

Equation (1.18)

In case that the operator F is linear, Neubauer showed in [28] that, combined with a suitable stopping rule and under a source condition, (1.18) gives rise to a convergent regularization method and that convergence rates can be obtained. Furthermore, the authors of [17] showed that certain generalizations of Nesterov's acceleration scheme, termed two-point gradient (TPG) methods and given by

Equation (1.19)

give rise to convergent regularization methods, as long as the tangential cone condition (1.7) is satisfied and the stepsizes $\alpha_k^\delta$ and the combination parameters $\lambda^\delta_k$ are coupled in a suitable way. However, the convergence analysis of the methods (1.19) does not cover the choice

Equation (1.20)

i.e. the choice originally proposed by Nesterov and the one which shows by far the best results numerically [17, 18, 21]. The main reason for this is that the technique employed there relies on the monotonicity of the iteration, i.e. the iterate $x_{k+1}^\delta$ always has to be a better approximation of the solution $x_*$ than $x_k^\delta$, which is not necessarily satisfied for the choice (1.20).

The key ingredient for proving the fast rates (1.13) and (1.17) is the convexity of the residual functional $\Phi(x)$. Since, except for linear operators, we cannot hope that this holds globally, we assume that $\Phi^0(x)$, i.e. the functional $\Phi^\delta(x)$ defined by (1.5) with exact data $y = y^{\delta}$, corresponding to $\delta = 0$, is convex in a neighbourhood of the initial guess. This neighbourhood has to be sufficiently large to encompass the sought solution $x_*$, or equivalently, the initial guess $x_0$ has to be sufficiently close to the solution $x_*$. Assuming that $F(x) = y$ has a solution $x_*$ in $\mathcal{B}_\rho(x_0)$, where now and in the following $\mathcal{B}_\rho(x_0)$ denotes the closed ball with radius $\rho$ around $x_0$, the key assumption is that $\Phi^0$ is convex in $\mathcal{B}_{6\rho}(x_0)$. As mentioned before, Nesterov's acceleration scheme yields a non-monotone sequence of iterates, which might possibly leave the ball $\mathcal{B}_{6\rho}(x_0)$. However, by assumption the sought solution $x_*$ lies in the ball $\mathcal{B}_\rho(x_0)$. Hence, defining the functional

Equation (1.21)

we can use (1.15) instead of (1.12) (which would lead to algorithm (1.18)), noting that the fast rate (1.17) can still be expected for $\delta=0$. This leads to the algorithm

Equation (1.22)

which we consider throughout this paper.
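
For concreteness, a minimal Python/NumPy sketch of method (1.22) is given below: a Nesterov-type extrapolation step, a gradient step on $\Phi^\delta$, and the proximal map of $\Psi$ from (1.21), which acts as the projection onto the closed ball $\mathcal{B}_{2\rho}(x_0)$. The callables F and F_adj_deriv, the momentum indexing and the simple discrepancy-type stop are assumptions of this sketch; the stopping rules used in the convergence analysis are introduced in sections 2 and 3 and differ in detail:

import numpy as np

def project_ball(x, x0, radius):
    # metric projection onto the closed ball of given radius around x0 (prox of Psi in (1.21))
    d = x - x0
    n = np.linalg.norm(d)
    return x.copy() if n <= radius else x0 + (radius / n) * d

def nesterov_regularization(F, F_adj_deriv, y_delta, x0, omega, rho, delta,
                            alpha=3.1, tau=2.0, max_iter=10000):
    # F(x): forward operator; F_adj_deriv(x, w): F'(x)^* w; requires alpha > 3, omega < 1/L.
    x_prev, x = x0.copy(), x0.copy()
    for k in range(1, max_iter + 1):
        if np.linalg.norm(F(x) - y_delta) <= tau * delta:       # discrepancy-type stop
            return x, k - 1
        z = x + (k - 1.0) / (k + alpha - 1.0) * (x - x_prev)    # extrapolation
        grad = F_adj_deriv(z, F(z) - y_delta)                   # gradient of Phi^delta at z
        x_prev, x = x, project_ball(z - omega * grad, x0, 2.0 * rho)
    return x, max_iter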

2. Convergence analysis I

In this section we provide a convergence analysis of Nesterov's accelerated gradient method (1.22). Concerning notation, whenever we consider the noise-free case $ \newcommand{\yd}{y^{\delta}} y = \yd$ corresponding to $\delta = 0$ , we replace δ by 0 in all variables depending on δ, e.g. we write $ \newcommand{\Phiz}{\Phi^0} \Phiz$ instead of $ \newcommand{\Phid}{\Phi^\delta} \Phid$ . For carrying out the analysis, we have to make a set of assumptions, already indicated in the introduction.

Assumption 2.1. Let $\rho$ be a positive number such that $\mathcal{B}_{6\rho}(x_0) \subset \mathcal{D}(F)$.

  • 1.  
    The operator $F : \mathcal{D}(F) \subset \mathcal{X} \to \mathcal{Y}$ is continuously Fréchet differentiable between the real Hilbert spaces $\mathcal{X}$ and $\mathcal{Y}$ with inner products $\langle\,\cdot\,, \cdot\,\rangle$ and norms $\|\cdot\|$. Furthermore, let $F$ be weakly sequentially closed on $\mathcal{B}_{2\rho}(x_0)$.
  • 2.  
    The equation $F(x) = y$ has a solution $x_* \in \mathcal{B}_\rho(x_0)$.
  • 3.  
    The data $y^{\delta}$ satisfies $\|y-y^{\delta}\| \leqslant \delta$.
  • 4.  
    The functional $\Phi^0$ defined by (1.5) with $\delta = 0$ is convex and has a Lipschitz continuous gradient $\nabla \Phi^0$ with Lipschitz constant $L$ on $\mathcal{B}_{6\rho}(x_0)$, i.e.
    Equation (2.1)
    Equation (2.2)
  • 5.  
    For $\alpha$ in (1.22) there holds $\alpha > 3$ and the scaling parameter $\omega$ satisfies $0 < \omega < \frac{1}{L}$.

Note that since $ \newcommand{\m}{{\rm m}} \newcommand{\Btr}{\mathcal{B}_{2\rho}} \Btr(x_0)$ is weakly closed and given the continuity of F, a sufficient condition for the weak sequential closedness assumption to hold is that F is compact.

We now turn to the convergence analysis of Nesterov's accelerated gradient method (1.22). Throughout this analysis, if not explicitly stated otherwise, assumption 2.1 is in force. Note first that from F being continuously Fréchet differentiable, we can derive that there exists an $ \newcommand{\ob}{\bar{\omega}} \ob$ such that

Equation (2.3)

Next, note that since $\mathcal{B}_{2\rho}(x)$ denotes a closed ball around $x$, the functional $\Psi$, in addition to being proper and convex, is also lower semicontinuous, an assumption required in the proofs in [2], which we need in various places of this paper. Furthermore, it immediately follows from the definition (1.16) of the proximal operator ${\rm prox}_{\omega \Psi}(\,\cdot\,)$ that

Equation (2.4)

since $\Psi$ defined by (1.21) is equal to $\infty$ outside $\mathcal{B}_{2\rho}(x_0)$. Hence, since $\mathcal{B}_{2\rho}(x_0)$ is obviously a convex set, ${\rm prox}_{\omega \Psi}(\,\cdot\,)$ is nothing else than the metric projection onto $\mathcal{B}_{2\rho}(x_0)$, and is therefore Lipschitz continuous with Lipschitz constant smaller than or equal to 1. Consequently, given an estimate of $\rho$, the implementation of ${\rm prox}_{\omega \Psi}(\,\cdot\,)$ is exceedingly simple in this setting, and therefore one iteration step of (1.22) requires roughly the same amount of computational effort as one step of (1.4).
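
Concretely, the metric projection onto the closed ball admits the standard closed form (not displayed in the text), with the convention ${\rm prox}_{\omega \Psi}(x_0) = x_0$:

$ {\rm prox}_{\omega \Psi}(x) = x_0 + \min\Bigl\{1, \tfrac{2\rho}{\|x-x_0\|}\Bigr\}\,(x-x_0). $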

Finally, note that due to the convexity of $ \newcommand{\Phiz}{\Phi^0} \Phiz$ , the set S defined by

Equation (2.5)

is a convex subset of $ \newcommand{\m}{{\rm m}} \newcommand{\Btr}{\mathcal{B}_{2\rho}} \Btr(x_0)$ and hence, there exists a unique x0-minimum-norm solution $ \newcommand{\xD}{x^\dagger} \xD$ , which is defined by

Equation (2.6)

i.e. the orthogonal projection of $x_0$ onto the set $\mathcal{S}$.

The following convergence analysis is largely based on the ideas of the paper [2] by Attouch and Peypouquet, which we reference frequently throughout this analysis. Following their arguments, we start with the following

Definition 2.1. For $ \newcommand{\Phid}{\Phi^\delta} \Phid$ and Ψ defined by (1.5) and (1.21), we define

Equation (2.7)

The energy functional $ \newcommand{\E}{\mathcal{E}} \newcommand{\m}{{\rm m}} \newcommand{\Ed}{\mathcal{E}^\delta} \Ed $ is defined by

Equation (2.8)

where the sequence $ \newcommand{\wkd}{w^\delta_k} \wkd$ is defined by

Equation (2.9)

Furthermore, we introduce the operator $ \newcommand{\X}{\mathcal{X}} \newcommand{\Y}{\mathcal{Y}} \newcommand{\m}{{\rm m}} \newcommand{\D}{\mathcal{D}} \newcommand{\Go}{G_{\omega}} \newcommand{\God}{G_{\omega}^\delta} \newcommand{\s}{{\rm s}} \God : \D(F) \subset \X \to \Y$ , given by

Equation (2.10)

Using definition 2.1, we can now write the update step for $ \newcommand{\xkpd}{x_{k+1}^\delta} \xkpd$ in the form

from which, together with (2.9) and the definition of $ \newcommand{\zkd}{z_{k}^\delta} \zkd$ , it follows that

Equation (2.11)

As a first result, we show that both $ \newcommand{\zkd}{z_{k}^\delta} \zkd$ and $ \newcommand{\xkd}{x_k^\delta} \xkd$ stay within $ \newcommand{\m}{{\rm m}} \newcommand{\Bmr}{\mathcal{B}_{6\rho}} \Bmr(x_0)$ during the iteration.

Lemma 2.1. Under the assumption 2.1, the sequence of iterates $ \newcommand{\xkd}{x_k^\delta} \xkd$ and $ \newcommand{\zkd}{z_{k}^\delta} \zkd$ defined by (1.22) is well-defined. Furthermore, $ \newcommand{\xkd}{x_k^\delta} \newcommand{\m}{{\rm m}} \newcommand{\Btr}{\mathcal{B}_{2\rho}} \xkd \in \Btr(x_0)$ and $ \newcommand{\zkd}{z_{k}^\delta} \newcommand{\m}{{\rm m}} \newcommand{\Bmr}{\mathcal{B}_{6\rho}} \zkd \in \Bmr(x_0)$ for all $ \newcommand{\m}{{\rm m}} \newcommand{\N}{\mathbb{N}} k \in \N$ .

Proof. This follows by induction from $ \newcommand{\xd}{x^\delta} \newcommand{\m}{{\rm m}} \newcommand{\Br}{\mathcal{B}_\rho} \xd_0 = \xd_{-1} = x_0 \in \Br(x_0)$ , the observation

and the fact that by the definition of $ \newcommand{\kl}[1]{\left(#1\right)} \newcommand{\prox}[1]{{\rm prox}_{\omega \Psi}\kl{#1}} \prox{x}$ , $ \newcommand{\xkd}{x_k^\delta} \xkd$ is always an element of $ \newcommand{\m}{{\rm m}} \newcommand{\Btr}{\mathcal{B}_{2\rho}} \Btr(x_0)$ . □

Since the functional $ \newcommand{\m}{{\rm m}} \newcommand{\T}{\mathcal{T}} \Theta^0$ is assumed to be convex in $ \newcommand{\m}{{\rm m}} \newcommand{\Bmr}{\mathcal{B}_{6\rho}} \Bmr(x_0)$ , we can deduce:

Lemma 2.2. Under assumption 2.1, for all $ \newcommand{\m}{{\rm m}} \newcommand{\Bmr}{\mathcal{B}_{6\rho}} x, z \in \Bmr(x_0)$ there holds

Proof. This lemma is also used in [2]. However, the sources for it cited there do not exactly cover our setting with $ \newcommand{\Phid}{\Phi^\delta} \Phid$ being defined on $ \newcommand{\X}{\mathcal{X}} \newcommand{\m}{{\rm m}} \newcommand{\D}{\mathcal{D}} \newcommand{\s}{{\rm s}} \D(F) \subset \X$ only. Hence, we here give an elementary proof of the assertion. Note first that due to the Lipschitz continuity of $ \newcommand{\Phiz}{\Phi^0} \Phiz$ in $ \newcommand{\m}{{\rm m}} \newcommand{\Bmr}{\mathcal{B}_{6\rho}} \Bmr(x_0)$ and the fact that $\omega < 1/L$ we have

Now, since $\Phi^0$ is convex on $\mathcal{B}_{6\rho}(x_0)$, we also have [3]

and therefore, combining the above two inequalities, we get

Using this result for $ \newcommand{\Go}{G_{\omega}} u = z-\omega \Go^0(z)$ , $v = z$ , w  =  x, noting that for $ \newcommand{\m}{{\rm m}} \newcommand{\Bmr}{\mathcal{B}_{6\rho}} x, z \in \Bmr(x_0)$ there holds $ \newcommand{\m}{{\rm m}} \newcommand{\Bmr}{\mathcal{B}_{6\rho}} u, v, w \in \Bmr(x_0)$ , we get

Equation (2.12)

Next, note that since $ \newcommand{\kl}[1]{\left(#1\right)} \newcommand{\Phiz}{\Phi^0} \newcommand{\prox}[1]{{\rm prox}_{\omega \Psi}\kl{#1}} \newcommand{\Go}{G_{\omega}} z-\omega \Go^0(z) = \prox{z- \omega \nabla \Phiz(z)}$ , a standard result from proximal operator theory [3, proposition 12.26] implies that there holds

Adding this inequality to (2.12) and using the fact that by definition $ \newcommand{\Phiz}{\Phi^0} \newcommand{\Thetaz}{\Theta^0} \newcommand{\m}{{\rm m}} \newcommand{\T}{\mathcal{T}} \Thetaz = \Phiz + \Psi$ immediately yields the assertion. □

We want to derive a similar inequality also for the functionals $ \newcommand{\Thetad}{\Theta^\delta} \newcommand{\m}{{\rm m}} \newcommand{\T}{\mathcal{T}} \Thetad$ . The following lemma is of vital importance for doing that:

Lemma 2.3. Let assumption 2.1 hold, let $ \newcommand{\m}{{\rm m}} \newcommand{\Bmr}{\mathcal{B}_{6\rho}} x, z \in \Bmr(x_0)$ and define

Equation (2.13)

as well as

Equation (2.14)

Then there holds

Proof. Using lemma 2.2 we get

from which the statement of the lemma immediately follows. □

Next, we show that the Ri and hence, also R, can be bounded in terms of $\delta + \delta^2$ .

Proposition 2.4. Let assumption 2.1 hold, let $x \in \mathcal{B}_{2\rho}(x_0)$ and $z \in \mathcal{B}_{6\rho}(x_0)$, and let $R_1, \dots, R_5$ be defined by (2.14). Then, with $\bar{\omega}$ as in (2.3), there holds

Proof. The following somewhat lengthy but elementary proof mainly uses the boundedness and Lipschitz continuity assumptions made above. In the following, let $x \in \mathcal{B}_{2\rho}(x_0)$ and $z \in \mathcal{B}_{6\rho}(x_0)$. We treat each of the $R_i$ terms separately, starting with

Since we have

and

there holds

Next, we look at

Similarly to above, for the next term we get

Furthermore, together with the Lipschitz continuity of $ \newcommand{\kl}[1]{\left(#1\right)} \newcommand{\prox}[1]{{\rm prox}_{\omega \Psi}\kl{#1}} \prox{.}$ , we get

Finally, for the last term, we get

which concludes the proof. □

As an immediate consequence, we get the following:

Corollary 2.5. Let assumption 2.1 hold and let $ \newcommand{\m}{{\rm m}} \newcommand{\Bmr}{\mathcal{B}_{6\rho}} x, z \in \Bmr(x_0)$ . If we define

Equation (2.15)

then there holds

Proof. This immediately follows from lemma 2.2 and proposition 2.4. □

Combining the above, we are now able to arrive at the following important result:

Proposition 2.6. Let assumption 2.1 hold, let the sequence of iterates $ \newcommand{\xkd}{x_k^\delta} \xkd$ and $ \newcommand{\zkd}{z_{k}^\delta} \zkd$ be given by (1.22) and let c1 and c2 be defined by (2.15). If we define

Equation (2.16)

then there holds

Equation (2.17)

Equation (2.18)

Proof. This immediately follows from lemma 2.1 and corollary 2.5. □

Using the above proposition, we are now able to derive the following important result.

Theorem 2.7. Let assumption 2.1 hold and let the sequence of iterates $ \newcommand{\xkd}{x_k^\delta} \xkd$ and $ \newcommand{\zkd}{z_{k}^\delta} \zkd$ be given by (1.22) and let $ \newcommand{\m}{{\rm m}} \newcommand{\D}{\mathcal{D}} \Delta(\delta)$ be defined by (2.16). Then there holds

Equation (2.19)

Proof. This proof is adapted from the corresponding result in [2], the difference being the term $ \newcommand{\m}{{\rm m}} \newcommand{\D}{\mathcal{D}} \newcommand{\Dd}{\Delta(\delta)} \Dd$ . We start by multiplying inequality (2.17) by $\frac{k}{k+\alpha-1}$ and inequality (2.18) by $\frac{\alpha-1}{k+\alpha-1}$ . Adding the results and using the fact that $ \newcommand{\xkpd}{x_{k+1}^\delta} \newcommand{\zkd}{z_{k}^\delta} \newcommand{\Go}{G_{\omega}} \newcommand{\God}{G_{\omega}^\delta} \xkpd = \zkd - \omega \God(\zkd)$ , we get

Since

we obtain

Equation (2.20)

Next, observe that it follows from (2.11) that

After developing

and multiplying the above expression by $ \frac{(\alpha - 1){}^2}{2\omega (k+\alpha -1){}^2}$ , we get

Replacing this in inequality (2.20) above, we get

Equivalently, we can write this as

Multiplying by $\frac{2 \omega}{\alpha-1}(k+\alpha-1){}^2$ , we obtain

and therefore, since there holds

we get that

Together with the definition (2.8) of $ \newcommand{\E}{\mathcal{E}} \newcommand{\m}{{\rm m}} \newcommand{\Ed}{\mathcal{E}^\delta} \Ed$ , this implies

or equivalently, after rearranging, we get

which concludes the proof. □

Inequality (2.19) is the key ingredient for showing that (1.22), combined with a suitable stopping rule, gives rise to a convergent regularization method. In order to derive a suitable stopping rule, note first that in the case of exact data, i.e. $\delta = 0$ , inequality (2.19) reduces to

Equation (2.21)

Since by assumption 2.1 the functional $ \newcommand{\Phiz}{\Phi^0} \Phiz$ is convex, the arguments used in [2] are applicable, and we can deduce the following:

Theorem 2.8. Let assumption 2.1 hold, let the sequence of iterates $ \newcommand{\xkz}{x_k^0} \xkz$ and $ \newcommand{\zkz}{z_{k}^0} \zkz$ be given by (1.22) with exact data $ \newcommand{\yd}{y^{\delta}} y = \yd$ , i.e. $\delta=0$ and let $ \newcommand{\m}{{\rm m}} \mathcal{S}$ be defined by (2.5). Then the following statements hold:

  • The sequence $ \newcommand{\E}{\mathcal{E}} \newcommand{\m}{{\rm m}} \newcommand{\Ez}{\mathcal{E}^0} (\Ez(k))$ is non-increasing and $ \newcommand{\E}{\mathcal{E}} \newcommand{\m}{{\rm m}} \newcommand{\Ez}{\mathcal{E}^0} \lim\limits_{k\to \infty} \Ez(k)$ exists.
  • For each $k \geqslant 0$ , there holds
  • There holds
    as well as
  • There holds
    as well as
  • There exists an $ \newcommand{\xt}{\tilde{x}} \xt$ in $ \newcommand{\m}{{\rm m}} \mathcal{S}$ , such that the sequence $ \newcommand{\xkz}{x_k^0} (\xkz)$ converges weakly to $ \newcommand{\xt}{\tilde{x}} \xt$ , i.e.
    Equation (2.22)

Proof. The statements follow from facts 1–4, remark 2 and theorem 3 in [2]. □

Thanks to theorem 2.8, we now know that Nesterov's accelerated gradient method (1.22) converges weakly to a solution $ \newcommand{\xt}{\tilde{x}} \xt$ from the solution set $ \newcommand{\m}{{\rm m}} \mathcal{S}$ in case of exact data $ \newcommand{\yd}{y^{\delta}} y = \yd$ , i.e. $\delta = 0$ .

Hence, it remains to consider the behaviour of (1.22) in the case of inexact data $ \newcommand{\yd}{y^{\delta}} \yd$ . As mentioned above, the key for doing so is inequality (2.19). We want to use it to show that, similarly to the exact data case, the sequence $ \newcommand{\E}{\mathcal{E}} \newcommand{\m}{{\rm m}} \newcommand{\Ed}{\mathcal{E}^\delta} (\Ed(k))$ is non-increasing up to some $ \newcommand{\m}{{\rm m}} \newcommand{\N}{\mathbb{N}} k \in \N$ . To do this, note first that $ \newcommand{\E}{\mathcal{E}} \newcommand{\m}{{\rm m}} \newcommand{\Ed}{\mathcal{E}^\delta} \Ed(k)$ is positive as long as

which is true, as long as

Equation (2.23)

On the other hand, the term

Equation (2.24)

in (2.19) is positive, as long as

which is satisfied, as long as

Equation (2.25)

which obviously implies (2.23). These considerations suggest, given a small $\tau > 1$, choosing the stopping index $k_* = k_*(\delta, y^{\delta})$ as the smallest integer such that

Equation (2.26)

Concerning the well-definedness of $ \newcommand{\ks}{{k_*}} \ks$ , we are able to prove the following

Lemma 2.9. Let assumption 2.1 hold, let the sequence of iterates $ \newcommand{\xkd}{x_k^\delta} \xkd$ and $ \newcommand{\zkd}{z_{k}^\delta} \zkd$ be given by (1.22) and let c1 and c2 be defined by (2.15). Then the stopping index $ \newcommand{\ks}{{k_*}} \ks$ defined by (2.26) with $\tau > 1$ is well-defined and there holds

Equation (2.27)

Proof. By the definition (2.16) of $ \newcommand{\m}{{\rm m}} \newcommand{\D}{\mathcal{D}} \newcommand{\Dd}{\Delta(\delta)} \Dd$ and due to

it follows from (2.26) that for all $ \newcommand{\ks}{{k_*}} k < \ks$ there holds

which can be rewritten as

Equation (2.28)

where we have used that $\tau > 1$ . Since the left hand side in the above inequality goes to $\infty$ for $k\to \infty$ , while the right hand side stays bounded, it follows that $ \newcommand{\ks}{{k_*}} \ks$ is finite and hence well-defined for $\delta \neq 0$ . Furthermore, since

which can be seen by multiplying the above inequality by $k(\alpha-3)$, and since (2.28) also holds for $k=k_*-1$, we get

Reordering the terms, we arrive at

from which the assertion now immediately follows. □

The rate $k_* = \mathcal{O}(\delta^{-1})$ given in (2.27) for the iteration method (1.22) should be compared with the corresponding result [23, corollary 2.3] for Landweber iteration (1.4), where one only obtains $k_* = \mathcal{O}(\delta^{-2})$. In order to obtain the rate $k_* = \mathcal{O}(\delta^{-1})$ for Landweber iteration, among other assumptions, a source condition of the form

Equation (2.29)

has to hold, which is not required for Nesterov's accelerated gradient method (1.22).

Before we turn to our main result, we first prove a couple of important consequences of (2.19) and the stopping rule (2.26).

Proposition 2.10. Let assumption 2.1 be satisfied, let $ \newcommand{\xkd}{x_k^\delta} \xkd$ and $ \newcommand{\zkd}{z_{k}^\delta} \zkd$ be defined by (1.22) and let $ \newcommand{\E}{\mathcal{E}} \newcommand{\m}{{\rm m}} \newcommand{\Ed}{\mathcal{E}^\delta} \Ed$ be defined by (2.8). Assuming that the stopping index $ \newcommand{\ks}{{k_*}} \ks$ is determined by (2.26) with some $\tau > 1$ , then, for all $ \newcommand{\ks}{{k_*}} 0 \leqslant k \leqslant \ks$ , the sequence $ \newcommand{\E}{\mathcal{E}} \newcommand{\m}{{\rm m}} \newcommand{\Ed}{\mathcal{E}^\delta} (\Ed(k))$ is non-increasing and in particular, $ \newcommand{\E}{\mathcal{E}} \newcommand{\m}{{\rm m}} \newcommand{\Ed}{\mathcal{E}^\delta} \Ed(k) \leqslant \Ed(0)$ . Furthermore, for all $ \newcommand{\ks}{{k_*}} 0\leqslant k \leqslant \ks$ there holds

Equation (2.30)

as well as

Equation (2.31)

and

Equation (2.32)

Proof. Due to the definition of the stopping rule (2.26) and the arguments preceding it, the term (2.24) is positive for all $ \newcommand{\ks}{{k_*}} k\leqslant\ks-1$ . Hence, due to (2.19), $ \newcommand{\E}{\mathcal{E}} \newcommand{\m}{{\rm m}} \newcommand{\Ed}{\mathcal{E}^\delta} \Ed(k)$ is non-increasing for all $ \newcommand{\ks}{{k_*}} k \leqslant \ks$ and in particular, $ \newcommand{\E}{\mathcal{E}} \newcommand{\m}{{\rm m}} \newcommand{\Ed}{\mathcal{E}^\delta} \Ed(k) \leqslant \Ed(0)$ . From this observation, (2.30) and (2.31) immediately follow from the definition (2.8) of $ \newcommand{\E}{\mathcal{E}} \newcommand{\m}{{\rm m}} \newcommand{\Ed}{\mathcal{E}^\delta} \Ed(k)$ .

Furthermore, rearranging (2.19) we have

Now, summing over this inequality and using telescoping and the fact that $ \newcommand{\E}{\mathcal{E}} \newcommand{\m}{{\rm m}} \newcommand{\Ed}{\mathcal{E}^\delta} \newcommand{\ks}{{k_*}} \Ed(\ks) \geqslant 0$ we immediately arrive at (2.32), which concludes the proof. □

From the above proposition, we are able to deduce two interesting corollaries.

Corollary 2.11. Under the assumptions of proposition 2.10 there holds

Equation (2.33)

Proof. Using the fact that both $ \newcommand{\xs}{x_*} \newcommand{\xkd}{x_k^\delta} \newcommand{\m}{{\rm m}} \newcommand{\Btr}{\mathcal{B}_{2\rho}} \xkd, \xs \in \Btr(x_0)$ , it follows from the definition of $ \newcommand{\Thetad}{\Theta^\delta} \newcommand{\m}{{\rm m}} \newcommand{\T}{\mathcal{T}} \Thetad$ that $ \newcommand{\xkd}{x_k^\delta} \newcommand{\Phid}{\Phi^\delta} \newcommand{\Thetad}{\Theta^\delta} \newcommand{\m}{{\rm m}} \newcommand{\T}{\mathcal{T}} \Thetad(\xkd) = \Phid(\xkd)$ and $ \newcommand{\xs}{x_*} \newcommand{\Phid}{\Phi^\delta} \newcommand{\Thetad}{\Theta^\delta} \newcommand{\m}{{\rm m}} \newcommand{\T}{\mathcal{T}} \Thetad(\xs) = \Phid(\xs)$ . Hence, inequality (2.30) yields

from which, using $ \newcommand{\norm}[1]{\left\|#1\right\|} \newcommand{\yd}{y^{\delta}} \norm{y-\yd}\leqslant \delta$ , the statement immediately follows. □

Corollary 2.12. Under the assumptions of proposition 2.10 there holds

Proof. Using the fact that both $x_k^\delta, x_* \in \mathcal{B}_{2\rho}(x_0)$, it follows from the definition of $\Theta^\delta$ that $\Theta^\delta(x_k^\delta) = \Phi^\delta(x_k^\delta)$ and $\Theta^\delta(x_*) = \Phi^\delta(x_*)$. Hence, it follows with $\|y-y^{\delta}\| \leqslant \delta$ that

Together with the definition of the stopping rule (2.26), this implies that for all $ \newcommand{\ks}{{k_*}} k \leqslant \ks-1$

Using this in (2.32) yields

from which the statement now immediately follows. □

Again, this shows that $k_* = \mathcal{O}(\delta^{-1})$, i.e. $k_* \leqslant c\,\delta^{-1}$; however, this time the constant $c$ does not depend on $c_1$ and $c_2$, an observation which we use when analysing (1.22) under slightly different assumptions than assumption 2.1 below.

We are now able to prove one of our main results:

Theorem 2.13. Let assumption 2.1 hold and let the iterates $ \newcommand{\xkd}{x_k^\delta} \xkd$ and $ \newcommand{\zkd}{z_{k}^\delta} \zkd$ be defined by (1.22). Furthermore, let $ \newcommand{\yd}{y^{\delta}} \newcommand{\ks}{{k_*}} \ks = \ks(\delta, \yd)$ be determined by (2.26) with some $\tau > 1$ and let the solution set $ \newcommand{\m}{{\rm m}} \mathcal{S}$ be given by (2.5). Then there exists an $ \newcommand{\xt}{\tilde{x}} \newcommand{\m}{{\rm m}} \xt \in \mathcal{S}$ and a subsequence $ \newcommand{\xksd}{x_{k_*}^\delta} \newcommand{\xksdt}{\tilde{x}_{k_*}^{\delta}} \xksdt$ of $ \newcommand{\xksd}{x_{k_*}^\delta} \xksd$ which converges weakly to $ \newcommand{\xt}{\tilde{x}} \xt$ as $\delta \to 0$ . If $ \newcommand{\m}{{\rm m}} \mathcal{S}$ is a singleton, then $ \newcommand{\xksd}{x_{k_*}^\delta} \xksd$ converges weakly to the then unique solution $ \newcommand{\xt}{\tilde{x}} \newcommand{\m}{{\rm m}} \xt \in \mathcal{S}$ .

Proof. This proof follows some ideas of [15]. Let $ \newcommand{\yd}{y^{\delta}} \newcommand{\ydn}{y^{\delta_n}} y_n := \ydn$ be a sequence of noisy data satisfying $ \newcommand{\norm}[1]{\left\|#1\right\|} \newcommand{\dn}{{\delta_n}} \norm{y-y_n}\leqslant \dn$ . Furthermore, let $ \newcommand{\ks}{{k_*}} \newcommand{\dn}{{\delta_n}} k_n := \ks(\dn, y_n)$ be the stopping index determined by (2.26) applied to the pair $ \newcommand{\dn}{{\delta_n}} (\dn, y_n)$ . There are two cases. First, assume that k is a finite accumulation point of kn. Without loss of generality, we can assume that kn  =  k for all $ \newcommand{\m}{{\rm m}} \newcommand{\N}{\mathbb{N}} n \in \N$ . Thus, from (2.26), it follows that

which, together with the triangle inequality, implies

Since for fixed k the iterates $ \newcommand{\xkd}{x_k^\delta} \xkd$ depend continuously on the data $ \newcommand{\yd}{y^{\delta}} \yd$ , by taking the limit $n \to \infty$ in the above inequality we can derive

For the second case, assume that $k_n \to \infty$ as $n \to \infty$ . Since $ \newcommand{\m}{{\rm m}} \newcommand{\Btr}{\mathcal{B}_{2\rho}} \newcommand{\dn}{{\delta_n}} x_{k_n}^\dn \in \Btr(x_0)$ , it is bounded and hence, has a weakly convergent subsequence $ \newcommand{\xkndnt}{x_{\tilde{k}_n}^{\tilde{\delta}_n}} \xkndnt$ , corresponding to a subsequence $ \newcommand{\dn}{{\delta_n}} \newcommand{\dnt}{{\tilde{\delta}_n}} \dnt$ of $ \newcommand{\dn}{{\delta_n}} \dn$ and $ \newcommand{\ks}{{k_*}} \newcommand{\knt}{{\tilde{k}_n}} \newcommand{\dn}{{\delta_n}} \newcommand{\dnt}{{\tilde{\delta}_n}} \knt := \ks(\dnt, y^\dnt)$ . Denoting the weak limit of $ \newcommand{\xkndnt}{x_{\tilde{k}_n}^{\tilde{\delta}_n}} \xkndnt$ by $ \newcommand{\xt}{\tilde{x}} \xt$ , it remains to show that $ \newcommand{\xt}{\tilde{x}} \newcommand{\m}{{\rm m}} \xt \in \mathcal{S}$ . For this, observe that it follows from (2.33) that

where we have used that $ \newcommand{\knt}{{\tilde{k}_n}} \knt \to \infty$ and $ \newcommand{\dn}{{\delta_n}} \newcommand{\dnt}{{\tilde{\delta}_n}} \dnt \to 0$ as $n \to \infty$ , which follows from the assumption that so do the sequences kn and $ \newcommand{\dn}{{\delta_n}} \dn$ , and the fact that $ \newcommand{\E}{\mathcal{E}} \newcommand{\m}{{\rm m}} \newcommand{\Ed}{\mathcal{E}^\delta} \Ed(0)$ stays bounded for $\delta \to 0$ . Hence, since we know that $ \newcommand{\yd}{y^{\delta}} \yd \to y$ as $\delta \to 0$ , we can deduce that

and therefore, using the weak sequential closedness of F on $ \newcommand{\m}{{\rm m}} \newcommand{\Btr}{\mathcal{B}_{2\rho}} \Btr(x_0)$ , we deduce that $ \newcommand{\xt}{\tilde{x}} F(\xt) = y$ , i.e. $ \newcommand{\xt}{\tilde{x}} \newcommand{\m}{{\rm m}} \xt \in \mathcal{S}$ , which was what we wanted to show.

It remains to show that if $ \newcommand{\m}{{\rm m}} \mathcal{S}$ is a singleton then $ \newcommand{\xksd}{x_{k_*}^\delta} \xksd$ converges weakly to $ \newcommand{\xt}{\tilde{x}} \xt$ . Since this was already proven above in the case that kn has a finite accumulation point, it remains to consider the second case, i.e. $k_n \to \infty$ . For this, consider an arbitrary subsequence of $ \newcommand{\xksd}{x_{k_*}^\delta} \xksd$ . Since this sequence is bounded, it has a weakly convergent subsequence which, by the same arguments as above, converges to a solution $ \newcommand{\xt}{\tilde{x}} \newcommand{\m}{{\rm m}} \xt \in \mathcal{S}$ . However, since we have assumed that $ \newcommand{\m}{{\rm m}} \mathcal{S}$ is a singleton, it follows that $ \newcommand{\xksd}{x_{k_*}^\delta} \xksd$ converges weakly to $ \newcommand{\xt}{\tilde{x}} \xt$ , which concludes the proof. □

Remark. In theorem 2.13, we have shown weak subsequential convergence to an element $\tilde{x}$ in the solution set $\mathcal{S}$. However, this element might be different from the $x_0$-minimum-norm solution $x^\dagger$ defined by (2.6), unless, of course, $\mathcal{S}$ is a singleton.

Furthermore, it follows from the proof of theorem 2.13 that any (weakly) convergent subsequence of $ \newcommand{\xksd}{x_{k_*}^\delta} \xksd$ also converges (weakly) to an element in the solution set $ \newcommand{\m}{{\rm m}} \mathcal{S}$ .

3. Convergence analysis II

Some simplifications of the above presented convergence analysis are possible if we assume that instead of only $ \newcommand{\Phiz}{\Phi^0} \Phiz$ , all the functionals $ \newcommand{\Phid}{\Phi^\delta} \Phid$ are convex. Hence, for the remainder of this section, we work with the following:

Assumption 3.1. Let $\rho$ be a positive number such that $\mathcal{B}_{6\rho}(x_0) \subset \mathcal{D}(F)$.

  • 1.  
    The operator $F : \mathcal{D}(F) \subset \mathcal{X} \to \mathcal{Y}$ is continuously Fréchet differentiable between the real Hilbert spaces $\mathcal{X}$ and $\mathcal{Y}$ with inner products $\langle\,\cdot\,, \cdot\,\rangle$ and norms $\|\cdot\|$. Furthermore, let $F$ be weakly sequentially closed on $\mathcal{B}_{2\rho}(x_0)$.
  • 2.  
    The equation $F(x) = y$ has a solution $x_* \in \mathcal{B}_\rho(x_0)$.
  • 3.  
    The data $y^{\delta}$ satisfies $\|y-y^{\delta}\| \leqslant \delta$.
  • 4.  
    The functionals $\Phi^\delta$ are convex and have Lipschitz continuous gradients $\nabla \Phi^\delta$ with uniform Lipschitz constant $L$ on $\mathcal{B}_{6\rho}(x_0)$, i.e.
    Equation (3.1)
  • 5.  
    For $\alpha$ in (1.22) there holds $\alpha > 3$ and the scaling parameter $\omega$ satisfies $0 < \omega < \frac{1}{L}$.

Note that assumption 3.1 is only a special case of assumption 2.1. Hence, the convergence analysis presented above is applicable and we get weak convergence of the iterates of (1.22). However, the stopping rule (2.26) depends on the constants $c_1$ and $c_2$ defined by (2.15), which are not always available in practice. Fortunately, using assumption 3.1, we can get rid of $c_1$ and $c_2$. The key idea is to observe that the following lemma holds:

Lemma 3.1. Under assumption 3.1, for all $ \newcommand{\m}{{\rm m}} \newcommand{\Bmr}{\mathcal{B}_{6\rho}} x, z \in \Bmr(x_0)$ there holds

Proof. This follows from the convexity of $ \newcommand{\Thetad}{\Theta^\delta} \newcommand{\m}{{\rm m}} \newcommand{\T}{\mathcal{T}} \Thetad$ in the same way as in lemma 2.2. □

From the above lemma, it follows that the results of corollary 2.5 and proposition 2.6 hold with $ \newcommand{\m}{{\rm m}} \newcommand{\D}{\mathcal{D}} \newcommand{\Dd}{\Delta(\delta)} \Dd = 0$ . Therefore, the stopping rule (2.26) simplifies to

Equation (3.2)

for some $\tau > 1$ , which is nothing else than the discrepancy principle (1.6). Note that in contrast to (2.26), only the noise level δ needs to be known in order to determine the stopping index $ \newcommand{\ks}{{k_*}} \ks$ . With the same arguments as above, we are now able to prove our second main result:

Theorem 3.2. Let assumption 3.1 hold and let the iterates $ \newcommand{\xkd}{x_k^\delta} \xkd$ and $ \newcommand{\zkd}{z_{k}^\delta} \zkd$ be defined by (1.22). Furthermore, let $ \newcommand{\yd}{y^{\delta}} \newcommand{\ks}{{k_*}} \ks = \ks(\delta, \yd)$ be determined by (3.2) with some $\tau > 1$ and let the solution set $ \newcommand{\m}{{\rm m}} \mathcal{S}$ be given by (2.5). Then for the stopping index $ \newcommand{\ks}{{k_*}} \ks$ there holds $ \newcommand{\m}{{\rm m}} \newcommand{\LandauO}{\mathcal{O}} \newcommand{\ks}{{k_*}} \ks = \LandauO(\delta^{-1})$ . Furthermore, there exists an $ \newcommand{\xt}{\tilde{x}} \newcommand{\m}{{\rm m}} \xt \in \mathcal{S}$ and a subsequence $ \newcommand{\xksd}{x_{k_*}^\delta} \newcommand{\xksdt}{\tilde{x}_{k_*}^{\delta}} \xksdt$ of $ \newcommand{\xksd}{x_{k_*}^\delta} \xksd$ which converges weakly to $ \newcommand{\xt}{\tilde{x}} \xt$ as $\delta \to 0$ . If $ \newcommand{\m}{{\rm m}} \mathcal{S}$ is a singleton, then $ \newcommand{\xksd}{x_{k_*}^\delta} \xksd$ converges weakly to the then unique solution $ \newcommand{\xt}{\tilde{x}} \newcommand{\m}{{\rm m}} \xt \in \mathcal{S}$ .

Proof. The proof of this theorem is analogous to the proof of theorem 2.13. The main difference is the well-definedness of $k_*$, which now cannot be derived from lemma 2.9 but follows from (2.32) via corollary 2.12, which also yields $k_* = \mathcal{O}(\delta^{-1})$. □

Remark. As in the remark after theorem 2.13, a short inspection of the proof of theorem 3.2 shows again that also in this case any (weakly) convergent subsequence of $ \newcommand{\xksd}{x_{k_*}^\delta} \xksd$ also converges (weakly) to an element in the solution set $ \newcommand{\m}{{\rm m}} \mathcal{S}$ .

Remark. Note that since theorem 3.2 only gives an asymptotic result, i.e. for $\delta \to 0$ , the requirement in assumption 3.1 that the functionals $ \newcommand{\Phid}{\Phi^\delta} \Phid$ have to be convex for all $\delta > 0$ can be relaxed to $ \newcommand{\bd}{\bar{\delta}} 0 \leqslant \delta \leqslant \bd$ , as long as we only consider data $ \newcommand{\yd}{y^{\delta}} \yd$ satisfying the noise constraint $ \newcommand{\norm}[1]{\left\|#1\right\|} \newcommand{\yd}{y^{\delta}} \newcommand{\bd}{\bar{\delta}} \norm{y-\yd} \leqslant \delta \leqslant \bd$ .

Remark. Note that if the functionals $ \newcommand{\Phid}{\Phi^\delta} \Phid$ are globally convex and uniformly Lipschitz continuous, which is for example the case if F is a bounded linear operator, then one can choose ρ arbitrarily large in the definition of Ψ. Now, as we have seen above, the proximal mapping $ \newcommand{\kl}[1]{\left(#1\right)} \newcommand{\prox}[1]{{\rm prox}_{\omega \Psi}\kl{#1}} \prox{.}$ is nothing else than the projection onto $ \newcommand{\m}{{\rm m}} \newcommand{\Btr}{\mathcal{B}_{2\rho}} \Btr(x_0)$ . This implies that for practical purposes, $ \newcommand{\kl}[1]{\left(#1\right)} \newcommand{\prox}[1]{{\rm prox}_{\omega \Psi}\kl{#1}} \prox{.}$ may be dropped in (1.22), which means that one effectively uses (1.18) instead of (1.22).

4. Strong convexity and nonlinearity conditions

In this section, we consider the question of strong convergence of the iterates of (1.22) and comment on the connection between the assumption of local convexity and the (weak) tangential cone condition.

Concerning the strong convergence of the iterates of (1.22) and (1.18), note that it could be achieved if the functional $ \newcommand{\Phiz}{\Phi^0} \Phiz$ were locally strongly convex, i.e. if

Equation (4.1)

since then, for the choice of $ \newcommand{\xkz}{x_k^0} x_1 = \xkz$ and $ \newcommand{\xs}{x_*} x_2 = \xs$ , one gets

from which, since $\|F(x_k^0)-y\| \to 0$ as $k \to \infty$, it follows that $x_k^0$ converges strongly to $x_*$ as $k \to \infty$. Hence, retracing the proof of theorem 2.13, one would get

Unfortunately, already for linear ill-posed operators F  =  A, strong convexity of the form (4.1) cannot be satisfied, since then one would get

which already implies the well-posedness of Ax  =  y in $ \newcommand{\m}{{\rm m}} \newcommand{\Btr}{\mathcal{B}_{2\rho}} \Btr(x_0)$ . However, defining

Equation (4.2)

it was shown in [16, lemma 3.3] that there holds

Hence, if one could show that $ \newcommand{\xkz}{x_k^0} \newcommand{\m}{{\rm m}} \newcommand{\Mt}{\mathcal{M}_\tau} \xkz \in \Mt$ for some $\tau > 0$ and all $ \newcommand{\m}{{\rm m}} \newcommand{\N}{\mathbb{N}} k \in \N$ , then it would follow that

from which strong convergence of $x_k^0$, and consequently also of $x_{k_*}^\delta$ to $x^\dagger$, would follow. In essence, this was done in [28] with tools from spectral theory in the classical framework for analysing linear ill-posed problems [9] under the source condition $x^\dagger \in \mathcal{R}(A^*)$.
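
To make the role of (4.1) explicit, assume it has the standard form $\Phi^0(x_1) \geqslant \Phi^0(x_2) + \langle \nabla\Phi^0(x_2), x_1-x_2\rangle + \tfrac{c}{2}\|x_1-x_2\|^2$ on $\mathcal{B}_{6\rho}(x_0)$ and that $\Phi^0(x) = \tfrac12\|F(x)-y\|^2$; both the precise constants and the factor $\tfrac12$ are assumptions of this note, since the corresponding displays are not reproduced above. The choice $x_1 = x_k^0$, $x_2 = x_*$ together with $\Phi^0(x_*) = 0$ and $\nabla\Phi^0(x_*) = 0$ would then give

$ \tfrac{c}{2}\,\|x_k^0-x_*\|^2 \;\leqslant\; \Phi^0(x_k^0) \;=\; \tfrac12\,\|F(x_k^0)-y\|^2 \;\longrightarrow\; 0 \quad (k \to \infty), $

i.e. the residual decay in the exact-data case (theorem 2.8) would upgrade weak convergence to norm convergence; as noted at the beginning of this section, however, (4.1) cannot hold for ill-posed problems.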

Remark. Note that it is sometimes possible, given weak convergence of a sequence $ \newcommand{\m}{{\rm m}} \newcommand{\X}{\mathcal{X}} x_k \in \X$ to some element $ \newcommand{\m}{{\rm m}} \newcommand{\X}{\mathcal{X}} \newcommand{\xt}{\tilde{x}} \xt \in \X$ , to infer strong convergence of xk to $ \newcommand{\xt}{\tilde{x}} \xt$ in a weaker topology. For example, if $ \newcommand{\HoO}{H^1(0,1)} x_k \in \HoO$ converges weakly to $ \newcommand{\xt}{\tilde{x}} \xt$ in the $ \newcommand{\HoO}{H^1(0,1)} \HoO$ norm, then it follows that xk converges strongly to $ \newcommand{\xt}{\tilde{x}} \xt$ with respect to the $ \newcommand{\LtO}{L^2(0,1)} \newcommand{\Lt}{{L^2}} \LtO$ norm. Many generalizations of this example are possible. Note further that in finite dimensions, weak and strong convergence coincide.

In the remaining part of this section, we want to comment on the connection of the local convexity assumption (2.1) to other nonlinearity conditions like (1.7) and (1.8) commonly used in the analysis of nonlinear inverse problems.

First of all, note that due to the results of Kindermann [24], we know that both convexity and the (weak) tangential cone condition imply weak convergence of Landweber iteration (1.4). However, it is not entirely clear in which way those conditions are connected.

One connection of the two conditions was given in [32], where it was shown that the nonlinearity condition implies a certain directional convexity condition. Another connection was provided in [24], where it was shown that the tangential cone condition implies a quasi-convexity condition. However, it is not clear whether the tangential cone condition implies convexity. What we can say is that convexity does not imply the (weak) tangential cone condition, as shown in the following

Example 4.1. Consider the operator $F: H^1 (0, 1) \to L^2 (0, 1)$ defined by

Equation (4.3)

This nonlinear Hammerstein operator was extensively treated as an example problem for nonlinear inverse problems (see for example [15, 27]). It is well known that for this operator the tangential cone condition is satisfied around $ \newcommand{\xD}{x^\dagger} \xD$ as long as $ \newcommand{\xD}{x^\dagger} \xD \geqslant c > 0$ . However, the (weak) tangential cone condition is not satisfied in case that $ \newcommand{\xD}{x^\dagger} \newcommand{\e}{{\rm e}} \xD \equiv 0$ . Moreover, it can easily be seen (for example from (5.1)) that $ \newcommand{\Phiz}{\Phi^0} \Phiz(x)$ is globally convex, which shows that convexity does not imply the tangential cone condition.

5. Example problems

In this section, we consider two examples to which we apply the theory developed above. Most importantly, we prove the local convexity assumption for both $\Phi^0$ and $\Phi^\delta$, with $\delta$ small enough. Furthermore, based on these example problems, we present some numerical results, demonstrating the usefulness of method (1.22) and supporting the findings of [17–19, 21, 28], which are also briefly discussed.

For this, note that if F is twice continuously Fréchet differentiable, then convexity of $ \newcommand{\Phid}{\Phi^\delta} \Phid$ is equivalent to positive semi-definiteness of its second Fréchet derivative [31]. More precisely, we have that (3.1) is equivalent to

Equation (5.1)

which is our main tool for the upcoming analysis.
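
Since the display (5.1) is not reproduced above, for orientation the sign condition takes the following form when $\Phi^\delta(x) = \tfrac12\|F(x)-y^\delta\|^2$ (the factor $\tfrac12$ is an assumption of this note and is immaterial for the sign):

$ \bigl(\Phi^\delta\bigr)''(x)(h,h) \;=\; \|F'(x)h\|^2 + \bigl\langle F(x)-y^\delta,\ F''(x)(h,h)\bigr\rangle \;\geqslant\; 0 \qquad \text{for all } h \in \mathcal{X} \text{ and } x \in \mathcal{B}_{6\rho}(x_0). $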

5.1. Example 1—Nonlinear diagonal operator

For our first (academic) example, we look at the following class of nonlinear diagonal operators

where $ \newcommand{\kl}[1]{\left(#1\right)} \newcommand{\m}{{\rm m}} \newcommand{\N}{\mathbb{N}} \kl{e_{(n)}}_{n\in\N}$ is the canonical orthonormal basis of $ \newcommand{\e}{{\rm e}} \ell^2$ . These operators are reminiscent of the singular value decomposition of compact linear operators. Here we consider the special choice

Equation (5.2)

for some fixed integer M  >  0. For this choice, F takes the form

It is easy to see that F is a well-defined, twice continuously Fréchet differentiable operator with

Furthermore, note that solving $F(x) = y$ is equivalent to

from which it is easy to see that we are dealing with an ill-posed problem.

We now turn to the convexity of $ \newcommand{\Phid}{\Phi^\delta} \Phid(x)$ around a solution $ \newcommand{\xD}{x^\dagger} \xD$ .

Proposition 5.1. Let $ \newcommand{\xD}{x^\dagger} \xD$ be a solution of $F(x) = y$ such that $ \newcommand{\abs}[1]{\left\vert#1\right\vert} \newcommand{\xD}{x^\dagger} \abs{\xD_{(n)}} > 0$ holds for all $n \in \{1\, , \dots\, , M\}$ . Furthermore, let $\rho > 0$ and $ \newcommand{\bd}{\bar{\delta}} \bd \geqslant 0$ be small enough such that

Equation (5.3)

and let $ \newcommand{\xD}{x^\dagger} \newcommand{\m}{{\rm m}} \newcommand{\Br}{\mathcal{B}_\rho} x_0 \in \Br(\xD)$ . Then for all $ \newcommand{\bd}{\bar{\delta}} 0 \leqslant \delta \leqslant \bd$ , the functional $ \newcommand{\Phid}{\Phi^\delta} \Phid(x)$ is convex in $ \newcommand{\m}{{\rm m}} \newcommand{\Bmr}{\mathcal{B}_{6\rho}} \Bmr(x_0)$ .

Proof. Due to (5.1) it is sufficient to show that

Using the definition of F, the fact that en is an orthonormal basis of $ \newcommand{\e}{{\rm e}} \ell^2$ and that $ \newcommand{\xD}{x^\dagger} F(\xD) = y$ , this inequality can be rewritten into

which, after simplification, becomes

Since the second of the above two sums is always positive, in order for the above inequality to be satisfied it suffices to show that

Equation (5.4)

Now, since by the triangle inequality we have

Equation (5.5)

it follows that in order to prove (5.4) it suffices to show

Now, writing $ \newcommand{\xD}{x^\dagger} \newcommand{\eps}{\varepsilon} \newcommand{\e}{{\rm e}} x = \xD + \eps$ , this can be rewritten into

Since $ \newcommand{\eps}{\varepsilon} \newcommand{\e}{{\rm e}} \eps_{(n)}^2 \geqslant 0$ , the above inequality is satisfied given that

However, since $ \newcommand{\norm}[1]{\left\|#1\right\|} \newcommand{\abs}[1]{\left\vert#1\right\vert} \newcommand{\xD}{x^\dagger} \newcommand{\eps}{\varepsilon} \newcommand{\lt}{{\ell^2}} \newcommand{\e}{{\rm e}} \abs{\eps_{(n)}} \leqslant \norm{\eps}_{\ell^2} = \norm{x- \xD}_{\lt} \leqslant \norm{x - x_0}_{\lt} + \norm{x_0 - \xD}_{\lt} \leqslant 7 \rho$ , this follows immediately from (5.3), which concludes the proof. □

Remark. Due to $ \newcommand{\norm}[1]{\left\|#1\right\|} \newcommand{\abs}[1]{\left\vert#1\right\vert} \newcommand{\xD}{x^\dagger} \newcommand{\e}{{\rm e}} \newcommand{\lt}{{\ell^2}} \abs{\xD_{(n)}} \leqslant \norm{{\xD}}_\lt$ , condition (5.3) is satisfied given that

which can always be satisfied given that $ \newcommand{\abs}[1]{\left\vert#1\right\vert} \newcommand{\xD}{x^\dagger} \abs{\xD_{(n)}} > 0$ for all $n \in \{1\, , \dots\, , M\}$ .

After proving local convexity of the residual functional around the solution, we now proceed to demonstrate the usefulness of (1.22) based on the following numerical

Example 5.1. For this example we choose fn as in (5.2) with M  =  100. For the exact solution $ \newcommand{\xD}{x^\dagger} \xD$ we take the sequence $ \newcommand{\xD}{x^\dagger} \xD_{(n)} = 100/n$ which leads to the exact data

Hence, condition (5.4) reads as follows

Therefore, the functional $ \newcommand{\Phiz}{\Phi^0} \Phiz$ is convex in $ \newcommand{\m}{{\rm m}} \newcommand{\Bmr}{\mathcal{B}_{6\rho}} \Bmr(x_0)$ given that $\rho \leqslant 1/28 \approx 0.036$ , which is for example the case for the choice

Equation (5.6)

Furthermore, for any noise level $ \newcommand{\bd}{\bar{\delta}} \bd$ small enough, one has that for all $ \newcommand{\bd}{\bar{\delta}} \delta \leqslant \bd$ the functional $ \newcommand{\Phid}{\Phi^\delta} \Phid$ is convex in $ \newcommand{\m}{{\rm m}} \newcommand{\Bmr}{\mathcal{B}_{6\rho}} \Bmr(x_0)$ as long as

which for example is satisfied if

For numerically treating the problem, instead of considering full sequences $ \newcommand{\kl}[1]{\left(#1\right)} \newcommand{\m}{{\rm m}} \newcommand{\N}{\mathbb{N}} x = \kl{x_{(n)}}_{n \in \N}$ , we only consider $ \newcommand{\kl}[1]{\left(#1\right)} \newcommand{\xv}{\vec {x}} \xv = \kl{x_{(n)}}_{n=1, \dots, N}$ where we choose N  =  200 in this example. This means that we are considering the following discretized version of F:

We now compare the behaviour of method (1.22) with its non-accelerated Landweber counterpart (1.4) when applied to the problem with $ \newcommand{\xD}{x^\dagger} \xD$ and x0 as defined above. For both methods, we choose the same scaling parameter $\omega = 3.2682 \times 10^{-5}$ , estimated from the norm of $ \newcommand{\xD}{x^\dagger} F(\xD)$ , and we stop the iteration with the discrepancy principle (1.6) with $\tau = 1$ . Random noise with a relative noise level of 0.001% was added to the data to arrive at the noisy data $ \newcommand{\yd}{y^{\delta}} \yd$ . Furthermore, following the argument presented after (3.2), and since the iterates $ \newcommand{\xkd}{x_k^\delta} \xkd$ remain bounded even without it, we drop the proximal operator $ \newcommand{\kl}[1]{\left(#1\right)} \newcommand{\prox}[1]{{\rm prox}_{\omega \Psi}\kl{#1}} \prox{.}$ in (1.22). The results of the experiments, computed in MATLAB, are displayed in table 1. The speedup in both time and number of iterations achieved by Nesterov's acceleration scheme is obvious: not only does (1.22) satisfy the discrepancy principle much earlier than (1.4), but its relative error is also slightly smaller.

Table 1. Comparison of Landweber iteration (1.4) and its Nesterov accelerated version (1.22) when applied to the diagonal operator problem considered in example 5.1.

Method k* Time (s) $ \newcommand{\norm}[1]{\left\|#1\right\|} \newcommand{\xD}{x^\dagger} \newcommand{\xkd}{x_k^\delta} \norm{\xD - \xkd }/\norm{\xD}$ (%)
Landweber 82 0.057 0.0109
Nesterov 23 0.019 0.0108
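Purely for illustration, a minimal Python sketch of both iterations is given below. The operator F, its linearized adjoint F_adj_deriv (returning $F'(x)^* r$), and the parameters omega, delta and tau are placeholders to be supplied by the user; the proximal operator is dropped as in the experiments, and the momentum weight $(k-1)/(k+2)$ is the standard Nesterov choice rather than a statement of the exact combination parameters in (1.22).

import numpy as np

def landweber(F, F_adj_deriv, y_delta, x0, omega, delta, tau=1.0, max_iter=100000):
    # Landweber-type iteration (cf. (1.4)), stopped by the discrepancy principle (1.6).
    x = x0.copy()
    for k in range(max_iter):
        residual = F(x) - y_delta
        if np.linalg.norm(residual) <= tau * delta:
            return x, k
        x = x - omega * F_adj_deriv(x, residual)   # gradient step
    return x, max_iter

def nesterov_landweber(F, F_adj_deriv, y_delta, x0, omega, delta, tau=1.0, max_iter=100000):
    # Nesterov-type accelerated iteration in the spirit of (1.22), proximal operator dropped.
    # The momentum weight (k - 1)/(k + 2) is an assumption (standard Nesterov choice).
    x_prev, x = x0.copy(), x0.copy()
    for k in range(max_iter):
        if np.linalg.norm(F(x) - y_delta) <= tau * delta:
            return x, k
        z = x + (k - 1.0) / (k + 2.0) * (x - x_prev)                # combination step
        x_prev, x = x, z - omega * F_adj_deriv(z, F(z) - y_delta)   # gradient step at z
    return x, max_iter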

5.2. Example 2—Auto-convolution operator

Next we look at an example involving an auto-convolution operator. Due to its importance in laser optics, the auto-convolution problem has been extensively studied in the literature [1, 5, 11]; its ill-posedness has been shown in [7, 10, 12], and its special structure was successfully exploited in [29]. For our purposes, we consider the following version of the auto-convolution operator

Equation (5.7)

where we interpret functions in $ \newcommand{\LtO}{L^2(0,1)} \newcommand{\Lt}{{L^2}} \LtO$ as 1-periodic functions on $ \newcommand{\m}{{\rm m}} \newcommand{\R}{\mathbb{R}} \R$ . For the following, denote by $ \newcommand{\m}{{\rm m}} \newcommand{\Z}{\mathbb{Z}} \newcommand{\en}{e_{{(n)}}} \newcommand{\e}{{\rm e}} (\en)_{n\in \Z}$ the canonical real Fourier basis of $ \newcommand{\LtO}{L^2(0,1)} \newcommand{\Lt}{{L^2}} \LtO$ , i.e.

and by $ \newcommand{\rr}[1]{\left\langle\,#1\,\right\rangle} \newcommand{\s}{{\rm s}} \newcommand{\en}{e_{{(n)}}} \newcommand{\e}{{\rm e}} x_{(n)} := \rr{x, \en}$ the Fourier coefficients of x. It follows that

Equation (5.8)
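As a side remark, the operator (5.7) can be evaluated numerically in a very simple way. The sketch below is illustrative only (the experiments in example 5.2 use the finite element discretization (5.12) instead): it samples a 1-periodic x on a uniform grid and computes the periodic auto-convolution as a circular convolution via the FFT, which mirrors the diagonal Fourier structure expressed in (5.8).

import numpy as np

def auto_convolution(x_samples):
    # Approximates F(x)(s) = \int_0^1 x(s - t) x(t) dt for a 1-periodic x
    # sampled on a uniform grid of [0, 1).  The circular convolution is
    # computed via the FFT, i.e. by squaring the discrete Fourier coefficients.
    n = x_samples.size
    h = 1.0 / n                         # grid spacing = quadrature weight
    x_hat = np.fft.fft(x_samples)
    return h * np.real(np.fft.ifft(x_hat ** 2))

# Example: sample x(s) = 10 + sqrt(2) sin(2 pi s), as used in example 5.2 below.
s = np.linspace(0.0, 1.0, 128, endpoint=False)
y = auto_convolution(10.0 + np.sqrt(2.0) * np.sin(2.0 * np.pi * s))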

It was shown in [8] that if only finitely many Fourier components $x_{(n)}$ are non-zero, then a variational source condition is satisfied, leading to convergence rates for Tikhonov regularization. We now use this assumption of a sparse Fourier representation to prove convexity of $ \newcommand{\Phid}{\Phi^\delta} \Phid$ for the auto-convolution operator in the following

Proposition 5.2. Let $ \newcommand{\xD}{x^\dagger} \xD$ be a solution of $F(x) = y$ such that there exists an index set $ \newcommand{\m}{{\rm m}} \newcommand{\Z}{\mathbb{Z}} \newcommand{\s}{{\rm s}} \newcommand{\LN}{{\Lambda_N}} \LN \subset \Z$ with $ \newcommand{\abs}[1]{\left\vert#1\right\vert} \newcommand{\LN}{{\Lambda_N}} \abs{\LN} = N$ such that for the Fourier coefficients $ \newcommand{\xD}{x^\dagger} \xD_{(n)}$ of $ \newcommand{\xD}{x^\dagger} \xD$ there holds

Furthermore, let $\rho > 0$ and $ \newcommand{\bd}{\bar{\delta}} \bd \geqslant 0$ be small enough such that

Equation (5.9)

and let $ \newcommand{\xD}{x^\dagger} \newcommand{\m}{{\rm m}} \newcommand{\Br}{\mathcal{B}_\rho} x_0 \in \Br(\xD)$ . Then for all $ \newcommand{\bd}{\bar{\delta}} 0 \leqslant \delta \leqslant \bd$ , the functional $ \newcommand{\Phid}{\Phi^\delta} \Phid(x)$ is convex in $ \newcommand{\m}{{\rm m}} \newcommand{\Bmr}{\mathcal{B}_{6\rho}} \Bmr(x_0)$ .

Proof. As in the previous example, we want to show that (5.1) is satisfied, which, due to (5.8) and the fact that the $ \newcommand{\en}{e_{{(n)}}} \newcommand{\e}{{\rm e}} \en$ form an orthonormal basis, is equivalent to

which, after simplification, becomes

and hence, it is sufficient to show that

Equation (5.10)

Note that this is essentially the same condition as (5.4) in the previous example, apart from the fact that here we have to show the inequality for all $ \newcommand{\m}{{\rm m}} \newcommand{\Z}{\mathbb{Z}} n \in \Z$ . However, if $ \newcommand{\LN}{{\Lambda_N}} n \notin \LN$ , then $ \newcommand{\xD}{x^\dagger} \xD_{(n)} = y_{(n)} = 0$ and hence (5.10) is trivially satisfied. It therefore remains to prove (5.10) only for $ \newcommand{\LN}{{\Lambda_N}} n\in\LN$ . For this, we write $ \newcommand{\xD}{x^\dagger} \newcommand{\eps}{\varepsilon} \newcommand{\e}{{\rm e}} x_{(n)} = \xD_{(n)} + \eps_{(n)}$ , which allows us to rewrite (5.10) into

Equation (5.11)

Now, using $ \newcommand{\norm}[1]{\left\|#1\right\|} \newcommand{\e}{{\rm e}} \newcommand{\lt}{{\ell^2}} \newcommand{\Lt}{{L^2}} \norm{y}_{\Lt} = \norm{(y_{(n)})}_{\lt}$ , we get as in (5.5) that

and hence, it follows that for inequality (5.11) to be satisfied, it suffices to have

However, since $ \newcommand{\norm}[1]{\left\|#1\right\|} \newcommand{\abs}[1]{\left\vert#1\right\vert} \newcommand{\xD}{x^\dagger} \newcommand{\eps}{\varepsilon} \newcommand{\Lt}{{L^2}} \newcommand{\e}{{\rm e}} \abs{\eps_{(n)}} \leqslant \norm{\eps}_\Lt = \norm{x-\xD} \leqslant \norm{x- x_0} + \norm{x_0 - \xD}\leqslant 7 \rho$ , this immediately follows from (5.9), which completes the proof. □

Remark. Similarly to the previous example, condition (5.9) is satisfied given that

which can always be satisfied given that $ \newcommand{\abs}[1]{\left\vert#1\right\vert} \newcommand{\xD}{x^\dagger} \abs{\xD_{(n)}} > 0$ for all $ \newcommand{\LN}{{\Lambda_N}} n \in \LN$ .

Remark. Note that one could also consider F as an operator from $ \newcommand{\LtO}{L^2(0,1)} \newcommand{\HoO}{H^1(0,1)} \newcommand{\Lt}{{L^2}} \HoO \to \LtO$ , in which case the local convexity of $ \newcommand{\Phid}{\Phi^\delta} \Phid$ is still satisfied. Since, as noted in section 4, weak convergence in $ \newcommand{\HoO}{H^1(0,1)} \HoO$ implies strong convergence in $ \newcommand{\LtO}{L^2(0,1)} \newcommand{\Lt}{{L^2}} \LtO$ , the convergence analysis carried out in the previous section then implies strong subsequential $ \newcommand{\LtO}{L^2(0,1)} \newcommand{\Lt}{{L^2}} \LtO$ convergence of the iterates $ \newcommand{\xkd}{x_k^\delta} \xkd$ of (1.22) to an element $ \newcommand{\xt}{\tilde{x}} \newcommand{\m}{{\rm m}} \xt \in \mathcal{S}$ from the solution set.

Example 5.2. For this example, we consider the auto-convolution problem with exact solution $ \newcommand{\xD}{x^\dagger} \newcommand{\s}{{\rm s}} \xD(s) := 10 + \sqrt{2} \sin(2\pi s)$ . It follows that

and therefore, the convexity condition (5.9) simplifies to the following two inequalities

Hence, for the noise-free case (i.e. $ \newcommand{\bd}{\bar{\delta}} \bd=0$ ) the functional $ \newcommand{\Phiz}{\Phi^0} \Phiz$ is convex in $ \newcommand{\m}{{\rm m}} \newcommand{\Bmr}{\mathcal{B}_{6\rho}} \Bmr(x_0)$ given that $\rho \leqslant 1/28 \approx 0.036$ and that $ \newcommand{\xD}{x^\dagger} \newcommand{\m}{{\rm m}} \newcommand{\Br}{\mathcal{B}_\rho} x_0 \in \Br(\xD)$ , which is for example the case for the choice $ \newcommand{\s}{{\rm s}} x_0 = 10 + \frac{27}{28}\sqrt{2}\sin(2\pi s)$ .
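This choice of x0 can be verified by a one-line computation, since $\sqrt{2}\sin(2\pi\,\cdot)$ has unit $L^2(0,1)$-norm:

\[
\left\| x_0 - x^\dagger \right\|_{L^2(0,1)}
= \frac{1}{28}\left\| \sqrt{2}\,\sin(2\pi\,\cdot) \right\|_{L^2(0,1)}
= \frac{1}{28}\left( \int_0^1 2\sin^2(2\pi s)\,{\rm d}s \right)^{1/2}
= \frac{1}{28} \leqslant \rho\,,
\]

so the requirement $x_0 \in \mathcal{B}_\rho(x^\dagger)$ is met for $\rho = 1/28$.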

For discretizing the problem, we choose a uniform discretization of the interval $[0, 1]$ into N  =  32 equally spaced subintervals and introduce the standard finite element hat functions $\{\psi_i \}_{i=0}^N$ on this subdivision, which we use to discretize both $ \newcommand{\m}{{\rm m}} \newcommand{\X}{\mathcal{X}} \X$ and $ \newcommand{\m}{{\rm m}} \newcommand{\Y}{\mathcal{Y}} \Y$ . Following the idea used in [26], we discretize F by the finite dimensional operator

Equation (5.12)

For computing the coefficients $f_i(x)$ , we employ a 4-point Gaussian quadrature rule on each of the subintervals to approximate the integral in (5.12).
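As an illustration of this quadrature step, the following Python sketch applies the composite 4-point Gauss-Legendre rule on a uniform subdivision of [0, 1]; the integrand g is a generic placeholder for the integrands appearing in (5.12), and the function name is purely illustrative.

import numpy as np

# Nodes and weights of the 4-point Gauss-Legendre rule on [-1, 1].
nodes, weights = np.polynomial.legendre.leggauss(4)

def composite_gauss4(g, n_sub=32):
    # Approximates \int_0^1 g(t) dt by the 4-point Gauss-Legendre rule
    # applied on each of n_sub equally spaced subintervals of [0, 1].
    a = np.linspace(0.0, 1.0, n_sub + 1)
    total = 0.0
    for left, right in zip(a[:-1], a[1:]):
        mid, half = 0.5 * (left + right), 0.5 * (right - left)
        total += half * np.sum(weights * g(mid + half * nodes))
    return total

# Hypothetical usage: integrate the square of the exact solution from example 5.2.
val = composite_gauss4(lambda t: (10.0 + np.sqrt(2.0) * np.sin(2.0 * np.pi * t)) ** 2)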

Now we again compare method (1.22) with (1.4). This time, the estimated scaling parameter has the value $\omega = 0.005$ and random noise with a relative noise level of 0.01% was added to the data. Again the discrepancy principle (1.6) with $\tau = 1$ was used and the proximal operator $ \newcommand{\kl}[1]{\left(#1\right)} \newcommand{\prox}[1]{{\rm prox}_{\omega \Psi}\kl{#1}} \prox{.}$ in (1.22) was dropped. The results of the experiments, computed in MATLAB, are displayed in the left part of table 2. Again the results clearly illustrate the advantages of Nesterov's acceleration strategy, which substantially decreases the required number of iterations and computational time, while leading to a relative error of essentially the same size as Landweber iteration.

The initial guess x0 used for the experiment above is quite close to the exact solution $ \newcommand{\xD}{x^\dagger} \xD$ . Although this closeness is needed to guarantee convergence by the theory developed above, it is not very practical. Hence, we want to see what happens if the solution and the initial guess are so far apart that they are no longer within the guaranteed region of convexity. For this, we consider the choice $ \newcommand{\kl}[1]{\left(#1\right)} \newcommand{\xD}{x^\dagger} \newcommand{\s}{{\rm s}} \xD(s) = 10 + \sqrt{2}\sin\kl{8 \pi s}$ and $ \newcommand{\kl}[1]{\left(#1\right)} \newcommand{\s}{{\rm s}} x_0(s) = 10 + \sqrt{2}\sin\kl{2 \pi s}$ . The result can be seen in the right part of table 2. Landweber iteration was stopped after $10\, 000$ iterations without having reached the discrepancy principle, since no further progress was visible numerically. Consequently, it is clearly outperformed by (1.22), which satisfies the discrepancy principle after only 797 iterations and achieves a much smaller relative error. The resulting reconstructions, depicted in figure 1, once again underline the usefulness of (1.22).

Interestingly, it seems that in this second example Landweber iteration gets stuck in a local minimum, while (1.22), after stagnating near this minimum for a while, manages to escape it, which is most likely due to the combination step in (1.22).


Figure 1. Auto-convolution example: initial guess x0 (blue, dotted), exact solution $ \newcommand{\xD}{x^\dagger} \xD$ (red, solid), Landweber (1.4) reconstruction (purple, dashed), Nesterov (1.22) reconstruction (yellow, dash-dotted).


Table 2. Comparison of Landweber iteration (1.4) and its Nesterov accelerated version (1.22) when applied to the auto-convolution problem considered in example 5.2 for the choice $ \newcommand{\kl}[1]{\left(#1\right)} \newcommand{\xD}{x^\dagger} \newcommand{\s}{{\rm s}} \xD(s) = 10 + \sqrt{2}\sin\kl{2 \pi s}$ and $ \newcommand{\kl}[1]{\left(#1\right)} \newcommand{\s}{{\rm s}} x_0(s) = 10 + \frac{27}{28}\sqrt{2}\sin\kl{2 \pi s}$ (left table) and $ \newcommand{\kl}[1]{\left(#1\right)} \newcommand{\xD}{x^\dagger} \newcommand{\s}{{\rm s}} \xD(s) = 10 + \sqrt{2}\sin\kl{8 \pi s}$ and $ \newcommand{\kl}[1]{\left(#1\right)} \newcommand{\s}{{\rm s}} x_0(s) = 10 + \sqrt{2}\sin\kl{2 \pi s}$ (right table).

Left table:
Method k* Time (s) $ \newcommand{\norm}[1]{\left\|#1\right\|} \newcommand{\xD}{x^\dagger} \newcommand{\xkd}{x_k^\delta} \norm{\xD - \xkd }/\norm{\xD}$ (%)
Landweber 526 57 0.0244
Nesterov 50 6 0.0271

Right table:
Method k* Time (s) $ \newcommand{\norm}[1]{\left\|#1\right\|} \newcommand{\xD}{x^\dagger} \newcommand{\xkd}{x_k^\delta} \norm{\xD - \xkd }/\norm{\xD}$ (%)
Landweber 10 000 1067 9.57
Nesterov 797 87 0.65

5.3. Further examples

Besides the two rather academic examples presented above, we would like to cite a number of other examples where methods like (1.18) and (1.22) were successfully used, even though the key assumption of local convexity is not always known to hold for them.

First of all, in [18] the parameter estimation problem of magnetic resonance advection imaging (MRAI) was solved using a method very similar to (1.22). In MRAI, one aims at estimating the spatially varying pulse wave velocity (PWV) in blood vessels in the brain from magnetic resonance imaging (MRI) data. The PWV is directly connected to the health of the blood vessels and hence, it is used as a prognostic marker for various diseases in medical examinations. The data sets in MRAI are very large, making the direct application of second order methods like (1.9) or (1.10) difficult. However, since methods like (1.22) can deal with those large datasets, they were used in [18] for reconstructions of the PWV.

Secondly, in [17], numerical examples for various TPG methods (1.19), including the iteration (1.18), were presented. Among those is an example based on the imaging technique of single photon emission computed tomography (SPECT). Various numerical tests show that among all tested TPG methods, the method (1.18) clearly outperforms the rest, even though the local convexity assumption is not known to hold in this case. This is also demonstrated on an example based on a nonlinear Hammerstein operator.

Thirdly, method (1.18) was used in [19] to solve a problem in quantitative elastography, namely the reconstruction of the spatially varying Lamé parameters from full internal static displacement field measurements. It was employed to obtain all reconstruction results presented in that paper, since ordinary first-order methods like Landweber iteration (1.4) were too slow to meet the demands arising in practice.

Finally, in the numerical examples presented in [21], method (1.18) was used to accelerate the employed gradient/Kaczmarz methods. Furthermore, a convergence analysis of (1.18) for linear ill-posed problems including numerical examples is given in [28].

Support and acknowledgments

The authors were partly funded by the Austrian Science Fund (FWF): W1214-N15, project DK8 and F6805-N36, project 5. Furthermore, they would like to thank Dr Stefan Kindermann and Professor Andreas Neubauer for providing valuable suggestions and insights during discussions of the subject.
