An alternating iterative minimisation algorithm for the double-regularised total least square functional

Ismael Rodrigo Bleyer and Ronny Ramlau

Published 19 May 2015 © 2015 IOP Publishing Ltd

Inverse Problems 31 (2015) 075004, DOI 10.1088/0266-5611/31/7/075004
Abstract

The total least squares (TLS) method is a successful approach for linear problems if both the right-hand side and the operator are contaminated by some noise. For ill-posed problems, a regularisation strategy has to be considered to stabilise the computed solution. Recently a double regularised TLS method was proposed within an infinite dimensional setup; it reconstructs both the function and the operator, the latter being reflected in a bilinear form. Our main focus is on the design and implementation of an algorithm, with particular emphasis on an alternating minimisation strategy, for solving not only the double regularised TLS problem but a vast class of optimisation problems: the minimisation of a bilinear functional of two variables.


Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

1. Introduction

In [2], the authors described a new two-parameter regularisation scheme for solving an ill-posed operator equation. The task consists of the inversion of a linear operator ${{A}_{0}}:\mathcal{V}\to \mathcal{H}$ defined between Hilbert spaces

${{A}_{0}}f={{g}_{0}}. \qquad (1)$

In contrast to standard inverse problems, where the task is to solve (1) from given noisy data, a more realistic setup is considered where both data and operator are not known exactly. For the reconstruction, a cost functional with two penalisation terms based on the TLS (total least squares) technique is used.

This approach presented in [2] focuses on linear operators that can be characterised by a function, as is, e.g., the case for linear integral operators, where the kernel function determines the behaviour of the operator. Moreover, it is assumed that the noise in the operator is due to an incorrect characterising function. A penalty term is used not only to stabilise the reconstruction of the unknown solution, as is the case in [10–12], but also to stabilise the unknown operator. As a drawback, the regularisation scheme becomes nonlinear even for linear equations. However, the potential advantage is that not only the unknown solution is reconstructed, but also a suitable characterising function and thus the governing operator describing the underlying data. Additionally, convergence rates for the reconstruction of both solution and operator have been derived.

The double regularised total least squares (dbl-RTLS) approach allows us to treat the problem in the framework of Tikhonov regularisation rather than as a constrained minimisation problem. More precisely, the regularised solution is obtained by minimising a nonlinear, nonconvex and possibly non-differentiable functional of two variables, which is computationally not always straightforward. Thus the goal of this paper is the development of an efficient and convergent numerical scheme for the minimisation of the Tikhonov-type functional of the dbl-RTLS approach.

The rest of the paper is organised as follows: in section 2 we formulate the underlying problem and give a short summary of the dbl-RTLS method. Section 3 is dedicated to the development of an algorithm based on an alternating minimisation strategy, as well as to its convergence properties. In section 4, numerical results for the proposed algorithm are provided and the efficiency of the method is discussed. For the convenience of the reader, the appendix collects important concepts and fundamental results used throughout this article.

2. Problem formulation and the dbl-RTLS method

As mentioned above, we aim at the inversion of the linear operator equation (1) from noisy data ${{g}_{\delta }}$ and an incorrect operator ${{A}_{\epsilon }}$. Additionally we assume that the operators ${{A}_{0}},{{A}_{\epsilon }}:\mathcal{V}\to \mathcal{H}$, where $\mathcal{V}$ and $\mathcal{H}$ are Hilbert spaces, can be characterised by functions ${{k}_{0}},{{k}_{\epsilon }}\in \mathcal{U}$, where $\mathcal{U}$ is also a Hilbert space. To be more specific, we consider operators

${{A}_{k}}:\mathcal{V}\to \mathcal{H}, \qquad k\in \mathcal{U},$

i.e. ${{A}_{k}}v:=B(k,v)$, where B is a bilinear operator

$B:\mathcal{U}\times \mathcal{V}\to \mathcal{H}$

fulfilling, for some $C\gt 0$, the inequality

$\| B(k,v){{\| }_{\mathcal{H}}}\leqslant C\,\| k{{\| }_{\mathcal{U}}}\,\| v{{\| }_{\mathcal{V}}}. \qquad (2)$

From (2) it follows immediately that

$\| {{A}_{k}}v{{\| }_{\mathcal{H}}}\leqslant C\,\| k{{\| }_{\mathcal{U}}}\,\| v{{\| }_{\mathcal{V}}}, \quad \text{i.e.}\quad \| {{A}_{k}}\| \leqslant C\,\| k{{\| }_{\mathcal{U}}}. \qquad (3)$

Associated with the bilinear operator B, we also define, for fixed $f\in \mathcal{V}$, the linear operator

${{C}_{f}}:\mathcal{U}\to \mathcal{H},$

i.e. ${{C}_{f}}u:=B(u,f)$.
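As a concrete illustration (anticipating the deconvolution example of section 4 and meant only as an illustrative sketch), consider a periodic convolution on a bounded domain Ω: for $\mathcal{U}=\mathcal{V}=\mathcal{H}={{L}_{2}}(\Omega )$ one may take

$B(k,f)(x)=(k*f)(x)=\int_{\Omega }k(x-y)\,f(y)\,{\rm d}y, \qquad {{A}_{k}}f=k*f, \qquad {{C}_{f}}k=f*k=k*f.$

Young's inequality combined with the Cauchy–Schwarz inequality gives $\| k*f{{\| }_{{{L}_{2}}}}\leqslant \| k{{\| }_{{{L}_{1}}}}\| f{{\| }_{{{L}_{2}}}}\leqslant |\Omega {{|}^{1/2}}\| k{{\| }_{{{L}_{2}}}}\| f{{\| }_{{{L}_{2}}}}$, so an estimate of the form (2) holds with $C=|\Omega {{|}^{1/2}}$.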

From now on, let us identify A0 with ${{A}_{{{k}_{0}}}}$ and ${{A}_{\epsilon }}$ with ${{A}_{{{k}_{\epsilon }}}}$. From (3) we deduce immediately

$\| {{A}_{0}}-{{A}_{\epsilon }}\| =\| {{A}_{{{k}_{0}}-{{k}_{\epsilon }}}}\| \leqslant C\,\| {{k}_{0}}-{{k}_{\epsilon }}{{\| }_{\mathcal{U}}}, \qquad (4)$

i.e. the operator error norm is controlled by the error norm of the characterising functions. Now we can formulate our problem as follows:

$\text{find}\ f\ \text{such that}\ {{A}_{0}}f={{g}_{0}}, \qquad (5a)$

$\text{given noisy data}\ {{g}_{\delta }}\ \text{with}\ \| {{g}_{\delta }}-{{g}_{0}}{{\| }_{\mathcal{H}}}\leqslant \delta \qquad (5b)$

$\text{and a noisy characterising function}\ {{k}_{\epsilon }}\ \text{with}\ \| {{k}_{\epsilon }}-{{k}_{0}}{{\| }_{\mathcal{U}}}\leqslant \epsilon . \qquad (5c)$

Please note that the problem with explicitly known k0 (or the operator A0) is often ill-posed and needs regularisation for a stable inversion. Therefore we will also propose a regularising scheme for the problem (5a)–(5c).

Due to our assumptions on the structure of the operator A0, the inverse problem of identifying the function ${{f}^{{\rm true}}}$ from noisy measurements ${{g}_{\delta }}$ and an inexact operator ${{A}_{\epsilon }}$ can now be rewritten as the task of solving the inverse problem: find f such that

$B({{k}_{0}},f)={{g}_{0}} \qquad (6)$

from noisy measurements $({{k}_{\epsilon }},{{g}_{\delta }})$ fulfilling

$\| {{g}_{0}}-{{g}_{\delta }}{{\| }_{\mathcal{H}}}\leqslant \delta \qquad (7a)$

and

$\| {{k}_{0}}-{{k}_{\epsilon }}{{\| }_{\mathcal{U}}}\leqslant \epsilon . \qquad (7b)$

In most applications, the 'inversion' of B will be ill-posed (e.g. if B is defined via a Fredholm integral operator), and a regularisation strategy is needed for a stable solution of the problem (6).

For the solution of (6) from given data $({{k}_{\epsilon }},{{g}_{\delta }})$ fulfilling (7), we use the dbl-RTLS method proposed in [2], where the approximations to the solutions are computed as

$\left( k_{\alpha ,\beta }^{\delta ,\varepsilon },\,f_{\alpha ,\beta }^{\delta ,\varepsilon } \right)\in \underset{(k,f)\in \mathcal{U}\times \mathcal{V}}{\mathop{{\rm argmin} }}\,J_{\alpha ,\beta }^{\delta ,\varepsilon }(k,f), \qquad (8a)$

where

$J_{\alpha ,\beta }^{\delta ,\varepsilon }(k,f):={{T}^{\delta ,\varepsilon }}(k,f)+{{R}_{\alpha ,\beta }}(k,f) \qquad (8b)$

and

${{T}^{\delta ,\varepsilon }}(k,f):=\tfrac{1}{2}\| B(k,f)-{{g}_{\delta }}\| _{\mathcal{H}}^{2}+\tfrac{\gamma }{2}\| k-{{k}_{\epsilon }}\| _{\mathcal{U}}^{2}, \qquad {{R}_{\alpha ,\beta }}(k,f):=\tfrac{\alpha }{2}\| Lf{{\| }^{2}}+\beta \,\mathcal{R}(k). \qquad (8c)$

Here, $\alpha $ and $\beta $ are the regularisation parameters which have to be chosen properly, $\gamma $ is a scaling parameter (arbitrary but fixed), L is a bounded linear and continuously invertible operator and $\mathcal{R}:X\subset \mathcal{U}\to [0,+\infty ]$ is a proper, convex and weakly lower semi-continuous functional. The functional $J_{\alpha ,\beta }^{\delta ,\varepsilon }$ is composed as the sum of two terms: one which measures the discrepancy of data and operator, and one which promotes stability. The functional ${{T}^{\delta ,\varepsilon }}$ is a data-fidelity term based on the TLS technique, whereas the functional ${{R}_{\alpha ,\beta }}$ acts as a penalty term which stabilises the inversion with respect to the pair (k, f). As a consequence, we have two regularisation parameters, which also occurs in double regularisation, see, e.g. [17].

The domain of the functional $J_{\alpha ,\beta }^{\delta ,\varepsilon }:(\mathcal{U}\cap X)\times \mathcal{V}\longrightarrow \mathbb{R}$ can be extended to $\mathcal{U}\times \mathcal{V}$ by setting $\mathcal{R}(k)=+\infty $ whenever $k\in \mathcal{U}\setminus X$. Then $\mathcal{R}$ is a proper, convex and weakly lower semi-continuous functional on $\mathcal{U}$.

It has been shown that the sequence of the pair of solutions $({{k}^{n}},{{f}^{n}})$ of (8) converges to a minimum-norm solution when $(\delta ,\epsilon )\to (0,0)$, i.e. it is a regularisation method (see [2, theorem 4.5]). However, the task of finding minimisers of (8) has not been addressed properly, which will be done in the following sections.

3. An algorithm for the minimisation of the dbl-RTLS functional

In this section, we will formulate the first-order necessary condition for critical points of the functional $J_{\alpha ,\beta }^{\delta ,\varepsilon }$, which requires in particular the derivative of the bilinear operator B. The core of this section is to design an algorithm to minimise $J_{\alpha ,\beta }^{\delta ,\varepsilon }$, which is not a trivial task, as the functional is most likely nonconvex and nonlinear.

3.1. Optimality condition

It is well known that the study of local behaviour of nonsmooth functions can be achieved by the concept of subdifferentiality which replaces the classical derivative at non-differentiable points.

The first-order necessary condition based on subdifferentiability is stated as follows: if $(\bar{k},\bar{f})$ minimises the functional $J_{\alpha ,\beta }^{\delta ,\varepsilon }$ then

$0\in \partial J_{\alpha ,\beta }^{\delta ,\varepsilon }(\bar{k},\bar{f}). \qquad (9)$

We denote the set of all subderivatives of the functional $J_{\alpha ,\beta }^{\delta ,\varepsilon }$ at (k, f) by $\partial J_{\alpha ,\beta }^{\delta ,\varepsilon }(k,f)$ and we call it the subdifferential of $J_{\alpha ,\beta }^{\delta ,\varepsilon }$ at (k, f). For a quick review of subdifferentiability we refer to the appendix.

The first result gives us the derivative of a bilinear operator B.

Lemma 3.1. Let B be a bilinear operator and assume that (2) holds. Then the Fréchet derivative of B at $(k,f)\in \mathcal{U}\times \mathcal{V}$ is given by

$B^{\prime} (k,f)(u,v)=B(u,f)+B(k,v), \qquad (u,v)\in \mathcal{U}\times \mathcal{V}.$

Moreover, the derivative is Lipschitz continuous with constant $\sqrt{2}C$.

Proof. We have to show

$\| B(k+u,f+v)-B(k,f)-B(u,f)-B(k,v){{\| }_{\mathcal{H}}}=o(\| (u,v)\| ).$

Since B is bilinear, we have

$B(k+u,f+v)-B(k,f)-B(u,f)-B(k,v)=B(u,v),$

and we observe $\| B(u,v)\| =o(\| (u,v)\| )$: as B fulfils (2), we have

$\frac{\| B(u,v)\| }{\| (u,v)\| }\leqslant \frac{C\,\| u\| \,\| v\| }{\| (u,v)\| }\leqslant \frac{C}{2}\,\| (u,v)\| ,$

which converges to zero as $(u,v)\to 0$.

We further observe

$\left[ B^{\prime} ({{k}_{1}},{{f}_{1}})-B^{\prime} ({{k}_{2}},{{f}_{2}}) \right](u,v)=B(u,{{f}_{1}}-{{f}_{2}})+B({{k}_{1}}-{{k}_{2}},v),$

which implies

$\left\| \left[ B^{\prime} ({{k}_{1}},{{f}_{1}})-B^{\prime} ({{k}_{2}},{{f}_{2}}) \right](u,v) \right\|\leqslant C\,\| (u,v)\| \left( \| {{f}_{1}}-{{f}_{2}}\| +\| {{k}_{1}}-{{k}_{2}}\|  \right).$

Using the inequality ${{(a+b)}^{2}}\leqslant 2({{a}^{2}}+{{b}^{2}})$ we get

$\| {{f}_{1}}-{{f}_{2}}\| +\| {{k}_{1}}-{{k}_{2}}\| \leqslant \sqrt{2}\,\| ({{k}_{1}},{{f}_{1}})-({{k}_{2}},{{f}_{2}})\| ,$

and thus

$\| B^{\prime} ({{k}_{1}},{{f}_{1}})-B^{\prime} ({{k}_{2}},{{f}_{2}})\| \leqslant \sqrt{2}\,C\,\| ({{k}_{1}},{{f}_{1}})-({{k}_{2}},{{f}_{2}})\| . \qquad \square $

Note that the adjoint operator ${{(B^{\prime} (k,f))}^{*}}$ of the Fréchet derivative $B^{\prime} (k,f)$ exists and is a bounded linear operator whenever both $\mathcal{H}$ and $\mathcal{U}\times \mathcal{V}$ are Hilbert spaces.
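For later reference, the derivative and its adjoint can be written out explicitly in terms of the linear operators ${{A}_{k}}$ and ${{C}_{f}}$ introduced above; this follows directly from lemma 3.1 and the defining relations ${{A}_{k}}v=B(k,v)$ and ${{C}_{f}}u=B(u,f)$:

$B^{\prime} (k,f)(u,v)={{C}_{f}}u+{{A}_{k}}v, \qquad {{(B^{\prime} (k,f))}^{*}}h=\left( C_{f}^{*}h,\;A_{k}^{*}h \right)\in \mathcal{U}\times \mathcal{V}, \quad h\in \mathcal{H}.$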

In order to analyse the optimality condition (9) we have to compute the subdifferential of a functional of two variables. As pointed out in [6, proposition 2.3.15], for a general function h with set-valued subdifferential mapping $\partial h:\mathcal{U}\;\rightrightarrows \;{{\mathcal{U}}^{*}}$, the set $\partial h({{x}_{1}},{{x}_{2}})$ and the product set ${{\partial }_{1}}h({{x}_{1}},{{x}_{2}})\times {{\partial }_{2}}h({{x}_{1}},{{x}_{2}})$ are not necessarily contained in each other. Here, ${{\partial }_{i}}h$ denotes the partial subgradient with respect to xi for $i=1,2$. However, this is not the case for the functional we are interested in, as will be shown in the following theorem.

Theorem 3.2. Let $J:\mathcal{U}\times \mathcal{V}\to \bar{\mathbb{R}}$ be a functional with the structure

$J(u,v)=Q(u,v)+\varphi (u)+\psi (v), \qquad (10)$

where Q is a (nonlinear) differentiable term and $\varphi :\mathcal{U}\to \bar{\mathbb{R}}$, $\psi :\mathcal{V}\to \bar{\mathbb{R}}$ are proper convex functions, $u\in {\rm dom}\;\varphi $ and $v\in {\rm dom}\;\psi $. Then

$\partial J(u,v)={{\partial }_{u}}J(u,v)\times {{\partial }_{v}}J(u,v).$

Proof. In general the subdifferential of a sum of functions does not equal the sum of the subdifferentials. However, if Q is differentiable and $\varphi $ and $\psi $ are convex, some inclusions and even equalities hold true (combining [6, proposition 2.3.3, corollary 3 and proposition 2.3.6]), as for instance,

Since Q is differentiable, invoking the previous results the (partial) subderivative is unique [6, proposition 2.3.15], and therefore

Note that the subderivative of the sum of two separable convex functionals satisfies

$\partial (\varphi +\psi )(u,v)=\partial \varphi (u)\times \partial \psi (v);$

see [18, corollary 2.4.5].

Altogether, we can compute the subderivative as follows:

$\partial J(u,v)=\left( \{ {{\partial }_{u}}Q(u,v) \}+\partial \varphi (u) \right)\times \left( \{ {{\partial }_{v}}Q(u,v) \}+\partial \psi (v) \right). \qquad (11)$

The last implication of this theorem, $\partial J(u,v)={{\partial }_{u}}J(u,v)\times {{\partial }_{v}}J(u,v)$, follows straightforwardly from the definition of the partial subderivative and (11). □

Please note that the above proof holds for all definitions of subdifferentials introduced in the appendix, as for convex functionals all the definitions are equivalent, and for differentiable (possibly nonlinear) terms the subdifferential is a singleton and the subderivative equals the derivative. Based on theorem 3.2 we can now calculate the derivative of the functional, which is the key ingredient for the upcoming algorithm; note the correspondence between the structure of (10) and the proposed functional $J_{\alpha ,\beta }^{\delta ,\varepsilon }$:

Corollary 3.3. Let $J_{\alpha ,\beta }^{\delta ,\varepsilon }$ be the functional defined in (8). Then

where $\zeta \in \partial \mathcal{R}(k)$.

Proof. The result follows straightforwardly from lemma 3.1 and theorem 3.2. Observe that the sum $C_{f}^{*}({{C}_{f}}k-{{g}_{\delta }})+\gamma (k-{{k}_{\epsilon }})+\beta \zeta $ is well-defined in the Hilbert space $\mathcal{U}$, since the subderivative $\zeta \in \partial \mathcal{R}(k)$ is also an element of $\mathcal{U}$.□
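Written out explicitly (a reconstruction consistent with the proof above, which contains the k-component verbatim, and with the structure (8c) of the functional), the partial subderivatives of $J_{\alpha ,\beta }^{\delta ,\varepsilon }$ read

${{\partial }_{k}}J_{\alpha ,\beta }^{\delta ,\varepsilon }(k,f)=\left\{ C_{f}^{*}({{C}_{f}}k-{{g}_{\delta }})+\gamma (k-{{k}_{\epsilon }})+\beta \zeta \;:\;\zeta \in \partial \mathcal{R}(k) \right\}, \qquad {{\partial }_{f}}J_{\alpha ,\beta }^{\delta ,\varepsilon }(k,f)=\left\{ A_{k}^{*}({{A}_{k}}f-{{g}_{\delta }})+\alpha {{L}^{*}}Lf \right\}.$

In particular, the partial subdifferential with respect to f is a singleton, since the corresponding part of the functional is differentiable.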

Up to now we have not specified the functional $\mathcal{R}$; it is only required to be convex and lower semi-continuous. We are particularly interested in, e.g., the Lp norm or the weighted ℓp norm, denoted by $\mathcal{R}(k)=\parallel k{{\parallel }_{w,p}}$. Its subdifferential is given in section 4. An easy way to compute the subderivatives of functionals $\mathcal{R}$ with a specific structure is given by the following lemma.

Lemma 3.4 ([3, lemma 4.4]). Let $\mathcal{H}={{L}_{2}}(\Omega ,{\rm d}\mu )$ where Ω is a σ-finite measure space. Let $\mathcal{R}:\mathcal{H}\to (-\infty ,+\infty ]$ be defined by

$\mathcal{R}(u)=\int_{\Omega }h(u(x))\,{\rm d}\mu (x), \qquad (12)$

where $h:\mathbb{C}\to \mathbb{R}$ is a convex function. Then $\xi \in \mathcal{H}$ is an element of $\partial \mathcal{R}(u)$ if and only if $\xi (x)\in \partial h(u(x))$ for almost every $x\in \Omega $ (with the identification $\mathbb{C}={{\mathbb{R}}^{2}}$).
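For instance (an illustrative special case, not taken verbatim from the paper), for the real-valued integrand $h(t)=w|t|$ with a weight $w\gt 0$, which corresponds to the p = 1 penalty used in section 4, lemma 3.4 yields pointwise

$\partial h(t)=\{w\,{\rm sign}(t)\}\ \text{for}\ t\ne 0 \qquad \text{and} \qquad \partial h(0)=[-w,w],$

i.e. exactly the set-valued sign function that appears in the optimality condition of the shrinkage (soft-thresholding) step used in section 4.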

3.2. An alternating minimisation algorithm

Coordinate descent methods are based on the idea that the minimisation of a multivariable function can be achieved by minimising it along one direction at a time. It is a simple and surprisingly efficient technique. The coordinates can be cycled through in any order, and single coordinates can also be replaced by blocks of coordinates (for more details see [16] and references therein). The method is closely related to coordinate gradient descent (CGD), Gauss–Seidel and SOR methods, which have been studied by several authors and are described in various optimisation books, e.g. [1, 14]. In the unconstrained setting the method is called alternating minimisation (AM) when the variables are split into two blocks.

The computation of a solution of dbl-RTLS is not straightforward, as determining the minimum of the functional (8) with respect to both parameters is a nonlinear and nonconvex problem over two variables. Nevertheless we shall overcome this problem by applying some coordinate descent techniques.

In the following we shall denote the dbl-RTLS functional by J instead of $J_{\alpha ,\beta }^{\delta ,\varepsilon }$, as the parameters of the functional are kept fixed during the minimisation process.

In the AM algorithm, the functional is minimised iteratively with two alternating minimisation steps. Each step minimises the problem over one variable while keeping the second variable fixed:

${{f}^{n+1}}\in \underset{f\in \mathcal{V}}{\mathop{{\rm argmin} }}\,J({{k}^{n}},f\,|\,{{k}^{n}}), \qquad (13a)$

${{k}^{n+1}}\in \underset{k\in \mathcal{U}}{\mathop{{\rm argmin} }}\,J(k,{{f}^{n+1}}\,|\,{{f}^{n+1}}). \qquad (13b)$

The notation $J(k,f\,|\,u)$ means that we minimise the functional J with u kept fixed, where u can be either k or f. Thus in each cycle we minimise the functionals

$f\mapsto J({{k}^{n}},f\,|\,{{k}^{n}})$

and

$k\mapsto J(k,{{f}^{n+1}}\,|\,{{f}^{n+1}}).$
We highlight some important facts (a MATLAB sketch of one AM cycle follows this list):

  • 1. For each subproblem, the operators involved are linear and the functional is convex. Thus every local minimum is a global one.
  • 2. The first step is a standard quadratic minimisation problem.
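The following MATLAB sketch illustrates one possible realisation of the AM cycle (13a)–(13b). The solver handles solve_f_step and solve_k_step are placeholders of ours (their names and signatures are not from the paper) for the two convex sub-problem solvers discussed in section 4, namely a conjugate gradient method for the quadratic f-step and an iterative shrinkage method for the k-step.

```matlab
% Minimal sketch of the alternating minimisation (AM) iteration (13a)-(13b).
% solve_f_step / solve_k_step are hypothetical handles to the two convex
% sub-problem solvers (e.g. CG for (13a), iterative shrinkage for (13b)).
function [k, f] = am_dbl_rtls(k_eps, g_delta, alpha, beta, gamma, ...
                              maxCycles, tol, solve_f_step, solve_k_step)
    k = k_eps;                                   % natural initial guess for the kernel
    f = zeros(size(g_delta));                    % zero initial guess for the unknown function
    for n = 1:maxCycles
        % step (13a): minimise f -> J(k, f | k), a quadratic problem in f
        f_new = solve_f_step(k, g_delta, alpha);
        % step (13b): minimise k -> J(k, f_new | f_new), a shrinkage problem in k
        k_new = solve_k_step(f_new, g_delta, k_eps, beta, gamma);
        % stop as soon as both updates are small (cf. the threshold used in section 4)
        if norm(f_new(:) - f(:)) < tol && norm(k_new(:) - k(:)) < tol
            f = f_new;  k = k_new;  break;
        end
        f = f_new;  k = k_new;
    end
end
```

The monotonicity of the functional values $J({{k}^{n}},{{f}^{n}})$ along this iteration is established in proposition 3.5 below.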

First we will show a monotonicity result for the sequence ${{\{({{k}^{n}},{{f}^{n}})\}}_{n}}$ of iterates:

Proposition 3.5. The functional J is non-increasing along the AM iterates, i.e.

$J({{k}^{n+1}},{{f}^{n+1}})\leqslant J({{k}^{n}},{{f}^{n}}) \quad \text{for all}\ n\in \mathbb{N}.$

Proof. The iterates are defined as

${{f}^{n+1}}\in \underset{f\in \mathcal{V}}{\mathop{{\rm argmin} }}\,J({{k}^{n}},f)$

and

${{k}^{n+1}}\in \underset{k\in \mathcal{U}}{\mathop{{\rm argmin} }}\,J(k,{{f}^{n+1}}).$

Therefore,

$J({{k}^{n}},{{f}^{n+1}})\leqslant J({{k}^{n}},f) \quad \text{for all}\ f\in \mathcal{V}$

and

$J({{k}^{n+1}},{{f}^{n+1}})\leqslant J(k,{{f}^{n+1}}) \quad \text{for all}\ k\in \mathcal{U},$

and in particular, setting $f={{f}^{n}}$ and $k={{k}^{n}}$,

$J({{k}^{n}},{{f}^{n+1}})\leqslant J({{k}^{n}},{{f}^{n}})$

and

$J({{k}^{n+1}},{{f}^{n+1}})\leqslant J({{k}^{n}},{{f}^{n+1}})\leqslant J({{k}^{n}},{{f}^{n}}). \qquad \square $

The existence of a minimiser of the functional J has already been proven in [2, theorem 4.2]. The goal of the following results is to prove that the sequence generated by the alternating minimisation algorithm has at least a subsequence which converges to a critical point of the functional. Throughout this section, let us make the following assumptions.

Assumption A. 

  • (A1) B is strongly continuous, i.e. if $({{k}^{n}},{{f}^{n}})\rightharpoonup (\bar{k},\bar{f})$ then $B({{k}^{n}},{{f}^{n}})\to B(\bar{k},\bar{f})$.
  • (A2) The adjoint of the Fréchet derivative $B^{\prime} $ of B is strongly continuous, i.e. if $({{k}^{n}},{{f}^{n}})\rightharpoonup (\bar{k},\bar{f})$ then $B^{\prime} {{({{k}^{n}},{{f}^{n}})}^{*}}z\to B^{\prime} {{(\bar{k},\bar{f})}^{*}}z$ for all $z\in \mathcal{D}(B^{\prime} )$.

In addition to the standard norm for the pair $(k,f)\in \mathcal{U}\times \mathcal{V}$,

$\| (k,f){{\| }^{2}}:=\| k\| _{\mathcal{U}}^{2}+\| f\| _{\mathcal{V}}^{2},$

we define, for given $\gamma \gt 0$, the weighted norm

$\| (k,f)\| _{\gamma }^{2}:=\gamma \,\| k\| _{\mathcal{U}}^{2}+\| f\| _{\mathcal{V}}^{2}.$
Proposition 3.6. For given regularisation parameters $\alpha ,\beta \gt 0$, the sequence ${{\{({{k}^{n+1}},{{f}^{n+1}})\}}_{n}}$ of iterates generated by the AM algorithm has at least a weakly convergent subsequence $({{k}^{{{n}_{j}}+1}},{{f}^{{{n}_{j}}+1}})\rightharpoonup (\bar{k},\bar{f})$, and its limit fulfils

$J(\bar{k},\bar{f})\leqslant J(\bar{k},f) \quad \text{and} \quad J(\bar{k},\bar{f})\leqslant J(k,\bar{f}) \qquad (14)$

for all $f\in \mathcal{V}$ and for all $k\in \mathcal{U}$.

Proof. As the iterates of the AM algorithm can be characterised as the minimisers of a reduced dbl-RTLS functional, see (13a), (13b) we observe

and

Keeping in mind that the operator L is continuously invertible, the first inequality gives

Using the second estimate above and the standard inequality $\parallel a+b{{\parallel }^{2}}\leqslant 2(\parallel a{{\parallel }^{2}}+\parallel b{{\parallel }^{2}})$ we have

Thus, the sequence ${{\{({{k}^{n+1}},{{f}^{n+1}})\}}_{n+1}}$ is bounded

and by Alaoglu's theorem, it has a weakly convergent subsequence ${{\{({{k}^{{{n}_{j}}+1}},{{f}^{{{n}_{j}}+1}})\}}_{{{n}_{j}}+1}}\rightharpoonup (\bar{k},\bar{f})$ .

Since ${{f}^{{{n}_{j}}+1}}$ minimises the functional $J({{k}^{{{n}_{j}}}},f)$ for fixed ${{k}^{{{n}_{j}}}}$, it holds that

and thus

Using the fact that J is w-lsc and the strong continuity of B, we observe

Equation (15)

Therefore,

The second inequality in (14) is proven similarly: since ${{k}^{{{n}_{j}}+1}}$ minimises the functional $J(k,{{f}^{{{n}_{j}}+1}})$ for fixed ${{f}^{{{n}_{j}}+1}}$ it is

which is equivalent to

Again, we observe

Equation (16)

and thus

In summary, the AM algorithm yields a bounded sequence ${{\{({{k}^{n+1}},{{f}^{n+1}})\}}_{n}}$ and hence a weakly convergent subsequence. The next two results extend the convergence to the strong topology, for both ${{\{{{k}^{{{n}_{j}}+1}}\}}_{{{n}_{j}}}}$ and ${{\{{{f}^{{{n}_{j}}+1}}\}}_{{{n}_{j}}}}$.

Proposition 3.7. Let ${{\{({{k}^{{{n}_{j}}+1}},{{f}^{{{n}_{j}}+1}})\}}_{{{n}_{j}}}}$ be a weakly convergent (sub-) sequence generated by the AM algorithm (13), where ${{k}^{{{n}_{j}}+1}}\rightharpoonup \bar{k}$ and ${{f}^{{{n}_{j}}+1}}\rightharpoonup \bar{f}$. Then there exists a subsequence ${{\{{{k}^{{{n}_{{{j}_{m}}}}+1}}\}}_{{{n}_{{{j}_{m}}}}}}$ of ${{\{{{k}^{{{n}_{j}}+1}}\}}_{{{n}_{j}}}}$ such that ${{k}^{{{n}_{{{j}_{m}}}}+1}}\to \bar{k}$ and $0\in {{\partial }_{k}}J(\bar{k},\bar{f})$.

Proof. Inequality (16) in the proof of proposition 3.6 reads

for any k. Setting $k=\bar{k}$ yields in particular

As the limit inferior exists, we can in particular extract a subsequence ${{({{k}^{{{n}_{{{j}_{m}}}}+1}},{{f}^{{{n}_{{{j}_{m}}}}+1}})}_{{{n}_{{{j}_{m}}}}}}$ of ${{({{k}^{{{n}_{j}}+1}},{{f}^{{{n}_{j}}+1}})}_{{{n}_{j}}}}$ such that

Equation (17)

For notational simplicity we denote, for the remainder of the proof, the index ${{n}_{{{j}_{m}}}}+1$ by $m+1$. By (A1) we observe

As all summands in (17) are positive, we thus have

Equation (18)

Now let us show that ${{k}^{m+1}}$ converges strongly. As the sequence converges weakly, it is enough to show

${{{\rm lim} }_{m\to \infty }}\| {{k}^{m+1}}\| =\| \bar{k}\| .$

Equivalently, we can also show ${{{\rm lim} }_{m\to \infty }}\left\|{{k}^{m+1}}-{{k}_{\epsilon }}\right\|{^{2}}=\left\|\bar{k}-{{k}_{\epsilon }}\right\|{^{2}}$. Again, due to the weak convergence of ${{k}^{m+1}}$ it is sufficient to prove

Let us assume that

holds. Rewriting (18) yields

Equation (19)

However, since $\mathcal{R}$ is w-lsc, we observe

which is in contradiction to (19). Thus we have shown the convergence of ${{k}^{m+1}}$ to $\bar{k}$ in norm.

The last part of this proof focuses on the convergence of the partial subdifferential of J with respect to k.

Since ${{k}^{m+1}}$ solves the sub-minimisation problem (13b), the optimality condition reads as $0\in {{\partial }_{k}}J({{k}^{m+1}},{{f}^{m+1}})$, or equivalently, there exists an element

Equation (20)

such that $\xi _{k}^{m+1}\in \partial \mathcal{R}\left( {{k}^{m+1}} \right)\subset \mathcal{U}$; see corollary 3.3.

Now, in the limit, $0\in {{\partial }_{k}}J(\bar{k},\bar{f})$ means that

holds, i.e. the right-hand side of (20) converges and the limit of the sequence of subderivatives also belongs to the subdifferential set $\partial \mathcal{R}\left( {\bar{k}} \right)$.

The first part of the statement above follows from condition (A2), whereas the second part is obtained from the assumption that $\mathcal{R}$ is a convex functional, because in this case the Fenchel subdifferential coincides with the limiting subdifferential, which is a strong-weak$^{*}$ closed mapping (see appendix). □

The strong convergence for the second variable is obtained as follows.

Proposition 3.8. Let $\{m\}$ be a subsequence of $\mathbb{N}$ such that the (sub-) sequence ${{\{({{k}^{m+1}},{{f}^{m+1}})\}}_{m}}$ generated by AM algorithm (13) satisfies ${{k}^{m+1}}\to \bar{k}$ and ${{f}^{m+1}}\rightharpoonup \bar{f}$. Then there is a subsequence of ${{\{{{f}^{m+1}}\}}_{m}}$ such that ${{f}^{{{m}_{j}}+1}}\to \bar{f}$ and $0\in {{\partial }_{f}}J(\bar{k},\bar{f})$.

Proof. Similarly to the previous theorem, setting $f=\bar{f}$ in (15) in the proof of proposition 3.6 we obtain

As the limit inferior exists, we can in particular extract a subsequence ${{({{k}^{{{m}_{j}}+1}},{{f}^{{{m}_{j}}+1}})}_{{{m}_{j}}}}$ of ${{({{k}^{m+1}},{{f}^{m+1}})}_{m}}$ such that

Since both summands in the limit above are positive and due to (A1), we conclude that

Moreover, as L is a bounded and continuously invertible operator we have

which in combination with the weak convergence of the subsequence gives its strong convergence ${{f}^{{{m}_{j}}+1}}\to \bar{f}$.

The second half of this proof refers to the convergence of the partial subdifferential of J with respect to f and its limit.

Since ${{f}^{m+1}}$ solves the sub-minimisation problem (13a), the optimality condition reads as $0\in {{\partial }_{f}}J({{k}^{m}},{{f}^{m+1}})$. However, we are interested in the partial subderivative at the pair $({{k}^{{{m}_{j}}+1}},{{f}^{{{m}_{j}}+1}})$. Namely, with the help of corollary 3.3 the subderivative (which is a unique element) $\xi _{f}^{{{m}_{j}}+1}\in {{\partial }_{f}}J({{k}^{{{m}_{j}}+1}},{{f}^{{{m}_{j}}+1}})$ is computed$^{3}$ as

which need not be zero in each cycle of the AM algorithm (13); otherwise the stopping criterion would be satisfied and nothing would be left to prove. Therefore we shall prove that it converges to zero.

So far we have strong convergence of both sequences ${{\left\{ {{k}^{m+1}} \right\}}_{m}}$ and ${{\left\{ {{f}^{m+1}} \right\}}_{m}}$. Additionally, the assumption A implies that both linear operators Ak and $A_{k}^{*}$ are also strongly continuous, therefore

Equation (21)

Our goal is to show that the limit given in (21) is zero. Let us suppose, by contradiction, that $0\notin {{\partial }_{f}}J(\bar{k},\bar{f})$. Since this set is a singleton, we conclude that

This means that $\bar{f}$ does not fulfil the normal equation associated with the standard Tikhonov problem

which is a necessary condition for being a minimiser of the underlying functional.

Therefore the functional $J(\bar{k},\cdot )$, for the given fixed $\bar{k}$, does not attain its minimum value at $\bar{f}$, and there is at least one element f such that $J(\bar{k},f)\lt J(\bar{k},\bar{f})$.

Moreover, this functional is convex and attains a global minimum, here denoted by $\tilde{f}$. By definition,

$J(\bar{k},\tilde{f})\leqslant J(\bar{k},f)$

for all $f\in \mathcal{V}$.

In particular, since $\bar{f}$ is not a minimiser for $J(\bar{k},\cdot )$, the inequality above is strict,

$J(\bar{k},\tilde{f})\lt J(\bar{k},\bar{f}). \qquad (22)$

On the other hand, from proposition 3.6 it also holds that

$J(\bar{k},\bar{f})\leqslant J(\bar{k},f)$

for all $f\in \mathcal{V}$. Setting $f:=\tilde{f}$ in this inequality we get

which leads to a contradiction to (22).

Therefore the optimality condition holds true for $\bar{f}$, i.e. in the limit the first-order condition is fulfilled and the limit of the partial subderivative sequence is zero, i.e. $0\in {{\partial }_{f}}J(\bar{k},\bar{f})$, which completes the proof. □

Remark 3.9. An alternative proof would be to assume that the sequence ${{\{{{k}^{m+1}}\}}_{m}}$ fulfils

Equation (23)

More specifically, we have

from the optimality condition, but we would like to show

Subtracting the latter expression from the first one, we get

Note that under assumption (23) the expression above converges to zero and the proof would be complete. Nevertheless, we cannot guarantee that consecutive elements of the original sequence are selected for the subsequence. As an alternative, one can verify numerically whether the sequence provided by the AM algorithm satisfies this assumption. Moreover, if we restrict the problem to the simple case in which the characterising function is known, then assumption (23) holds trivially, the problem becomes standard Tikhonov regularisation and the theory carries over.

The forthcoming result, the most substantial of this section, shows that the limit $(\bar{k},\bar{f})$ of the sequence generated by the AM algorithm is a critical point (pair) of the functional J.

Theorem 3.10 (Main result). Let $\{m\}$ be an index subset of $\mathbb{N}$ such that the sequence generated by the AM algorithm satisfies ${{\{({{k}^{m+1}},{{f}^{m+1}})\}}_{m}}\to (\bar{k},\bar{f})$ and $(\xi _{k}^{m+1},\xi _{f}^{m+1})\rightharpoonup (0,0)$. Then there is a subsequence converging to a critical point of J, i.e.

Proof. Proposition 3.7 guarantees that ${{k}^{m+1}}\to \bar{k}$ and ${{\xi }_{{{k}^{m+1}}}}\in \partial \mathcal{R}({{k}^{m+1}})$ (or, equivalently, $0\in {{\partial }_{k}}J({{k}^{m+1}},{{f}^{m+1}})$) with $0\in {{\partial }_{k}}J(\bar{k},\bar{f})$ in the limit. Likewise, proposition 3.8 guarantees that ${{f}^{m+1}}\to \bar{f}$ and ${{\xi }_{{{f}^{m+1}}}}\in {{\partial }_{f}}J({{k}^{m+1}},{{f}^{m+1}})$ with $0\in {{\partial }_{f}}J(\bar{k},\bar{f})$ in the limit. Combining this with the strong-weak$^{*}$ closedness property of the subderivative (see appendix) and theorem 3.2, we have

in the limit. □

4. Numerical experiments

In the previous sections we proposed an algorithm to minimise the functional J over two variables. Here we want to discuss the practical realisation of the algorithm, which has been implemented in MATLAB.

For our test computations we choose $\mathcal{R}$ to be the weighted lp norm of the coefficients of the characterising function k with respect to an orthonormal basis ${{\{{{\phi }_{\lambda }}\}}_{\lambda }}$ of $\mathcal{U}$, so

$\mathcal{R}(k)=\| k{{\| }_{w,p}}=\sum\limits_{\lambda }{{{w}_{\lambda }}\,k_{\lambda }^{p}}, \qquad (24)$

where ${{k}_{\lambda }}=|\left\langle k\;,\;{{\phi }_{\lambda }} \right\rangle |$ and the ${{w}_{\lambda }}\gt 0$ are weights. Among the possible choices of p, it is well known that p = 1 promotes sparsity [8], in the sense that the minimiser of the related Tikhonov functional has only few nonzero coefficients with respect to the underlying basis. We are particularly interested in wavelet bases, as many signals (1D) or images (2D) exhibit piecewise smooth behaviour punctuated by transients.

One cycle of the alternating minimisation (13) consists of two steps, each being the minimisation of a convex functional involving only linear operators. First, solving (13a), we fix ${{k}^{n}}$ and find the solution ${{f}^{n+1}}$ by, e.g., a conjugate gradient method. Second, solving (13b), we fix ${{f}^{n+1}}$ from the previous step, solve the shrinkage minimisation problem described in [8] and obtain ${{k}^{n+1}}$.
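A minimal sketch of one iterated soft-thresholding (shrinkage) update for the k-step, in the spirit of [8], is given below. The handles W/Wt (an orthonormal wavelet analysis/synthesis pair), Cf/Cft (the operator ${{C}_{f}}$ and its adjoint for the current, fixed f) and the step size tau are our own illustrative assumptions, and the weights ${{w}_{\lambda }}$ are taken to be 1 for simplicity.

```matlab
% One ISTA-type shrinkage step for the k-subproblem (13b); illustrative only.
% Cf/Cft: function handles applying C_f and its adjoint (f is fixed inside);
% W/Wt:   handles for an orthonormal wavelet analysis/synthesis pair;
% tau:    step size, chosen such that tau*(norm of C_f squared + gamma) < 1.
function k = k_shrinkage_step(k, g_delta, k_eps, beta, gamma, tau, Cf, Cft, W, Wt)
    grad   = Cft(Cf(k) - g_delta) + gamma*(k - k_eps);        % gradient of the smooth part
    coeffs = W(k - tau*grad);                                 % gradient step, analysed in the basis
    coeffs = sign(coeffs) .* max(abs(coeffs) - tau*beta, 0);  % componentwise soft-thresholding
    k      = Wt(coeffs);                                      % synthesis back to the kernel domain
end
```

Iterating this step for fixed f converges to a minimiser of the convex k-subproblem; any other solver for the ℓ1-penalised problem could be used instead.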

We test the performance of our algorithm for a 2D convolution problem

More precisely, the image ${{f}^{{\rm true}}}$ is represented numerically as a matrix of size $256\times 256$, obtained with the command imread from the original JPEG image$^{4}$ file composed of three grey levels. The blurring kernel k0 is described by a Gaussian function

evaluated on the mesh $[-\frac{\pi }{2},\frac{\pi }{2}]\times [-\frac{\pi }{2},\frac{\pi }{2}]$, where a is the amplitude constant, $c\in \mathbb{R}$, and x0 and y0 define the centre, which in our case is located in the upper-left corner. Finally, the convolution operator is evaluated using the two-dimensional fast Fourier transform, which creates a shifted and blurred image as seen in figure 1.
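Since the explicit Gaussian formula and the parameter values are not reproduced above, the following MATLAB sketch only indicates how such test data can be generated; the parameterisation (a, c, x0, y0), the file name and the noise realisation are illustrative assumptions, not the exact settings of the experiments.

```matlab
% Illustrative generation of the 2D deconvolution test data (assumed settings).
N = 256;
t = linspace(-pi/2, pi/2, N);
[X, Y] = meshgrid(t, t);
a = 1; c = 0.2; x0 = -pi/2; y0 = -pi/2;                 % centre in the upper-left corner
k0 = a * exp(-((X - x0).^2 + (Y - y0).^2) / (2*c^2));   % assumed Gaussian parameterisation
f_true = double(imread('logo.jpg'));                    % hypothetical file name
f_true = f_true(:, :, 1);                               % keep a single grey-level channel
g0 = real(ifft2(fft2(k0) .* fft2(f_true)));             % periodic convolution via the 2D FFT
rel = 0.08;                                             % 8% relative white noise
g_delta = g0 + rel*norm(g0(:))/sqrt(numel(g0))*randn(size(g0));
k_eps   = k0 + rel*norm(k0(:))/sqrt(numel(k0))*randn(size(k0));
```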

Figure 1. From left to right: true image ${{f}^{{\rm true}}}$, blurring Gaussian kernel k0 and convolved data g0.

Numerical experiments are performed from the given measurements in order to reconstruct both the function and the kernel. An example of the initial noisy data and noisy kernel is illustrated in figure 2, where we added 8% relative white noise.

Figure 2. Measurements: noisy kernel (left) and noisy data (right), both with 8% relative white noise error.

Figure 3 illustrates the significant improvement from the initially given noisy data (see figure 2 (right), with 8% relative noise, ${\rm SNR}=4.199$) to the dbl-RTLS solution. For comparison, we also give a reconstruction result obtained by applying Tikhonov regularisation to the linear convolution problem with the noisy kernel kept fixed. The reconstruction results show that our approach, which considers both the function and the kernel as variables, leads to a better reconstruction. However, this effect becomes less prominent when the noise level is very small.

Figure 3. Reconstruction with standard Tikhonov using the blurred kernel (left) and reconstruction from the dbl-RTLS method (right).

The numerical results are given in figure 4, which displays three plots in each row: the approximated image, the reconstructed kernel and their convolution. It shows a collection of numerical solutions computed from four samples with 8%, 4%, 2% and 1% relative error (RE) on both measurements, one per row from top to bottom. Moreover, we compare the numerical reconstruction with the true image and kernel; the errors in norm are displayed in table 1. Both numerically and visually one can conclude that dbl-RTLS is indeed a regularisation method, since its reconstruction and computed data improve as the noise level decreases.

Figure 4. From left to right: deconvolution solution fn, reconstruction of the characterising function kn and the attained data gn. From top to bottom, each row shows the solution given by the AM algorithm initialised with 8%, 4%, 2% and 1% relative error on both ${{g}_{\delta }}$ and ${{k}_{\epsilon }}$.

Table 1. Error in the 2-norm and corresponding SNR (signal-to-noise ratio).

RE $({{k}_{\epsilon }})$ | RE $({{g}_{\delta }})$ | $\| {{k}_{n}}-\bar{k}{{\| }_{2}}$ | $\| {{f}_{n}}-\bar{f}{{\| }_{2}}$ | SNR ${{f}_{n}}$ | SNR ${{k}_{n}}$ | $\beta $ | $\alpha $
8% | 8% | 3.6438e-01 | 1.7311e-01 | 8.6276 | 10.562 | 0.4525 | 0.1246
4% | 4% | 2.4185e-01 | 1.5036e-01 | 12.116 | 12.272 | 0.2262 | 0.0784
2% | 2% | 2.1545e-01 | 1.3648e-01 | 13.099 | 13.129 | 0.1131 | 0.0493
1% | 1% | 1.6754e-01 | 1.2596e-01 | 15.190 | 13.687 | 0.0565 | 0.0310

The pair of regularisation parameters $(\alpha ,\beta )$ was chosen as

where the constants ${{c}_{1}},{{c}_{2}}$ and the exponents $0\lt \mu ,\nu \leqslant 1$ have been picked heuristically, and the $M\times N$ matrix represents the underlying function; in our numerical example $M=N=256$.

Depending on the noise level, up to five AM cycles have been carried out. The iteration has been stopped as soon as the norms of the computed updates dropped below the threshold $10^{-4}$.

Acknowledgments

The authors would like to thank Dr E Resmerita for providing valuable references and helpful comments. The research was funded by the Austrian Science Fund (FWF): W1214-N15, project DK8.

Appendix

The most common concept of subderivative applies to convex functions. It was introduced by Fenchel, Moreau and Rockafellar in the early 1960s, but it became popular after [15]. The Fenchel subdifferential of a convex function $\varphi :\mathcal{U}\to \bar{\mathbb{R}}$ (i.e. with values in $[-\infty ,+\infty ]$) at $\bar{u}\in \mathcal{U}$ is defined as the set

$\partial \varphi (\bar{u})=\left\{ \xi \in {{\mathcal{U}}^{*}}\;:\;\varphi (u)\geqslant \varphi (\bar{u})+\left\langle \xi ,u-\bar{u} \right\rangle \ \ \forall u\in \mathcal{U} \right\}.$

This definition was extended to nonconvex functions by Clarke in 1973. It is based on generalised directional derivatives for locally Lipschitz functions in Banach spaces [6]. The Clarke subdifferential of $\varphi $ at $\bar{u}$ is defined by

${{\partial }_{C}}\varphi (\bar{u})=\left\{ \xi \in {{\mathcal{U}}^{*}}\;:\;{{\varphi }^{\circ }}(\bar{u};v)\geqslant \left\langle \xi ,v \right\rangle \ \ \forall v\in \mathcal{U} \right\},$

where

${{\varphi }^{\circ }}(\bar{u};v)=\underset{u\to \bar{u},\,t\downarrow 0}{\mathop{{\rm lim}\,{\rm sup}}}\,\frac{\varphi (u+tv)-\varphi (u)}{t}$

is the generalised directional derivative.

We add to this list two more definitions of subdifferentials. As before, for a set-valued mapping $G:\mathcal{U}\;\rightrightarrows \;{{\mathcal{U}}^{*}}$ between a Banach space $\mathcal{U}$ and its topological dual ${{\mathcal{U}}^{*}}$, the set

$\underset{u\to \bar{u}}{\mathop{{\rm Lim}\,{\rm sup}}}\,G(u)=\left\{ \xi \in {{\mathcal{U}}^{*}}\;:\;\exists \,{{u}^{n}}\to \bar{u}\ \text{and}\ {{\xi }^{n}}\mathop{\rightharpoonup }\limits^{*}\xi \ \text{with}\ {{\xi }^{n}}\in G({{u}^{n}}) \right\}$

denotes the sequential Painlevé–Kuratowski upper/outer limit of the set-valued mapping. Given a lower semi-continuous function $\varphi $, the $\varepsilon $-Fréchet subdifferential of $\varphi $ at $\bar{u}$ is defined by

${{\hat{\partial }}_{\varepsilon }}\varphi (\bar{u})=\left\{ \xi \in {{\mathcal{U}}^{*}}\;:\;\underset{u\to \bar{u}}{\mathop{{\rm lim}\,{\rm inf}}}\,\frac{\varphi (u)-\varphi (\bar{u})-\left\langle \xi ,u-\bar{u} \right\rangle }{\| u-\bar{u}\| }\geqslant -\varepsilon  \right\}.$

If $|\varphi (\bar{u})|=\infty $ then ${{\hat{\partial }}_{\varepsilon }}\varphi \left( {\bar{u}} \right)=\varnothing $. When $\varepsilon =0$ the set ${{\hat{\partial }}_{0}}\varphi \left( {\bar{u}} \right)$ will be denoted by $\hat{\partial }\varphi \left( {\bar{u}} \right)$.

The limiting subdifferential or Mordukhovich subdifferential of $\varphi $ at $\bar{u}$ is defined as

$\partial \varphi (\bar{u})=\underset{u\mathop{\to }\limits^{\varphi }\bar{u},\ \varepsilon \downarrow 0}{\mathop{{\rm Lim}\,{\rm sup}}}\,{{\hat{\partial }}_{\varepsilon }}\varphi (u),$

where the notation $u\mathop{\to }\limits^{\varphi }\bar{u}$ means $u\to \bar{u}$ with $\varphi (u)\to \varphi (\bar{u})$. This subdifferential corresponds to the collection of weak-star sequential limiting points of the so-called $\varepsilon $-Fréchet subdifferentials.

In [7], the inclusion property

$\hat{\partial }\varphi \left( {\bar{u}} \right)\subset \partial \varphi \left( {\bar{u}} \right)\subset {{\partial }_{C}}\varphi \left( {\bar{u}} \right)$

is shown. The set of subgradients $\hat{\partial }\varphi \left( {\bar{u}} \right)$ may be nonconvex, whereas the Clarke subdifferential is always a nonempty convex subset of ${{\mathcal{U}}^{*}}$ whenever $\bar{u}\in {\rm dom}\;\varphi $. It is important to note that all these subdifferential definitions generate the same set if the function is convex [5].

Finally, we list another property needed to prove the convergence results: the strong-weak$^{*}$ closedness (also called ${\rm s}{{{\rm w}}^{*}}$-closedness) of the graph of the subdifferential mapping.

Given the subdifferential $\partial \varphi $ of a proper lower semi-continuous function $\varphi $, saying that its graph is ${\rm s}{{{\rm w}}^{*}}$-closed means that whenever $({{u}^{n}},{{\zeta }^{n}})\in {\rm Gph}\;\partial \varphi $ converges in the ${\rm s}{{{\rm w}}^{*}}$-topology to $(\bar{u},\bar{\zeta })$, then $(\bar{u},\bar{\zeta })\in {\rm Gph}\;\partial \varphi $. In other words, if ${{u}^{n}}\to \bar{u}$ and ${{\zeta }^{n}}\mathop{\rightharpoonup }\limits^{*}\bar{\zeta }$ with ${{\zeta }^{n}}\in \partial \varphi \left( {{u}^{n}} \right)$, then $\bar{\zeta }\in \partial \varphi \left( {\bar{u}} \right)$.

The subdifferential is indeed a ${\rm s}{{{\rm w}}^{*}}$-closed set-valued mapping; see for instance [6, proposition 2.1.5] or [9, corollary 5.1]. Moreover, this result holds true for any maximal monotone point-to-set mapping and not only for the subdifferential mapping; see [4, chapter 4].

For more details on the different types of subdifferentials and their properties we refer to [6, 9, 13, 15] and references therein.

Footnotes

  • 3. For the sake of notational simplicity we continue to denote the subsequence's indices by $m+1$ instead of ${{m}_{j}}+1$.

  • 4. The test image is the DK Computational Mathematics logo from JKU Linz.
