Abstract
The total least squares (TLS) method is a successful approach for linear problems if both the right-hand side and the operator are contaminated by some noise. For ill-posed problems, a regularisation strategy has to be considered to stabilise the computed solution. Recently a double regularised TLS method was proposed within an infinite-dimensional setup, which reconstructs both the function and the operator, reflected in the underlying bilinear forms. Our main focus is on the design and implementation of an algorithm, with particular emphasis on an alternating minimisation strategy, for solving not only the double regularised TLS problem but a vast class of optimisation problems: the minimisation of a bilinear functional of two variables.
Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
1. Introduction
In [2], the authors described a new two-parameter regularisation scheme for solving an ill-posed operator equation. The task consists of the inversion of a linear operator $A_0$ defined between Hilbert spaces,
$$A_0 f = g^0. \qquad\qquad (1)$$
In contrast to standard inverse problems, where the task is to solve (1) from given noisy data, a more realistic setup is considered where both data and operator are not known exactly. For the reconstruction, a cost functional with two penalisation terms based on the TLS (total least squares) technique is used.
The approach presented in [2] focuses on linear operators that can be characterised by a function, as is the case, e.g., for linear integral operators, where the kernel function determines the behaviour of the operator. Moreover, it is assumed that the noise in the operator is due to an incorrect characterising function. A penalty term is used not only to stabilise the reconstruction of the unknown solution, as is the case in [10–12], but also to stabilise the unknown operator. As a drawback, the regularisation scheme becomes nonlinear even for linear equations. However, the potential advantage is that not only the unknown solution is reconstructed, but also a suitable characterising function and thus the governing operator describing the underlying data. Additionally, convergence rates for the reconstruction of both solution and operator have been derived.
The double regularised total least squares (dbl-RTLS) approach allows us to treat the problem within the framework of Tikhonov regularisation rather than as a constrained minimisation problem. More precisely, the regularised solution is obtained by minimising a nonlinear, nonconvex and possibly non-differentiable functional of two variables, which is computationally not always straightforward. Thus the goal of this paper is the development of an efficient and convergent numerical scheme for the minimisation of the Tikhonov-type functional of the dbl-RTLS approach.
The rest of the paper is organised as follows: in section 2 we formulate the underlying problem and give a short summary of the dbl-RTLS method. Section 3 is dedicated to the development of an algorithm based on an alternating minimisation strategy, as well as its convergence properties. In section 4, numerical results for the proposed algorithm are provided and the efficiency of the method is discussed. For the convenience of the reader, the appendix collects important concepts and fundamental results used throughout this article.
2. Problem formulation and the dbl-RTLS method
As mentioned above, we aim at the inversion of the linear operator equation (1) from noisy data and an incorrect operator. Additionally, we assume that the operators $A : X \to Y$, where $X$ and $Y$ are Hilbert spaces, can be characterised by functions $k \in V$, with $V$ also a Hilbert space. To be more specific, we consider operators
$$A_k f := B(k, f),$$
i.e. $A_k = B(k, \cdot)$, where B is a bilinear operator
$$B : V \times X \to Y$$
fulfilling, for some $C > 0$, the inequality
$$\|B(k, f)\|_Y \le C\, \|k\|_V\, \|f\|_X. \qquad\qquad (2)$$
From (2) follows immediately
$$\|A_k\|_{X \to Y} \le C\, \|k\|_V. \qquad\qquad (3)$$
Associated to the bilinear operator B, we also define the linear operator
$$F_f : V \to Y, \qquad F_f\, k := B(k, f),$$
i.e. $F_f = B(\cdot, f)$.
From now on, let us identify $A_0$ with $B(k_0, \cdot)$ and $A^\epsilon$ with $B(k^\epsilon, \cdot)$. From (3) we deduce immediately
$$\|A_0 - A^\epsilon\| \le C\, \|k_0 - k^\epsilon\|_V, \qquad\qquad (4)$$
i.e. the operator error norm is controlled by the error norm of the characterising functions. Now we can formulate our problem as follows:
Please note that the problem with explicitly known k0 (or the operator A0) is often ill-posed and needs regularisation for a stable inversion. Therefore we will also propose a regularising scheme for the problem (5a)–(5c).
Due to our assumptions on the structure of the operator A0, the inverse problem of identifying the function f from noisy measurements and an inexact operator can now be rewritten as the task of solving the inverse problem: find f s.t.
$$B(k_0, f) = g^0 \qquad\qquad (6)$$
from noisy measurements $(g^\delta, k^\epsilon)$ fulfilling
$$\|g^\delta - g^0\| \le \delta \qquad\qquad (7)$$
and
$$\|k^\epsilon - k_0\| \le \epsilon.$$
In most applications, the 'inversion' of B will be ill-posed (e.g. if B is defined via a Fredholm integral operator), and a regularisation strategy is needed for a stable solution of the problem (6).
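A standard illustration (not taken verbatim from [2]): for kernel functions $k \in L^2(\Omega \times \Omega)$, the Fredholm integral operator
$$B(k, f)(s) = \int_\Omega k(s, t)\, f(t)\, \mathrm{d}t$$
is bilinear in the pair $(k, f)$, and the Cauchy–Schwarz inequality yields $\|B(k, f)\|_{L^2(\Omega)} \le \|k\|_{L^2(\Omega \times \Omega)}\, \|f\|_{L^2(\Omega)}$, i.e. (2) holds with $C = 1$.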
For the solution of (6) from given data fulfilling (7), we use the dbl-RTLS method proposed in [2], where the approximations to the solutions are computed as
where
and
Here, $\alpha$ and $\beta$ are the regularisation parameters, which have to be chosen properly, $\gamma$ is a scaling parameter (arbitrary but fixed), L is a bounded linear and continuously invertible operator and $\mathcal{R}$ is a proper, convex and weakly lower semi-continuous functional. The functional is composed as the sum of two terms: one which measures the discrepancy of data and operator, and one which promotes stability. The first is a data-fidelity term based on the TLS technique, whereas the second acts as a penalty term which stabilises the inversion with respect to the pair (k, f). As a consequence, we have two regularisation parameters, a situation which also occurs in double regularisation, see, e.g. [17].
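To fix ideas for the algorithmic part, the functional in (8) then has the following overall shape. This is a sketch assuming the standard dbl-RTLS form of [2]; the exact constants, exponents and norms should be taken from there:
$$J^{\,\delta,\epsilon}_{\alpha,\beta}(k, f) \;=\; \underbrace{\tfrac{1}{2}\,\big\|B(k, f) - g^{\delta}\big\|^{2} \;+\; \tfrac{\gamma}{2}\,\big\|k - k^{\epsilon}\big\|^{2}}_{\text{TLS data-fidelity term}} \;+\; \underbrace{\tfrac{\alpha}{2}\,\big\|L f\big\|^{2} \;+\; \beta\,\mathcal{R}(k)}_{\text{stabilising penalty}},$$
where $g^\delta$ and $k^\epsilon$ denote the noisy data and the noisy characterising function with noise levels δ and ε, as in (7).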
The domain of the functional $\mathcal{R}$ can be extended by setting $\mathcal{R}(k) = +\infty$ whenever k lies outside its original domain. Then $\mathcal{R}$ is a proper, convex and weakly lower semi-continuous functional.
It has been shown that the sequence of the pairs of solutions of (8) converges to a minimum-norm solution as the noise level tends to zero, i.e. it is a regularisation method (see [2, theorem 4.5]). However, the task of actually finding minimisers of (8) has not been addressed properly, which will be done in the following sections.
3. An algorithm for the minimisation of the dbl-RTLS functional
In this section, we will formulate the first-order necessary condition for critical points of the functional, which requires in particular the derivative of the bilinear operator B. The core of this section is to design an algorithm to minimise this functional, which is not a trivial task, as the functional is most likely nonconvex and nonlinear.
3.1. Optimality condition
It is well known that the study of local behaviour of nonsmooth functions can be achieved by the concept of subdifferentiality which replaces the classical derivative at non-differentiable points.
The first-order necessary condition based on subdifferentiability is stated as follows: if $(\bar{k}, \bar{f})$ minimises the functional $J$, then
$$0 \in \partial J(\bar{k}, \bar{f}). \qquad\qquad (9)$$
We denote the set of all subderivatives of the functional at (k, f) by and we name it the subdifferential of at (k, f). For a quick revision on subdifferentiability we refer to the appendix.
The first result gives us the derivative of a bilinear operator B.
Lemma 3.1. Let B be a bilinear operator and assume that (2) holds. Then the Fréchet derivative of B at $(k, f)$ is given by
$$B'(k, f)(u, v) = B(u, f) + B(k, v).$$
Moreover, the derivative is Lipschitz continuous with constant $\sqrt{2}\, C$.
Proof. Since B is bilinear, we have
$$B(k + u, f + v) = B(k, f) + B(u, f) + B(k, v) + B(u, v),$$
and we observe that the remainder is $B(u, v)$: as B fulfils (2), we have
$$\frac{\|B(k + u, f + v) - B(k, f) - B(u, f) - B(k, v)\|}{\|(u, v)\|} = \frac{\|B(u, v)\|}{\|(u, v)\|} \le \frac{C\, \|u\|\, \|v\|}{\|(u, v)\|} \le C\, \|(u, v)\|,$$
which converges to zero as $\|(u, v)\| \to 0$. This proves the asserted form of the derivative.
We further observe
$$\big(B'(k_1, f_1) - B'(k_2, f_2)\big)(u, v) = B(u, f_1 - f_2) + B(k_1 - k_2, v),$$
which implies
$$\big\|\big(B'(k_1, f_1) - B'(k_2, f_2)\big)(u, v)\big\| \le C\, \|(u, v)\|\, \big(\|k_1 - k_2\| + \|f_1 - f_2\|\big).$$
Using the inequality $a + b \le \sqrt{2}\, \sqrt{a^2 + b^2}$ we get
$$\big\|B'(k_1, f_1) - B'(k_2, f_2)\big\| \le \sqrt{2}\, C\, \big\|(k_1 - k_2, f_1 - f_2)\big\|,$$
and thus the Lipschitz continuity.
□
Note that the adjoint operator of the Fréchet derivative exists and is a bounded linear operator, since the underlying spaces are Hilbert spaces.
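Explicitly (this follows directly from lemma 3.1 and the notation of section 2, where $A_k = B(k, \cdot)$ and $F_f = B(\cdot, f)$): for $w \in Y$,
$$B'(k, f)^*\, w = \big(F_f^*\, w,\; A_k^*\, w\big) \in V \times X,$$
since $\langle B'(k, f)(u, v), w\rangle = \langle B(u, f), w\rangle + \langle B(k, v), w\rangle = \langle u, F_f^*\, w\rangle + \langle v, A_k^*\, w\rangle$.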
In order to analyse the optimality condition (9) we shall compute the subdifferential of a functional of two variables. As pointed out in [6, proposition 2.3.15], for a general function h the subdifferential $\partial h(x_1, x_2)$ and the product set $\partial_{x_1} h(x_1, x_2) \times \partial_{x_2} h(x_1, x_2)$ are not necessarily contained in each other. Here, $\partial_{x_i}$ denotes the partial subgradient with respect to $x_i$ for $i = 1, 2$. However, this is not the case for the functional we are interested in, as will be shown in the following theorem.
Theorem 3.2. Let $\Phi$ be a functional with the structure
$$\Phi(x_1, x_2) = Q(x_1, x_2) + \phi(x_1) + \psi(x_2), \qquad\qquad (10)$$
where Q is a (nonlinear) differentiable term and $\phi$, $\psi$ are proper convex functions. Then
$$\partial \Phi(x_1, x_2) = \{\nabla Q(x_1, x_2)\} + \partial\phi(x_1) \times \partial\psi(x_2) \qquad\qquad (11)$$
and, in particular, $\partial \Phi(x_1, x_2) = \partial_{x_1} \Phi(x_1, x_2) \times \partial_{x_2} \Phi(x_1, x_2)$.
Proof. In general the subdifferential of a sum of functions does not equal the sum of the subdifferentials. However, if Q is differentiable and $\phi$, $\psi$ are convex, some inclusions and even equalities hold true (combining [6, proposition 2.3.3, corollary 3, and proposition 2.3.6]), as for instance,
Since Q is differentiable, invoking the previous results, the (partial) subderivative is unique [6, proposition 2.3.15], and therefore
Note that the subderivative of the sum of two separable convex functionals satisfies
$$\partial(\phi + \psi)(x_1, x_2) = \partial\phi(x_1) \times \partial\psi(x_2),$$
see [18, corollary 2.4.5].
Altogether, we can compute the subderivative as follows
The last implication of this theorem follows straightforwardly from the definition of the partial subderivative and (11). □
Please note that the above proof holds for all definitions of subdifferentials introduced in the appendix: for convex functionals all the definitions are equivalent, and for differentiable (possibly nonlinear) terms the subdifferential is a singleton and the subderivative equals the derivative. Based on theorem 3.2 we can now calculate the derivative of the functional, which is the key ingredient for the upcoming algorithm; note the common structure of (10) and the proposed functional:
Corollary 3.3. Let $J$ be the functional defined in (8). Then
where .
Proof. The result follows straightforwardly from lemma 3.1 and theorem 3.2. Observe that the sum is well defined in the underlying Hilbert space, since the subderivative is also an element of it. □
Up to now we have not specified the penalty functional $\mathcal{R}$; it is only required to be convex and lower semi-continuous. We are particularly interested in, e.g., the $L^p$ norm or the weighted $\ell^p$ norm. Its subdifferential is given in section 4. An easy way to compute the subderivatives of functionals with a specific structure is given by the following lemma.
Lemma 3.4 ([3, lemma 4.4]). Let $\mathcal{H} = L^2(\Omega)$, where Ω is a σ-finite measure space, and let $h : \mathcal{H} \to \mathbb{R} \cup \{+\infty\}$ be defined by
$$h(f) = \int_\Omega \phi(f(t))\, \mathrm{d}t,$$
where $\phi : \mathbb{R} \to \mathbb{R} \cup \{+\infty\}$ is a convex function. Then $\xi \in \mathcal{H}$ is an element of $\partial h(f)$ if and only if $\xi(t) \in \partial\phi(f(t))$ for almost every $t \in \Omega$ (with the identification $\mathcal{H}^* = \mathcal{H}$).
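As an illustration, for the weighted $\ell^1$ norm of the coefficients $c_i = \langle k, \varphi_i\rangle$ (a standard computation, obtained by applying lemma 3.4 componentwise to the convex function $t \mapsto w_i |t|$):
$$\xi \in \partial\Big(\textstyle\sum_i w_i\, |c_i|\Big) \iff \begin{cases} \xi_i = w_i \operatorname{sign}(c_i) & \text{if } c_i \ne 0,\\[2pt] \xi_i \in [-w_i, w_i] & \text{if } c_i = 0.\end{cases}$$
This componentwise characterisation is exactly what the shrinkage step in section 4 exploits.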
3.2. An alternating minimisation algorithm
Coordinate descent methods are based on the idea that the minimisation of a multivariable function can be achieved by minimising it along one direction at a time. It is a simple and surprisingly efficient technique. The coordinates can be chosen in any order or permutation, and one can also replace them by blocks of coordinates (for more details see [16] and references therein). The method is closely related to the coordinate gradient descent (CGD), Gauss–Seidel and SOR methods, which have been studied previously by several authors and are described in various optimisation books, e.g. [1, 14]. In the unconstrained setting the method is called alternating minimisation (AM) when the variables are split into two blocks.
The computation of a solution of dbl-RTLS is not straightforward, as determining the minimum of the functional (8) with respect to both parameters is a nonlinear and nonconvex problem over two variables. Nevertheless we shall overcome this problem by applying some coordinate descent techniques.
In the following we shall denote the dbl-RTLS functional simply by J, as its parameters are kept fixed during the minimisation process.
In the AM algorithm, the functional is minimised iteratively with two alternating minimisation steps. Each step minimises the problem over one variable while keeping the second variable fixed. The notation means that we minimise the function J with u fixed, where u can be either k or f. Thus we minimise in each cycle the functionals
$$f_{n+1} \in \operatorname*{arg\,min}_{f} J(k_n, f) \qquad\qquad (13a)$$
and
$$k_{n+1} \in \operatorname*{arg\,min}_{k} J(k, f_{n+1}). \qquad\qquad (13b)$$
We highlight some important facts:
- 1. For each subproblem, the considered operators are linear, and the functional is convex. Thus a local minimum is global.
- 2. The first step (13a) is a standard quadratic minimisation problem; a schematic implementation of one AM cycle is sketched below.
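A minimal sketch of the resulting AM loop, in Python/NumPy for illustration (the authors' implementation is in MATLAB; the helpers argmin_f and argmin_k are hypothetical placeholders for the two inner solvers described in section 4):

```python
import numpy as np

def alternating_minimisation(argmin_f, argmin_k, k0, f0,
                             max_cycles=5, tol=1e-4):
    """Minimise J(k, f) by alternating the two partial minimisations.

    argmin_f(k): solves step (13a), f_{n+1} = argmin_f J(k, f) for fixed k
    argmin_k(f): solves step (13b), k_{n+1} = argmin_k J(k, f) for fixed f
    By proposition 3.5, J is non-increasing along these iterates.
    """
    k, f = k0, f0
    for _ in range(max_cycles):
        f_next = argmin_f(k)       # step (13a): quadratic problem, e.g. CG
        k_next = argmin_k(f_next)  # step (13b): shrinkage problem
        # stop when both updates are small in norm
        done = (np.linalg.norm(f_next - f) < tol and
                np.linalg.norm(k_next - k) < tol)
        k, f = k_next, f_next
        if done:
            break
    return k, f
```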
First we will show a monotonicity result for the sequence of iterates:
Proposition 3.5. The functional J is non-increasing on the AM iterates,
$$J(k_{n+1}, f_{n+1}) \le J(k_n, f_n) \quad \text{for all } n.$$
Proof. The iterates are defined as
$$f_{n+1} = \operatorname*{arg\,min}_{f} J(k_n, f)$$
and
$$k_{n+1} = \operatorname*{arg\,min}_{k} J(k, f_{n+1}).$$
Therefore,
$$J(k_n, f_{n+1}) \le J(k_n, f) \quad \text{for all } f$$
and
$$J(k_{n+1}, f_{n+1}) \le J(k, f_{n+1}) \quad \text{for all } k,$$
and in particular, setting $f = f_n$ and $k = k_n$,
$$J(k_n, f_{n+1}) \le J(k_n, f_n)$$
and
$$J(k_{n+1}, f_{n+1}) \le J(k_n, f_{n+1}) \le J(k_n, f_n).$$
□
The existence of a minimiser of the functional J has already been proven in [2, theorem 4.2]. The goal of the following results is to prove that the sequence generated by the alternating minimisation algorithm has at least a subsequence which converges to a critical point of the functional. Throughout this section, let us make the following assumptions.
- (A1) B is strongly continuous, i.e. if $(k_n, f_n) \rightharpoonup (k, f)$ then $B(k_n, f_n) \to B(k, f)$.
- (A2) The adjoint of the Fréchet derivative B' of B is strongly continuous, i.e. if $(k_n, f_n) \rightharpoonup (k, f)$ then $B'(k_n, f_n)^*\, w \to B'(k, f)^*\, w$ for every $w$.
In addition to the standard norm for the pair $(k, f)$, we define the weighted norm, for given $\gamma > 0$, as
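A plausible rendering of these two norms, under the assumption that the weight $\gamma$ acts on the k-component in the same way as in the scaling of the discrepancy term:
$$\|(k, f)\|^2 = \|k\|^2 + \|f\|^2, \qquad \|(k, f)\|_\gamma^2 = \gamma\, \|k\|^2 + \|f\|^2.$$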
Proposition 3.6. For given regularisation parameters and , the sequence of iterates generated by the AM algorithm has at least a weakly convergent subsequence , and its limit fulfils
for all and for all .
Proof. As the iterates of the AM algorithm can be characterised as the minimisers of reduced dbl-RTLS functionals (see (13a) and (13b)), we observe
and
Keeping in mind that the operator L is continuously invertible, the first inequality gives
Using the second estimate above and the standard inequality we have
Thus, the sequence is bounded
and by Alaoglu's theorem, it has a weakly convergent subsequence .
Since $f_{n_j + 1}$ minimises the functional for a fixed $k_{n_j}$, it holds
and thus
Using the fact that J is w-lsc and the strong continuity of B, we observe
Therefore,
The second inequality in (14) is proven similarly: since minimises the functional for fixed it is
which is equivalent to
Again, we observe
and thus
□
In summary, the AM algorithm yields a bounded sequence and hence a weakly convergent subsequence. The next two results extend the convergence to the strong topology, for both variables.
Proposition 3.7. Let be a weakly convergent (sub-) sequence generated by the AM algorithm (13), where and . Then there exists a subsequence of such that and .
Proof. Inequality (16) in the proof of proposition 3.6 reads
for any k. Setting yields in particular
As the limes inferior exists, we can in particular extract a subsequence of such that
For the sake of notational simplicity, we denote the subsequence index again by n for the remainder of the proof. By (A1) we observe
As all summands in (17) are positive, we thus have
Now let us show that converges strongly. As the sequence converges weakly, it is enough to show
Equivalently, we can also show . Again due to the weak convergence of it is sufficient to prove
Let us assume that
holds. Rewriting (18) yields
However, since is w-lsc, we observe
which is in contradiction to (19). Thus we have shown the convergence of to in norm.
The last part of this proof focuses on the convergence of the partial subdifferential of J with respect to k.
Since $k_{n+1}$ solves the sub-minimisation problem (13b), the optimality condition reads as $0 \in \partial_k J(k_{n+1}, f_{n+1})$, or equivalently, there exists an element
such that ; see corollary 3.3.
Now, in the limit, this means that (20) holds, i.e. the right-hand side of (20) converges and the limit of the sequence of subderivatives also belongs to the subdifferential set.
The first part of the statement above can be seen by using condition (A2), whereas the second part is obtained from the assumption that the penalty is a convex functional: in this case the Fenchel subdifferential coincides with the limiting subdifferential, which is a strong-weakly closed mapping (see appendix). □
The strong convergence for the second variable is obtained as follows.
Proposition 3.8. Let be a subsequence of such that the (sub-) sequence generated by the AM algorithm (13) satisfies and . Then there is a subsequence of such that and .
Proof. Similarly to the previous proposition, from (15) in the proof of proposition 3.6 we obtain
As the limes inferior exists, we can in particular extract a subsequence of such that
Since both summands in the limit above are positive and due to (A1), we conclude that
Moreover, as L is a bounded and continuously invertible operator we have
which in combination with the weak convergence of the subsequence gives its strong convergence .
The second half of this proof refers to the convergence of the partial subdifferential of J with respect to f and its limit.
Since $f_{n+1}$ solves the sub-minimisation problem (13a), the optimality condition reads as . However, we are interested in the partial subderivative at the pair. Namely, with the help of corollary 3.3, the subderivative (which is a unique element) is computed (see footnote 3) as
which need not be zero at each cycle of the AM algorithm (13); otherwise the stopping criterion would be satisfied and nothing would be left to prove. Therefore we shall prove that it converges to zero.
So far we have strong convergence of both sequences. Additionally, the assumptions (A1) and (A2) imply that the linear operators $A_k$ and $F_f$ are also strongly continuous, therefore
Our goal is to show that the limit given in (21) is zero. Let us suppose, by contradiction, that it is not. Since this set is a singleton, we conclude that
This means that the limit does not fulfil the normal equation associated with the standard Tikhonov problem, which is a necessary condition for being a minimiser of the underlying functional.
Therefore the functional for a given fixed does not attain its minimum value at and there is at least one element f such that .
Moreover, this functional is convex and has a global minimiser, here denoted by . By definition,
for all .
In particular, since is not a minimiser for , the inequality above is strict,
On the other hand, from proposition 3.6 it also holds
for all . Setting in this inequality we get
which leads to a contradiction to (22).
Therefore the optimality condition holds true in the limit, i.e. the first-order condition is fulfilled and the limit of the partial subderivative sequence is zero, which completes the proof. □
Remark 3.9. An alternative proof would be to assume that the sequence fulfils
More specifically, we have
from the optimality condition, but we would like to show
Subtracting the latter expression from the first one, we get
Note that, by assuming condition (23), the expression above converges to zero and the proof would be complete. Nevertheless, we cannot guarantee that subsequent elements of the original sequence are selected for the subsequence. Alternatively, one can verify numerically whether the sequence provided by the AM algorithm satisfies this assumption. Moreover, if we restrict the problem to the simple case where the characterising function is known, then assumption (23) holds trivially, the problem becomes standard Tikhonov regularisation, and the theory carries over.
The forthcoming and most substantial result within this section shows that the limit of the sequence generated by the AM algorithm is a critical point (pair) of the functional J.
Theorem 3.10 (Main result). Let an index set be given such that the sequence generated by the AM algorithm fulfils the assumptions of propositions 3.7 and 3.8. Then there is a subsequence converging to a critical point of J, i.e.
Proof. Proposition 3.7 guarantees that and (or equivalently, ) such that . Likewise, proposition 3.8 guarantees that the sequence and such that . Combining this with the strong-weak closedness property of the subderivative (see appendix) and theorem 3.2 we have
in the limit. □
4. Numerical experiments
In the previous sections we proposed an algorithm to minimise the functional J over two variables. Here we want to discuss the practical realisation of the algorithm, which has been implemented in MATLAB.
For our test computations we choose $\mathcal{R}$ to be the weighted $\ell^p$ norm of the coefficients of the characterising function k with respect to an orthonormal basis $\{\varphi_i\}$, so
$$\mathcal{R}(k) = \sum_i w_i\, \big|\langle k, \varphi_i\rangle\big|^p, \qquad 1 \le p \le 2,$$
where the weights satisfy $w_i \ge w_0 > 0$.
Among all possible choices of p, it is well known that p = 1 promotes sparsity [8], in the sense that the minimiser of the related Tikhonov functional has only a few nonzero coefficients with respect to the underlying basis. We are particularly interested in wavelet bases, as many signals (1D) and images (2D) exhibit piecewise smooth behaviour punctuated by transients.
One cycle of the alternating minimisation (13) consists of two steps, each being the minimisation of a convex functional involving only linear operators. First, solving (13a), we fix $k_n$ and compute $f_{n+1}$ through, e.g., a conjugate gradient method. Second, solving (13b), we fix $f_{n+1}$ from the previous step, solve the shrinkage minimisation problem described in [8], and obtain $k_{n+1}$.
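For step (13b) with p = 1, the shrinkage approach of [8] amounts to iterated componentwise soft thresholding. A minimal sketch, assuming for simplicity that k is expanded in an orthonormal basis so that the penalty acts directly on the stored coefficients; A_f and A_f_adj are hypothetical callables for the linear map $k \mapsto B(k, f)$ and its adjoint at the current fixed f:

```python
import numpy as np

def soft_threshold(x, tau):
    """Componentwise soft thresholding, the proximal map of tau * ||.||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def solve_k_step(A_f, A_f_adj, g_delta, k_eps, gamma, beta, k_init,
                 step, n_iter=200):
    """Approximately solve (13b) for fixed f:
        min_k 0.5*||A_f(k) - g_delta||^2 + 0.5*gamma*||k - k_eps||^2
              + beta*||k||_1
    by an ISTA-type iteration; the step size should satisfy
    step <= 1 / (||A_f||^2 + gamma)."""
    k = k_init.copy()
    for _ in range(n_iter):
        grad = A_f_adj(A_f(k) - g_delta) + gamma * (k - k_eps)  # smooth part
        k = soft_threshold(k - step * grad, step * beta)        # prox step
    return k
```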
We test the performance of our algorithm for a 2D convolution problem
More precisely, the image is represented numerically as a matrix, using the command imread to read the original JPEG image file (see footnote 4), composed of three grey levels. The blurring kernel k0 is described by a Gaussian function
evaluated on a uniform mesh, where a is the amplitude constant and x0 and y0 define the centre, which in our case is located in the upper-left corner. Finally, the convolution operator is evaluated using the 2D fast Fourier transform, and it creates a shifted and blurred image, as seen in figure 1.
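A sketch of this forward model in Python/NumPy (the amplitude a, width sigma and centre (x0, y0) below are illustrative values, not the ones used in the experiment; the FFT-based product realises a circular 2D convolution, consistent with the shift visible in figure 1):

```python
import numpy as np

def gaussian_kernel(nx, ny, a=1.0, sigma=0.05, x0=0.0, y0=0.0):
    """Gaussian blurring kernel k0 on a uniform mesh of the unit square;
    (x0, y0) = (0, 0) puts the centre in the upper-left corner."""
    x = np.linspace(0.0, 1.0, nx)
    y = np.linspace(0.0, 1.0, ny)
    X, Y = np.meshgrid(x, y, indexing="ij")
    return a * np.exp(-((X - x0) ** 2 + (Y - y0) ** 2) / (2.0 * sigma ** 2))

def convolve2d_fft(k, f):
    """Bilinear forward map B(k, f): circular 2D convolution via the FFT."""
    return np.real(np.fft.ifft2(np.fft.fft2(k) * np.fft.fft2(f)))
```

Note that both k ↦ k∗f and f ↦ k∗f are linear, so the discrete operator is indeed bilinear, matching the structure assumed in (2).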
Numerical experiments are performed from given measurements in order to reconstruct both the function and the kernel. An example of the initial noisy data and noisy kernel is illustrated in figure 2, where we added 8% relative white noise.
Figure 3 illustrates the significant improvement of the dbl-RTLS reconstruction over the initial noisy data (see figure 2 (right), with 8% relative noise). For comparison, we also give a reconstruction result obtained by applying Tikhonov regularisation to the linear convolution problem with the noisy kernel kept fixed. The reconstruction results show that our approach, which treats both the function and the kernel as variables, leads to a better reconstruction. However, this effect becomes less prominent for considerably small noise levels.
The numerical results are given in figure 4, which displays three graphics in each row: the approximated image, the reconstructed kernel and their convolution. It shows a collection of numerical solutions computed from four samples with 8%, 4%, 2% and 1% relative error (RE) on both measurements, in each row from top to bottom. Moreover, we compare the numerical reconstructions with the true image and kernel; the errors in norm are displayed in table 1. Both numerically and visually one can conclude that dbl-RTLS is indeed a regularisation method, since its reconstruction and computed data improve as the noise level decreases.
Table 1. Errors in the 2-norm and respective SNR (signal-to-noise ratio).
RE $g^\delta$ | RE $k^\epsilon$ | error $f_n$ | error $k_n$ | SNR $f_n$ | SNR $k_n$ | $\alpha$ | $\beta$
8% | 8% | 3.6438e-01 | 1.7311e-01 | 8.6276 | 10.562 | 0.4525 | 0.1246 |
4% | 4% | 2.4185e-01 | 1.5036e-01 | 12.116 | 12.272 | 0.2262 | 0.0784 |
2% | 2% | 2.1545e-01 | 1.3648e-01 | 13.099 | 13.129 | 0.1131 | 0.0493 |
1% | 1% | 1.6754e-01 | 1.2596e-01 | 15.190 | 13.687 | 0.0565 | 0.0310 |
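For reference, the quantities in table 1 can be computed as follows; the SNR definition used here, $20 \log_{10}(\|x\| / \|x - x_n\|)$, is an assumption that reproduces the tabulated values only approximately:

```python
import numpy as np

def relative_error(x_true, x_rec):
    """Error in the 2-norm relative to the true solution."""
    return np.linalg.norm(x_rec - x_true) / np.linalg.norm(x_true)

def snr_db(x_true, x_rec):
    """Signal-to-noise ratio in dB (assumed definition)."""
    return 20.0 * np.log10(np.linalg.norm(x_true) /
                           np.linalg.norm(x_rec - x_true))
```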
The pair of regularisation parameters was chosen as
where the two constants have been picked heuristically. The matrix represents the underlying function; in our numerical example .
Depending on the noise level, up to 5 AM cycles have been carried out. Each of the cycles has been stopped whenever the related norms of the computed updates were below the threshold 1e-4.
Acknowledgments
The authors would like to thank Dr E Resmerita for providing valuable references and helpful comments. The research was funded by the Austrian Science Fund (FWF): W1214-N15, project DK8.
Appendix
The most common concept of subderivative is addressed to convex functions. It was introduced by Fenchel, Moreau and Rockafellar in the early 1960s, but it became popular after [15]. The Fenchel subdifferential of a convex function $h : U \to \mathbb{R} \cup \{+\infty\}$ at $\bar{u} \in U$ is defined as the set
$$\partial h(\bar{u}) = \{\, u^* \in U^* : h(u) \ge h(\bar{u}) + \langle u^*, u - \bar{u}\rangle \ \text{for all } u \in U \,\}.$$
This definition was extended to nonconvex functions by Clarke in 1973. It is based on generalised directional derivatives for locally Lipschitz functions in Banach spaces [6]. The Clarke subdifferential of $h$ at $\bar{u}$ is defined by
$$\partial_C h(\bar{u}) = \{\, u^* \in U^* : h^{\circ}(\bar{u}; v) \ge \langle u^*, v\rangle \ \text{for all } v \in U \,\},$$
where
$$h^{\circ}(\bar{u}; v) = \limsup_{u \to \bar{u},\ t \downarrow 0} \frac{h(u + t v) - h(u)}{t}$$
is the generalised directional derivative.
We add to this list two more definitions of subdifferentials. As before, for a set-valued mapping $F$ between a Banach space $U$ and its topological dual $U^*$, the set
$$\limsup_{u \to \bar{u}} F(u) := \{\, u^* \in U^* : \exists\, u_n \to \bar{u} \text{ and } u_n^* \rightharpoonup^* u^* \text{ with } u_n^* \in F(u_n) \,\}$$
denotes the sequential Painlevé–Kuratowski upper/outer limit of the set-valued mapping. Given a lower semi-continuous function $h$, the $\epsilon$-Fréchet subdifferential of $h$ at $\bar{u}$ is defined by
$$\hat{\partial}_\epsilon h(\bar{u}) = \Big\{\, u^* \in U^* : \liminf_{u \to \bar{u}} \frac{h(u) - h(\bar{u}) - \langle u^*, u - \bar{u}\rangle}{\|u - \bar{u}\|} \ge -\epsilon \,\Big\}.$$
If $|h(\bar{u})| = \infty$ then $\hat{\partial}_\epsilon h(\bar{u}) := \emptyset$. When $\epsilon = 0$ the set $\hat{\partial}_0 h(\bar{u})$ will be denoted by $\hat{\partial} h(\bar{u})$.
The limiting subdifferential or Mordukhovich subdifferential of $h$ at $\bar{u}$ is defined as
$$\partial_L h(\bar{u}) = \limsup_{u \xrightarrow{h} \bar{u},\ \epsilon \downarrow 0} \hat{\partial}_\epsilon h(u),$$
where the notation $u \xrightarrow{h} \bar{u}$ means $u \to \bar{u}$ with $h(u) \to h(\bar{u})$. This subdifferential corresponds to the collection of weak-star sequential limit points of the so-called $\epsilon$-Fréchet subdifferentials.
In [7], the inclusion property
$$\hat{\partial} h(\bar{u}) \subseteq \partial_L h(\bar{u}) \subseteq \partial_C h(\bar{u})$$
between these sets is shown. The set of limiting subgradients may be nonconvex, whereas the Clarke subdifferential is always a nonempty convex subset of $U^*$ whenever $h$ is locally Lipschitz. It is important to note that all the subdifferential definitions generate the same set if the function is convex [5].
Finally, we list another property needed to prove the convergence results: the strong-weak closedness (also called sw-closedness) of the graph of the subdifferential mapping.
Given the subdifferential $\partial h$ of a proper lower semi-continuous function h, saying its graph is sw-closed means that it is sequentially closed in the strong×weak topology. In other words, if $u_n \to u$ strongly and $u_n^* \rightharpoonup u^*$ weakly with $u_n^* \in \partial h(u_n)$, then $u^* \in \partial h(u)$.
The subdifferential is indeed an sw-closed set-valued mapping; see for instance [6, proposition 2.1.5] or [9, corollary 5.1]. Moreover, this result holds true for any maximal monotone point-to-set mapping, and not only for the subdifferential mapping; see [4, chapter 4].
For more details on the different types of subdifferentials and their properties we refer to [6, 9, 13, 15] and the references therein.
Footnotes
- 3
For the sake of notation we continue to denote the subsequence's indices by n instead of $n_j$.
- 4
DK Computational Mathematics' logo from JKU Linz.