A convergent non-negative deconvolution algorithm with Tikhonov regularization

Yueyang Teng; Yaonan Zhang; Hong Li; Yan Kang

doi:10.1088/0266-5611/31/3/035002

1. Introduction

In many optical devices, the process of image blurring can be considered as the result of convolution with a point spread function (PSF), i.e. a blurring kernel [1, 2]. It is assumed that the degraded image Y is in the form $Y=K*X$ , where X is the true image, K is the blurring kernel, and * denotes the convolution operation. The convolution has another expression: Y = PX, where the matrix P consists of PSFs at every pixel. This expression is very flexible since it can even be used when the PSF is spatially variant.

Image deblurring seeks to reverse the convolution process in order to obtain a higher quality image from the blurred image and the measured PSF. Deconvolution is a common example of an inverse problem, and has been widely used in medical imaging, astronomy and biology [3]. Both direct and iterative methods have been developed to solve it. The direct method that is most commonly used is the Wiener filter method. Wiener's method, however, uses the Fourier transform of the PSF as a denominator, which amplifies the high frequencies in the spectrum. Unfortunately, both the details of the image and noise can be found in the high frequencies, leading to increasingly noisy images. In addition, this method is only suitable for spatially invariant PSFs. Besides that, it cannot accommodate a priori constraints on solutions, such as non-negativity. In many images, such as CT and MRI images, negative values make no sense [4]. In this paper, we always assume all components of $X,Y$ and P to be non-negative.

Recently, iterative deconvolution methods have attracted more attention due to the increased ability to model the blurring physics. This may avoid either noise amplification or the inversion of a big matrix. As noted in [5, 6], a simple way to approach the deconvolution problem is to find the least squares (LS) estimate below:

$\begin{eqnarray}&&\mathop{{\rm min} }\limits_{X\geqslant 0}F(X)=\parallel PX-Y{{\parallel }^{2}},\end{eqnarray} \tag{ 1 }$

where $\parallel \cdot \parallel$ denotes the Euclidean distance.

It is well-known that the problem of restoring the original image from the noisy and degraded version is an ill-posed inverse problem: small perturbations in the data may result in an unacceptable result [7]. A general method is to introduce a priori knowledge to constrain the solution space, which can be expressed as a regularization (or penalization) of the restored image to reflect information on the kinds of acceptable images. The Tikhonov regularization method [8, 9] is a popular method that generally leads to a unique solution. Then, instead of (1), one can get a deblurred image by solving the following minimization:

$\begin{eqnarray}&&\mathop{{\rm min} }\limits_{X\geqslant 0}\;\Phi (X)=F(X)+\beta G(X),\end{eqnarray} \tag{ 2 }$

where $G(X)=\parallel LX{{\parallel }^{2}}$ is a regularization and $\beta \geqslant 0$ serves as a penalty parameter. It is worth noting that we must allow negative values to exist in the matrix L. If ${{P}^{{\rm T}}}P+\beta {{L}^{{\rm T}}}L$ is non-singular, the cost function is strictly convex and the optimization problem has a unique solution.

A viable method for the minimization of (2) is the gradient-based method. The steepest descent method is perhaps the simplest technique to implement; it takes the negative gradient as the descent direction:

$\begin{eqnarray}&&{{X}^{t+1}}={{X}^{t}}-\alpha (t)\nabla \Phi ({{X}^{t}}),\;{{X}^{t+1}}={\rm max} ({{X}^{t+1}},0),\end{eqnarray} \tag{ 3 }$

where the superscript t denotes the tth iteration and $\alpha (t)$ is the step size. Of course, we must truncate the negative pixel values.

Let $\alpha (t)=\alpha$ be a constant, where $0\lt \alpha \lt 2/\parallel {{\nabla }^{2}}\Phi ({{X}^{t}})\parallel$ ( $\parallel \cdot \parallel$ denotes the maximum eigenvalue); then (3) becomes the projected Landweber method [10] using

$\begin{eqnarray}&&{{X}^{t+1}}={{X}^{t}}-\alpha \nabla \Phi ({{X}^{t}}),\;{{X}^{t+1}}={\rm max} ({{X}^{t+1}},0).\end{eqnarray} \tag{ 4 }$

In addition, Lanteri et al [11, 12] provided a general multiplicative algorithm, which is also a gradient-based method. Let ${{a}_{j}}({{X}^{t}})$ be a positive function with positive values for any $X_{j}^{t}\gt 0$ ; then $-{\rm diag}[X_{j}^{t}{{a}_{j}}({{X}^{t}})]\nabla \Phi ({{X}^{t}})$ is also a descent direction. So the algorithm can be written in modified gradient form as

$\begin{eqnarray}&&X_{j}^{t+1}=X_{j}^{t}-X_{j}^{t}{{a}_{j}}({{X}^{t}}){{\nabla }_{j}}\Phi ({{X}^{t}}).\end{eqnarray} \tag{ 5 }$

Again, let ${{U}_{j}}({{X}^{t}})$ and ${{V}_{j}}({{X}^{t}})$ be two positive functions for any $X_{j}^{t}\gt 0$ , which satisfy $-{{\nabla }_{j}}\Phi ({{X}^{t}})={{U}_{j}}({{X}^{t}})-{{V}_{j}}({{X}^{t}})$ . Taking ${{a}_{j}}({{X}^{t}})=1/{{V}_{j}}({{X}^{t}})$ , Lanteri et al then obtained a multiplicative update:

$\begin{eqnarray}&&X_{j}^{t+1}=X_{j}^{t}\frac{{{U}_{j}}({{X}^{t}})}{{{V}_{j}}({{X}^{t}})}.\end{eqnarray} \tag{ 6 }$

Take an example: if $L=E-A$ , where E is the unit matrix and A is a non-negative matrix, then the update [12] is given as

$\begin{eqnarray}&&X_{j}^{t+1}=X_{j}^{t}\frac{{{({{P}^{{\rm T}}}Y)}_{j}}+\beta {{[({{A}^{{\rm T}}}+A){{X}^{t}}]}_{j}}}{{{({{P}^{{\rm T}}}P{{X}^{t}})}_{j}}+\beta {{[(E+{{A}^{{\rm T}}}A){{X}^{t}}]}_{j}}}.\end{eqnarray} \tag{ 7 }$

As noted in their article, the algorithm convergence was not provided from theory. Instead, they gave some numerical experiments to demonstrate it.

Rather than using the LS estimate, Richardson and Lucy (RL) [13, 14] minimized the Kullback–Leibler divergence between Y and $PX$ in order to construct an optimization model. The RL algorithm is based on a very different optimization model. The RL algorithm is generally called the ML-EM (maximum likelihood expectation maximization) in medical image reconstruction literature [15]. Besides the Tikhonov regularization, the TV (total variation) regularization as proposed by [16, 17] is also an effective method.

In this paper, we consider a specific application of the surrogate-based methods [18] to a specific kind of Tikhonov regularization of the LS method with non-negativity constraint. We derive a multiplicative update rule for iteratively and monotonically (in the sense of decreasing cost function) solving the restoration model, similar to the EM algorithm [15]. The proposed algorithm can be seen as a special example of Lanteri's method with a suitable choice of the functions U and V in (6). Besides monotonicity, the new algorithm also enjoys automatic satisfaction of the non-negativity constraints without the need for an adjustable step size. We give a rigorous proof that our algorithm will converge to a global solution when the cost function is strictly convex. We deliver some computer simulation results on the noisy Lena photo to show the desirable behavior of the proposed algorithm as regards convergence, stability and computational cost compared with those of existing methods.

2. Methodology

As mentioned in many articles [19–21], a surrogate function as defined below is useful in algorithm derivation as well as proof of convergence.

Definition 1. (The surrogate function) Denote as $\psi (X|{{X}^{t}})$ the surrogate function for $\Psi (X)$ at X^t (fixed) if $\psi ({{X}^{t}}|{{X}^{t}})=\Psi ({{X}^{t}})$ and $\psi (X|{{X}^{t}})\geqslant \Psi (X)$ .

It can be concluded that $\Psi (X)$ decreases under the update ${{X}^{t+1}}={{{\rm min} }_{X}}\;\psi (X|{{X}^{t}})$ because

$\begin{eqnarray}&&\Psi ({{X}^{t+1}})\leqslant \psi ({{X}^{t+1}}|{{X}^{t}})\leqslant \psi ({{X}^{t}}|{{X}^{t}})=\Psi ({{X}^{t}}).\end{eqnarray} \tag{ 8 }$

There are two simple and important properties for the surrogate functions constructed, which will be useful for the algorithm derivation.

If $\psi (X|{{X}^{t}})$ is the surrogate function of ${{\psi }_{{\rm mid}}}(X|{{X}^{t}})$ at X^t and ${{\psi }_{{\rm mid}}}(X|{{X}^{t}})$ is that of $\Psi (X)$ at X^t, then $\psi (X|{{X}^{t}})$ is a surrogate function of $\Psi (X)$ at X^t.
If ${{\psi }_{1}}(X|{{X}^{t}})$ and ${{\psi }_{2}}(X|{{X}^{t}})$ are respectively the surrogate functions of ${{\Psi }_{1}}(X)$ and ${{\Psi }_{2}}(X)$ at X^t, then ${{\psi }_{1}}(X|{{X}^{t}})+{{\psi }_{2}}(X|{{X}^{t}})$ is a surrogate function of ${{\Psi }_{1}}(X)+{{\Psi }_{2}}(X)$ at X^t.

On the basis of the above definition, we will construct the surrogate functions for the original cost function F(X) and the Tikhonov regularization G(X).

2.1. The surrogate function for F(X)

We construct a surrogate function $f(X|{{X}^{t}})$ from the convexity of F(X) and the combination coefficients ${{\lambda }_{ij}}={{P}_{ij}}X_{j}^{t}/{{(P{{X}^{t}})}_{i}}$ satisfying ${{\lambda }_{ij}}\geqslant 0$ and $\sum _{j=1}^{N}{{\lambda }_{ij}}=1$ . The surrogate function can now be defined as

$\begin{eqnarray}&&f(X|{{X}^{t}})=\mathop{\sum }\limits_{i=1}^{N}\mathop{\sum }\limits_{j=1}^{N}{{\lambda }_{ij}}{{(\frac{{{P}_{ij}}{{X}_{j}}}{{{\lambda }_{ij}}}-{{Y}_{i}})}^{2}}.\end{eqnarray} \tag{ 9 }$

It can be verified that $f({{X}^{t}}|{{X}^{t}})=F({{X}^{t}})$ . If we consider Jensen's inequality and the convex combination coefficients ${{\lambda }_{ij}}$ , then $f(X|{{X}^{t}})\geqslant F(X)$ is proven by the following inequality:

$\begin{eqnarray}&&\mathop{\sum }\limits_{j=1}^{N}{{\lambda }_{ij}}{{(\frac{{{P}_{ij}}{{X}_{j}}}{{{\lambda }_{ij}}}-{{Y}_{i}})}^{2}}\geqslant {{[{{(PX)}_{i}}-{{Y}_{i}}]}^{2}}.\end{eqnarray} \tag{ 10 }$

The derivation can also be found in [22] and we may obtain ISRA (the Image Space Reconstruction Algorithm) by minimizing $f(X|{{X}^{t}})$ .

2.2. The surrogate function for G(X)

Since there may be negative values in the matrix L, it is difficult to directly construct a surrogate function as was done above. Some previous works, such as [23], are unable to solve the problem because they fail to guarantee non-negativity during iterations [24]. This paper will give a novel method for solving the problem by using a middle surrogate function. We rewrite the matrix as $L=\bar{L}-\hat{L}$ , where $\bar{L}$ and $\hat{L}$ are two non-negative matrices in which all of the elements are equal to or greater than zero. Next we construct a surrogate function for G(X) at X^t as below:

$\begin{eqnarray}&&{{g}_{{\rm mid}}}(X|{{X}^{t}})={{\bar{g}}_{{\rm mid}}}(X|{{X}^{t}})+{{\hat{g}}_{{\rm mid}}}(X|{{X}^{t}}),\end{eqnarray} \tag{ 11 }$

where

$\begin{eqnarray}&&{{\bar{g}}_{{\rm mid}}}(X|{{X}^{t}})=\frac{1}{2}\parallel 2\bar{L}X-\bar{L}{{X}^{t}}-\hat{L}{{X}^{t}}{{\parallel }^{2}},\end{eqnarray} \tag{ 12 }$

$\begin{eqnarray}&&{{\hat{g}}_{{\rm mid}}}(X|{{X}^{t}})=\frac{1}{2}\parallel 2\hat{L}X-\bar{L}{{X}^{t}}-\hat{L}{{X}^{t}}{{\parallel }^{2}}.\end{eqnarray} \tag{ 13 }$

It can be verified that ${{g}_{{\rm mid}}}({{X}^{t}}|{{X}^{t}})=G({{X}^{t}})$ . By the convexity of G(X), we view the two coefficients with value $1/2$ in (12) and (13) as the combination coefficients. It is then shown that

$\begin{eqnarray}\begin{array}{rcl} {{g}_{{\rm mid}}}(X|{{X}^{t}}) & = & \frac{1}{2}\parallel 2\bar{L}X-\bar{L}{{X}^{t}}-\hat{L}{{X}^{t}}{{\parallel }^{2}} \\ {} & {} & +\frac{1}{2}\parallel -2\hat{L}X+\bar{L}{{X}^{t}}+\hat{L}{{X}^{t}}{{\parallel }^{2}}\geqslant G(X). \\ \end{array}\end{eqnarray} \tag{ 14 }$

Following the same process, we can construct surrogate functions for ${{\bar{g}}_{{\rm mid}}}(X|{{X}^{t}})$ and ${{\hat{g}}_{{\rm mid}}}(X|{{X}^{t}})$ . Let

$\begin{eqnarray}&&{{\bar{\lambda }}_{ij}}={{\bar{L}}_{ij}}X_{j}^{t}/{{(\bar{L}{{X}^{t}})}_{i}}\;{\rm and}\;{{\hat{\lambda }}_{ij}}={{\hat{L}}_{ij}}X_{j}^{t}/{{(\hat{L}{{X}^{t}})}_{i}};\end{eqnarray} \tag{ 15 }$

then

$\begin{eqnarray}&&\bar{g}(X|{{X}^{t}})=\frac{1}{2}\mathop{\sum }\limits_{i=1}^{N}\mathop{\sum }\limits_{j=1}^{N}{{\bar{\lambda }}_{ij}}{{[\frac{2{{{\bar{L}}}_{ij}}{{X}_{j}}}{{{{\bar{\lambda }}}_{ij}}}-{{(\bar{L}{{X}^{t}})}_{i}}-{{(\hat{L}{{X}^{t}})}_{i}}]}^{2}},\end{eqnarray} \tag{ 16 }$

$\begin{eqnarray}&&\hat{g}(X|{{X}^{t}})=\frac{1}{2}\mathop{\sum }\limits_{i=1}^{N}\mathop{\sum }\limits_{j=1}^{N}{{\hat{\lambda }}_{ij}}{{[\frac{2{{{\hat{L}}}_{ij}}{{X}_{j}}}{{{{\hat{\lambda }}}_{ij}}}-{{(\bar{L}{{X}^{t}})}_{i}}-{{(\hat{L}{{X}^{t}})}_{i}}]}^{2}}.\end{eqnarray} \tag{ 17 }$

Combining (16) and (17) yields the surrogate function for G(X) at X^t:

$\begin{eqnarray}&&g(X|{{X}^{t}})=\bar{g}(X|{{X}^{t}})+\hat{g}(X|{{X}^{t}}).\end{eqnarray} \tag{ 18 }$

2.3. The multiplicative update rule

We minimize $\phi (X|{{X}^{t}})=f(X|{{X}^{t}})+\beta g(X|{{X}^{t}})$ to obtain a new iteration. Taking the partial derivatives for $f(X|{{X}^{t}})$ , $\bar{g}(X|{{X}^{t}})$ and $\hat{g}(X|{{X}^{t}})$ leads to the following equations:

$\begin{eqnarray}&&\frac{\partial f(X|{{X}^{t}})}{\partial {{X}_{j}}}=2\frac{{{({{P}^{{\rm T}}}P{{X}^{t}})}_{j}}}{X_{j}^{t}}{{X}_{j}}-2{{({{P}^{{\rm T}}}Y)}_{j}},\end{eqnarray} \tag{ 19 }$

$\begin{eqnarray}&&\frac{\partial \bar{g}(X|{{X}^{t}})}{\partial {{X}_{j}}}=4\frac{{{({{{\bar{L}}}^{{\rm T}}}\bar{L}{{X}^{t}})}_{j}}}{X_{j}^{t}}{{X}_{j}}-2{{[{{\bar{L}}^{{\rm T}}}(\bar{L}+\hat{L}){{X}^{t}}]}_{j}},\end{eqnarray} \tag{ 20 }$

$\begin{eqnarray}&&\frac{\partial \hat{g}(X|{{X}^{t}})}{\partial {{X}_{j}}}=4\frac{{{({{{\hat{L}}}^{{\rm T}}}\hat{L}{{X}^{t}})}_{j}}}{X_{j}^{t}}{{X}_{j}}-2{{[{{\hat{L}}^{{\rm T}}}(\bar{L}+\hat{L}){{X}^{t}}]}_{j}}.\end{eqnarray} \tag{ 21 }$

The one-dimensional equations $\partial \phi (X|{{X}^{t}})/\partial {{X}_{j}}=0$ can now be solved; this leads to the following multiplicative update:

$\begin{eqnarray}&&X_{j}^{t+1}=X_{j}^{t}\frac{{{({{P}^{{\rm T}}}Y)}_{j}}+\beta {{[{{(\bar{L}+\hat{L})}^{{\rm T}}}(\bar{L}+\hat{L}){{X}^{t}}]}_{j}}}{{{({{P}^{{\rm T}}}P{{X}^{t}})}_{j}}+2\beta {{[({{{\bar{L}}}^{{\rm T}}}\bar{L}+{{{\hat{L}}}^{{\rm T}}}\hat{L}){{X}^{t}}]}_{j}}}.\end{eqnarray} \tag{ 22 }$

The update rule is flexible and easy to implement since it results from the current image multiplied by a factor. The update rule has two important properties: (i) the iterations are non-negative if the initial estimate satisfies $X_{j}^{0}\gt 0$ for any j, and (ii) the cost function decreases monotonically with increasing iteration number.

When we analyze the derivation process, we can make the following observation: the key step is replacing the minimization of $\Phi (X)$ by minimizing at each iteration the surrogate function $\phi (X|{{X}^{t}})$ , whose variables are separable. Moreover, $\phi (X|{{X}^{t}})\geqslant \Phi (X)$ and $\phi ({{X}^{t}}|{{X}^{t}})=\Phi ({{X}^{t}})$ ; thus, the minimization of $\phi (X|{{X}^{t}})$ ensures decrease of $\Phi (X)$ . The derivation process can also be explained in terms of an EM algorithm when considering $\phi (X|{{X}^{t}})$ as the negative conditional expectation; its minimization is then equivalent to the maximization of the conditional expectation.

2.4. An example

An example is given to illustrate our method. We consider the following regularization:

$\begin{eqnarray}&&G(X)=\parallel LX{{\parallel }^{2}}=\mathop{\sum }\limits_{j=1}^{N}{{({{X}_{j}}-\mathop{\sum }\limits_{k\in {{\delta }_{j}}}{{\omega }_{jk}}{{X}_{k}})}^{2}},\end{eqnarray} \tag{ 23 }$

where ${{\delta }_{j}}$ denotes a set consisting of twenty-four connected pixels around the jth pixel and the ${{\omega }_{jk}}$ satisfy that ${{\omega }_{jk}}=1/24$ if ${{X}_{k}}\in {{\delta }_{j}}$ , and otherwise ${{\omega }_{jk}}=0$ . The following theorem shows that this regularization leads to a strictly convex cost function.

Theorem 1. If $P1\ne 0$ where 1 denotes a column vector consisting of entries 1, i.e., the blurring result for a uniform image is nonzero, then the Hessian matrix $H={{P}^{{\rm T}}}P+\beta {{L}^{{\rm T}}}L$ is positive definite for $\beta \gt 0$ ; so $\Phi (X)$ is strictly convex.

Inspired by the Fessler's work [25], we give a proof.

Proof. It suffices to prove that ${{X}^{{\rm T}}}HX\ne 0$ if $X\ne 0$ . From (23) we can obtain that ${{X}^{{\rm T}}}{{L}^{{\rm T}}}LX=0$ only if $X=c1$ where some $c\geqslant 0$ . But $c{{1}^{{\rm T}}}{{P}^{{\rm T}}}Pc1={{c}^{2}}\parallel P1{{\parallel }^{2}}\ne 0$ if $c\ne 0$ by assumption. □

Specifying that $E=\bar{L}$ in (22) for convenience, the update rule is formulated as follows:

$\begin{eqnarray}&&X_{j}^{t+1}=X_{j}^{t}\frac{{{({{P}^{{\rm T}}}Y)}_{j}}+\beta {{[{{(E+\hat{L})}^{{\rm T}}}(E+\hat{L}){{X}^{t}}]}_{j}}}{{{({{P}^{{\rm T}}}P{{X}^{t}})}_{j}}+2\beta {{[(E+{{{\hat{L}}}^{{\rm T}}}\hat{L}){{X}^{t}}]}_{j}}}.\end{eqnarray} \tag{ 24 }$

Comparing (7) and (24), it can be seen that Lanteri's and our algorithms have update rules that are similar in form. In fact, there is a close association between them, which is demonstrated in appendix A.

3. Convergence

We will prove that the sequence $\{{{X}^{t}}\}$ generated by (22) converges to a global minimum. In [26], Jacobson et al presented a general convergence proof for the class of majorize–minimize algorithms. This proof may be viewed as a special example. Here, we present a simpler targeted proof.

In theory, a global solution must meet the Kuhn–Tucker (KT) conditions, and vice versa if the cost function is convex. It is easy to see the convexity of $\Phi (X)$ . By Theorem 2.19 in [27], the KT conditions of (2) are as follows:

$\begin{eqnarray}&&\frac{\partial \Phi (X)}{\partial {{X}_{j}}}=0\qquad {\rm if}\;{{X}_{j}}\gt 0,\end{eqnarray} \tag{ 25 }$

$\begin{eqnarray}&&\frac{\partial \Phi (X)}{\partial {{X}_{j}}}\geqslant 0\qquad {\rm if}\;{{X}_{j}}=0.\end{eqnarray} \tag{ 26 }$

Before analyzing the properties of the algorithm, we consider several important and reasonable assumptions.

Assumption 1. For the iteration sequence $\{{{X}^{t}}\}$ , we assume that:

the algorithm starts from a positive image ( ${{X}^{0}}\gt 0$ );
${{({{P}^{{\rm T}}}P)}_{jj}}\gt 0$ for all j;
$\Phi$ is a strictly convex function.

The first assumption forces the iterations to be non-negative. The second condition is reasonable because ${{({{P}^{{\rm T}}}P)}_{jj}}=0$ suggests that ${{P}_{ij}}=0$ for any i. Thus, the equation Y=PX is irrelevant to X_j, and then we can remove X_j from X to get an equivalent model. For the third requirement, one of the main reasons for adding the regularization is to enforce the convexity.

In the following, we will prove global convergence for the proposed algorithm along the lines of [23, 28, 29]. First, we give several useful lemmas.

Lemma 1. The set of accumulation points of a bounded sequence $\{{{Z}^{t}}\}$ with $\{\parallel {{Z}^{t+1}}-{{Z}^{t}}\parallel \}\to 0$ is connected and compact.

Proof. This is Theorem 28.1 from Ostrowski [30]. The reader is asked to see this paper for the proof. □

Lemma 2. The iteration sequence $\{{{X}^{t}}\}$ is bounded.

Proof. We have $\{{{X}^{t}}\}\subset \{X:\Phi (X)\leqslant \Phi ({{X}^{0}})\}\subset \{X:F(X)\leqslant \Phi ({{X}^{0}})\}$ ; therefore, if F has bounded level sets, then $\{{{X}^{t}}\}$ is bounded. The boundedness of the level sets of F can be proven by contradiction. If it does not hold, for a given real value C, there must exist a sequence $\{{{Z}^{t}}\}$ satisfying $\Phi ({{Z}^{t}})\leqslant C$ and an element $\{Z_{j}^{t}\}$ that approaches $+\infty$ . If we consider that ${{P}_{ij}}\geqslant 0$ and $\sum _{i=1}^{N}P_{ij}^{2}\gt 0$ (assumption 1) for all j, then this implies that $\Phi ({{Z}^{t}})\to +\infty$ , which is a contradiction. □

Lemma 3. The sequence $\{\parallel {{X}^{t}}-{{X}^{t+1}}\parallel \}\to 0$ . Furthermore, if a subsequence $\{{{X}^{{{t}_{s}}}}\}\to {{X}^{*}}$ , then $\{{{X}^{{{t}_{s}}+1}}\}\to {{X}^{*}}$ as well.

Proof.

(i)
Take into account that
$\begin{eqnarray}\begin{array}{rcl} \nabla _{jj}^{2}\phi (X|{{X}^{t}}) & = & 2\frac{{{({{P}^{{\rm T}}}P{{X}^{t}})}_{j}}}{X_{j}^{t}}+4\beta [\frac{{{({{{\bar{L}}}^{{\rm T}}}\bar{L}{{X}^{t}})}_{j}}}{X_{j}^{t}}+\frac{{{({{{\hat{L}}}^{{\rm T}}}\hat{L}{{X}^{t}})}_{j}}}{X_{j}^{t}}] \\ {} & {} & \geqslant \;2\frac{{{({{P}^{{\rm T}}}P{{X}^{t}})}_{j}}}{X_{j}^{t}}\geqslant 2{{({{P}^{{\rm T}}}P)}_{jj}}. \\ \end{array}\end{eqnarray} \tag{ 27 }$
Let $\gamma ={{{\rm min} }_{j}}\{{{({{P}^{{\rm T}}}P)}_{jj}}\}$ . Since $\nabla \phi ({{X}^{t+1}}|{{X}^{t}})=0$ , we then have
$\begin{eqnarray}\begin{array}{lcl} {} & {} & \Phi ({{X}^{t}})-\Phi ({{X}^{t+1}})\geqslant \phi ({{X}^{t}}|{{X}^{t}})-\phi ({{X}^{t+1}}|{{X}^{t}})\; \\ {} & = & \frac{1}{2}{{({{X}^{t}}-{{X}^{t+1}})}^{{\rm T}}}{{\nabla }^{2}}\phi ({{X}^{t+1}}|{{X}^{t}})({{X}^{t}}-{{X}^{t+1}}) \\ {} & \geqslant & \gamma \parallel {{X}^{t}}-{{X}^{t+1}}{{\parallel }^{2}}. \\ \end{array}\end{eqnarray} \tag{ 28 }$
Because $\{\Phi ({{X}^{t}})\}$ decreases monotonically and is bounded from below, we then have $\{\Phi ({{X}^{t}})-\Phi ({{X}^{t+1}})\}\to 0$ , so $\{\parallel {{X}^{t}}-{{X}^{t+1}}\parallel \}\to 0$ .
(ii)
By contradiction, if $\{{{X}^{{{t}_{s}}+1}}\}$ diverges, then it must have a convergent subsequence $\{{{X}^{{{t}_{s}}_{s}+1}}\}\to {{X}^{**}}\ne {{X}^{*}}$ . Let ${{\epsilon }_{0}}=\parallel {{X}^{*}}-{{X}^{**}}\parallel \gt 0$ . Consider the two convergent subsequences $\{{{X}^{{{t}_{s}}_{s}}}\}$ and $\{{{X}^{{{t}_{s}}_{s}+1}}\}$ ; then there must be a positive integers T making $\parallel {{X}^{{{t}_{s}}_{s}}}-{{X}^{*}}\parallel \lt {{\epsilon }_{0}}/4$ and $\parallel {{X}^{{{t}_{s}}_{s}+1}}-{{X}^{**}}\parallel \lt {{\epsilon }_{0}}/4$ when ${{t}_{{{s}_{s}}}}\gt T$ . By the triangle inequality, we can obtain that
$\begin{eqnarray}\begin{array}{ll} {} & {} & \parallel {{X}^{{{t}_{s}}_{s}}}-{{X}^{{{t}_{s}}_{s}+1}}\parallel +\parallel {{X}^{{{t}_{s}}_{s}}}-{{X}^{*}}\parallel +\parallel {{X}^{{{t}_{s}}_{s}+1}}-{{X}^{**}}\parallel \geqslant \parallel {{X}^{*}}-{{X}^{**}}\parallel \\ {} & {} & \Longrightarrow \parallel {{X}^{{{t}_{s}}_{s}}}-{{X}^{{{t}_{s}}_{s}+1}}\parallel \gt \frac{{{\epsilon }_{0}}}{2}. \\ \end{array}\end{eqnarray} \tag{ 29 }$
This is a contradiction to $\{\parallel {{X}^{t}}-{{X}^{t+1}}\parallel \}\to 0$ .

□

Lemma 4. We know that at each iteration

$\begin{eqnarray}&&\begin{array}{l} X_{j}^{t+1}=X_{j}^{t}-{{\alpha }_{j}}{{[\nabla \Phi ({{X}^{t}})]}_{j}}, \\ \quad where\;{{\alpha }_{j}}=\frac{X_{j}^{t}}{2{{({{P}^{{\rm T}}}P{{X}^{t}})}_{j}}+4\beta {{[({{{\bar{L}}}^{{\rm T}}}\bar{L}+{{{\hat{L}}}^{{\rm T}}}\hat{L}){{X}^{t}}]}_{j}}}. \\ \end{array}\end{eqnarray} \tag{ 30 }$

The derivation process is omitted in view of its simplicity. As can be seen, the proposed algorithm can be considered as a diagonally rescaled gradient descent, where the rescaling factor ensures non-negativity. We should note the obvious difference between our algorithm and the steepest descent method, because the latter method has the same rescaling factor (i.e. step size) for all components of the gradient.

Given the above, it becomes feasible to discuss the convergence of $\{\Phi ({{X}^{t}})\}$ , which, however, does not automatically imply the convergence of $\{{{X}^{t}}\}$ [28]. By the three theorems below, the global convergence is proven.

Theorem 2. Let $\{{{X}^{{{t}_{s}}}}\}\to {{X}^{*}}$ be a convergent subsequence; then X^* meets the first KT condition (25).

Proof. When $X_{j}^{*}\gt 0$ , by (19)–(21), it can be verified that

$\begin{eqnarray}&&\frac{\partial f(X|{{X}^{*}})}{\partial {{X}_{j}}}{{|}_{{{X}^{*}}}}=2{{[{{P}^{{\rm T}}}(P{{X}^{*}}-Y)]}_{j}},\end{eqnarray} \tag{ 31 }$

$\begin{eqnarray}&&\frac{\partial \bar{g}(X|{{X}^{*}})}{\partial {{X}_{j}}}{{|}_{{{X}^{*}}}}=4{{({{\bar{L}}^{{\rm T}}}\bar{L}{{X}^{*}})}_{j}}-2{{[{{\bar{L}}^{{\rm T}}}(\bar{L}+\hat{L}){{X}^{*}}]}_{j}},\end{eqnarray} \tag{ 32 }$

$\begin{eqnarray}&&\frac{\partial \hat{g}(X|{{X}^{*}})}{\partial {{X}_{j}}}{{|}_{{{X}^{*}}}}=4{{({{\hat{L}}^{{\rm T}}}\hat{L}{{X}^{*}})}_{j}}-2{{[{{\hat{L}}^{{\rm T}}}(\bar{L}+\hat{L}){{X}^{*}}]}_{j}}.\end{eqnarray} \tag{ 33 }$

Then,

$\begin{eqnarray}&&\frac{\partial \phi (X|{{X}^{*}})}{\partial {{X}_{j}}}{{|}_{{{X}^{*}}}}=2{{[{{P}^{{\rm T}}}(P{{X}^{*}}-Y)]}_{j}}\end{eqnarray} \tag{ 34 }$

$\begin{eqnarray}&&+4\beta {{[({{\bar{L}}^{{\rm T}}}\bar{L}+{{\hat{L}}^{{\rm T}}}\hat{L}){{X}^{*}}]}_{j}}-2\beta {{[{{(\bar{L}+\hat{L})}^{{\rm T}}}(\bar{L}+\hat{L}){{X}^{*}}]}_{j}}\end{eqnarray} \tag{ 35 }$

$\begin{eqnarray}&&=\;2{{[{{P}^{{\rm T}}}(P{{X}^{*}}-Y)]}_{j}}+2\beta {{[{{(\bar{L}-\hat{L})}^{{\rm T}}}(\bar{L}-\hat{L}){{X}^{*}}]}_{j}}\end{eqnarray} \tag{ 36 }$

$\begin{eqnarray}&&=\;2{{[{{P}^{{\rm T}}}(P{{X}^{*}}-Y)]}_{j}}+2\beta {{({{L}^{{\rm T}}}L{{X}^{*}})}_{j}}=\frac{\partial \Phi ({{X}^{*}})}{\partial {{X}_{j}}}.\end{eqnarray} \tag{ 37 }$

We consider the following:

$\begin{eqnarray}&&\frac{\partial \phi (X|{{X}^{{{t}_{s}}}})}{\partial {{X}_{j}}}{{|}_{{{X}^{{{t}_{s}}+1}}}}=0.\end{eqnarray} \tag{ 38 }$

Because $\{{{X}^{{{t}_{s}}}}\}\to {{X}^{*}}$ and $\{{{X}^{{{t}_{s}}+1}}\}\to {{X}^{*}}$ (lemma 3), we then have

$\begin{eqnarray}&&\frac{\partial \phi (X|{{X}^{*}})}{\partial {{X}_{j}}}{{|}_{{{X}^{*}}}}=\mathop{{\rm lim} }\limits_{{{t}_{s}}\to +\infty }\frac{\partial \phi (X|X_{s}^{t})}{\partial {{X}_{j}}}{{|}_{{{X}^{{{t}_{s}}+1}}}}=0.\end{eqnarray} \tag{ 39 }$

□

Theorem 3. The whole sequence $\{{{X}^{t}}\}$ converges.

Proof. According to lemma 1, 2 and 3, the set of accumulation points of $\{{{X}^{t}}\}$ is connected and compact. If we can prove that the number of accumulation points is finite, then the desired result follows because a finite set can be connected only if it consists of a single point [29].

To prove the existence of a finite number of accumulation points, we consider any accumulation point X^*. Given an integer set $S=\{1,2,\ldots ,N\}$ where N is the total number of components of X, we then have that ${{S}^{*}}=\{j:X_{j}^{*}=0\}$ is a subset of S. Let ${{\Phi }_{{{S}^{*}}}}$ be the restriction of Φ to the set $\{X:{{X}_{j}}=0,j\in {{S}^{*}}\}$ , which is a strictly convex function of the reduced variables. It follows that ${{\Phi }_{{{S}^{*}}}}$ has a unique minimum (theorem 2: $\partial \Phi ({{X}^{*}})/\partial {{X}_{j}}=0{\rm if}X_{j}^{*}\gt 0$ ). This means that an accumulation point must correspond to a subset of S. The number of subsets of S is finite, so the number of accumulation points is also finite. □

Here, we present a novel proof process. In theorem 2, we prove that every accumulation point meets the first KT condition, by which the full sequence convergence is provided in theorem 3. Naturally, the limit of $\{{{X}^{t}}\}$ satisfies the first KT condition. As below, we will show that the second KT condition is satisfied.

Theorem 4. The limit X^* of $\{{{X}^{t}}\}$ meets the second KT condition (26).

Proof. When $X_{j}^{*}=0$ , by contradiction, we assume that there is an $X_{j}^{*}=0$ satisfying ${{[\nabla \Phi ({{X}^{*}})]}_{j}}\lt 0$ . Since $\{{{X}^{t}}\}\to {{X}^{*}}$ , there exists an $\epsilon \lt 0$ and a positive integer $\mathcal{T}$ such that ${{[\nabla \Phi ({{X}^{t}})]}_{j}}\lt \epsilon$ for $t\gt \mathcal{T}$ ; then,

$\begin{eqnarray}&&X_{j}^{t+1}-X_{j}^{t}=-{{\alpha }_{j}}{{[\nabla \Phi ({{X}^{t}})]}_{j}}\gt -{{\alpha }_{j}}\epsilon \gt 0\;(by\;lemma\;4).\end{eqnarray} \tag{ 40 }$

Thus, we can obtain that $X_{j}^{t+1}\gt X_{j}^{t}$ , which is contradicted by $\{X_{j}^{t}\}\to 0$ . □

4. Simulation results

We compared the performance of the proposed surrogate function method with that of the projected Landweber method (4) and Lanteri's PMLIR (7) for the Lena image with 128 × 128 pixels. For the projected Landweber method, we use $\alpha =1/[\parallel P{{\parallel }_{1}}\parallel P{{\parallel }_{\infty }}+\beta \parallel L{{\parallel }_{1}}\parallel L{{\parallel }_{\infty }}]$ [31], where $\parallel \cdot {{\parallel }_{1}}$ and $\parallel \cdot {{\parallel }_{\infty }}$ denote the 1-norm and $\infty$ -norm of a matrix. We blurred the image by using a uniform 5 × 5 kernel, after which we added Gaussian-distributed white noise to each pixel with different powers. Figure 1 shows the original and noisy images.

**Figure 1.** (a) Lena photo with 128 × 128 pixels, (b) blurred image with a 5 × 5 uniform kernel and additive 5 dBW Gaussian white noise, (c) blurred image with the same kernel and additive 15 dBW Gaussian white noise. Copyright Playboy Enterprises.
Download figure:
Standard image High-resolution image

The experiments were performed on an HP Compaq PC with 3.00 GHz Core i5 CPU and 4 GB memory. The algorithms were implemented in MATLAB 7.0. All of the algorithms were initiated with the same uniform image for a fair comparison. A special Tikhonov regularization as given in (23) was used.

The mean square error (MSE) was used to measure the similarity to the true image. The value of the MSE was calculated by taking the average of the squared differences between the restored pixel values and the non-degraded values, as given below:

$\begin{eqnarray}&&M(t)=\frac{1}{N}\parallel {{X}^{t}}-{{X}^{{\rm True}}}{{\parallel }^{2}}.\end{eqnarray} \tag{ 41 }$

The criterion below was applied for stopping the iterative process:

$\begin{eqnarray}&&\chi (t)=\frac{\parallel {{X}^{t+1}}-{{X}^{t}}{{\parallel }^{2}}}{\parallel {{X}^{t}}{{\parallel }^{2}}}\lt \epsilon ,\end{eqnarray} \tag{ 42 }$

where is a difference tolerance.

In this paper, we find a suitable penalty parameter for the algorithms by comparing the change of the MSE (at the 500th iteration), with respect to β in figure 2. For the case of low noise, the optimal MSE values are obtained at $\beta ={{10}^{-2}}$ for all three methods. For the case of high noise, the optimal penalty parameter is obtained at $\beta ={{10}^{-1}}$ . It is worth noting that the optimal penalty parameter depends on the optimization model, but not the particular algorithm. That is why the three algorithms always have the same optimal penalty parameter.

In addition, in figure 2, PMLIR and the proposed algorithm give almost the same curves: it is difficult to distinguish them from each other with the naked eye. In appendix A, we have concluded that they have similar natures in theory. Now, the idea is further verified from a numerical experiment standpoint. We should also note that when $\beta ={{10}^{-2}}$ , 10⁻¹ and 1, the MSE values are very similar for the three methods. However, when $\beta \lt {{10}^{-2}}$ , the projected Landweber value is rather different from the other two values. We think that the model is becoming ill-posed, with the result that different algorithms do not converge to the same solutions. To conclude, the proposed algorithm always results in the smallest (at least similar) MSE value, as compared with the other algorithms.

Table 1 indicates iteration numbers and running times for meeting the stopping criterion $\epsilon ={{10}^{-7}}$ . We select $\beta ={{10}^{-2}}$ for the case of low noise and $\beta ={{10}^{-1}}$ for the case of high noise. For an accurate time comparison, we execute ten reconstructions for each degraded image and compute the average time costs. It can be seen that, regardless of the noise levels, PMLIR and our method need a similar number of iterations and similar computational times for a stable result, while the Landweber method always requires more. In fact, by comparing the formulations, it can also be found that the three algorithms have similar computational complexities, so the time cost depends only on the iteration number.

Table 1. Iteration number and running time in seconds for meeting the stopping criterion $\epsilon ={{10}^{-7}}$ (iteration/time).

Algorithm	Projected Landweber	PMLIR	Surrogate function
Low noise	205/0.4756	67/0.1541	66/0.1452
High noise	43/0.0998	31/0.0713	33/0.0726

Figure 3 presents the comparison of the MSE curves and cost functions corresponding to table 1. The results further demonstrate the advantage of our method, as well as PMLIR, over the projected Landweber method as regards the convergence rate. The proposed algorithm still shows a performance similar to that of PMLIR due to the similar structure.

**Figure 3.** MSE and cost function versus iteration number corresponding to table 1. For the 5 dBW noisy image, $\beta ={{10}^{-2}}$ ; for the 15 dBW noisy case, $\beta ={{10}^{-1}}$ . We use a log scale both for the x-axis and for the y-axis.
Download figure:
Standard image High-resolution image

Figure 4 shows the corresponding restored results. For the two noise levels, the reconstructed results are rather similar. It should be emphasized that the optimization problem is proven to have a unique global solution. This means that all convergent algorithms will arrive at that solution with sufficient iterations.

5. Conclusion

We present a new algorithm for the LS deconvolution model with Tikhonov regularization. The algorithms derived from our mechanism have several desirable properties, including the monotonic decrease of the cost function, the self-constraining in the feasible region and there being no need for pre-selected step sizes. We provide a rigorous global convergence proof for when the cost function is strictly convex. The simulations show that the proposed algorithm provides results comparable to those from some existing methods.

The derived algorithm inherits a slow convergence speed from the surrogate-based methods. Many efficient methods can be applied to accelerate them, such as that of [32]. An accelerated version of the proposed method will be investigated in the future. Furthermore, in image deblurring, it is usually necessary to design edge-preserving image modes as a prior distribution—such as in the total variation approach and the nonlinear anisotropic diffusion approach. These regularizations, however, will lead to non-differentiable cost functions, for which it is difficult to design a satisfactory convergent algorithm. Future work will focus on these topics.

Acknowledgments

The authors thank the reviewers for their valuable comments. Thanks also go to the National Natural Science Foundation of China (61302013, 61372014, 61201053), the Natural Science Foundation of Liaoning Province of China (201202071) and the Fundamental Research Funds for the Central Universities (N130319003) for their financial support.

Appendix.: Comparison between (7) and (24)

For Lanteri's method, the two positive functions are not unique, obviously. For example, if we are given another positive function ${{b}_{j}}({{X}^{t}})$ , then ${{U}_{j}}({{X}^{t}})+{{b}_{j}}({{X}^{t}})$ and ${{V}_{j}}({{X}^{t}})+{{b}_{j}}({{X}^{t}})$ also satisfy the conditions, and they lead to another update:

$\begin{eqnarray}&&X_{j}^{t+1}=X_{j}^{t}\frac{{{U}_{j}}({{X}^{t}})+{{b}_{j}}({{X}^{t}})}{{{V}_{j}}({{X}^{t}})+{{b}_{j}}({{X}^{t}})}.\end{eqnarray} \tag{ A.1 }$

In fact, (7) and (24) result from different choices of ${{U}_{j}}({{X}^{t}})$ and ${{V}_{j}}({{X}^{t}})$ . For the given regularization $G(X)=\parallel (E-A)X{{\parallel }^{2}}$ , Lanteri et al used

$\begin{eqnarray}&&{{U}_{j}}({{X}^{t}})={{({{P}^{{\rm T}}}Y)}_{j}}+\beta {{[({{A}^{{\rm T}}}+A){{X}^{t}}]}_{j}},\end{eqnarray} \tag{ A.2 }$

$\begin{eqnarray}&&{{V}_{j}}({{X}^{t}})={{({{P}^{{\rm T}}}P{{X}^{t}})}_{j}}+\beta {{[(E+{{A}^{{\rm T}}}A){{X}^{t}}]}_{j}}\end{eqnarray} \tag{ A.3 }$

to obtain (7). We chose

$\begin{eqnarray}&&{{U}_{j}}({{X}^{t}})={{({{P}^{{\rm T}}}Y)}_{j}}+\beta {{[{{(E+A)}^{{\rm T}}}(E+A){{X}^{t}}]}_{j}},\end{eqnarray} \tag{ A.4 }$

$\begin{eqnarray}&&{{V}_{j}}({{X}^{t}})={{({{P}^{{\rm T}}}P{{X}^{t}})}_{j}}+2\beta {{[(E+{{A}^{{\rm T}}}A){{X}^{t}}]}_{j}}\end{eqnarray} \tag{ A.5 }$

to obtain (24). Note that we replace A with $\hat{L}$ in (24) to connect with the context. The equivalence of the two forms can be verified via

$\begin{eqnarray}&&\begin{array}{l} ({{A}^{{\rm T}}}+A)-(E+{{A}^{{\rm T}}}A)=(A+{{A}^{{\rm T}}}+E+{{A}^{{\rm T}}}A)-2(E+{{A}^{{\rm T}}}A) \\ \quad ={{(E+A)}^{{\rm T}}}(E+A)-2(E+{{A}^{{\rm T}}}A). \\ \end{array}\end{eqnarray} \tag{ A.6 }$

From the comparison, our method may be viewed as a special example of Lanteri's method. But this is not all. In this paper, we only provide one way of constructing a surrogate function. In fact, there are many ways to construct surrogate functions, as noted in [24]. In the future, we will work on finding more relations between the two.

A convergent non-negative deconvolution algorithm with Tikhonov regularization

Article metrics

Submit

Permissions

Author e-mails

Author affiliations

Dates

Peer review information

Abstract

1. Introduction

2. Methodology

2.1. The surrogate function for F(X)

2.2. The surrogate function for G(X)

2.3. The multiplicative update rule

2.4. An example

3. Convergence

4. Simulation results

5. Conclusion

Acknowledgments

Appendix.: Comparison between (7) and (24)

A convergent non-negative deconvolution algorithm with Tikhonov regularization

Article metrics

Submit

Permissions

Share this article

Author e-mails

Author affiliations

Dates

Peer review information

Abstract

1. Introduction

2. Methodology

2.1. The surrogate function for F(X)

2.2. The surrogate function for G(X)

2.3. The multiplicative update rule

2.4. An example

3. Convergence

4. Simulation results

5. Conclusion

Acknowledgments

Appendix.: Comparison between (7) and (24)