Guaranteed $\varepsilon$ -optimal solutions with the linear optimizer ART3+O

Bram L Gorissen

doi:10.1088/1361-6560/ab0a2e

1. Introduction

A linear optimization problem is a linear objective subject to linear constraints:

$\begin{align} \newcommand{\e}{{\rm e}} \displaystyle \label{eq:standardform} \max_x\{c^Tx : Ax\leqslant b, x \geqslant 0\}. \nonumber \end{align} \tag{ 1 }$

It is easy to show that any linear optimization problem can be written in this form, even if it is originally a minimization problem, has equality constraints or allows variables to be negative.

One application of linear optimization is inverse planning in radiation therapy, where the goal is to customize a treatment plan to the patient's anatomy. A 3d grid is superimposed onto a patient for discretization purposes. The cubes of this grid (voxels) that lie in or near the tumor (the planning target volume, PTV) get prescribed a certain dose level, while the other voxels, especially those in critical organs, should get as little dose as possible. In many treatment modalities, the dose in a voxel is linear in the fluence vector $x \in \mathbb{R}^n$ (Bortfeld et al 1990, Lomax 1999, Lessard and Pouliot 2001):

$\begin{align} \newcommand{\e}{{\rm e}} \displaystyle \label{eq:dose} d_i = \sum_j D_{ij} x_j. \nonumber \end{align} \tag{ 2 }$

The coefficients D_ij depend on the patient's anatomy and can be computed analytically or with Monte Carlo simulation. Surprisingly many dose objectives and constraints can be expressed in the framework of linear optimization. Mean and maximum dose (Bahr et al 1968), mean dose above or under a given threshold dose (Shepard et al 1999), and mean dose in the tail (Romeijn et al 2003) can be combined into the same optimization problem. For example, linear optimization can be used to find a treatment plan that minimizes the mean dose in the patient such that the mean dose below 50 Gy in the PTV is no more than 1 Gy, while the mean dose in the 5% region of the PTV with the most dose does not exceed 55 Gy.

Although linear optimization is a versatile modeling tool, the solution time depends greatly on the structure of the problem. Lack of sparsity and the existence of many near-optimal solutions affect the solution time of off-the-shelf solvers to the level that they are not usable for clinical purposes. This has led to the development of new algorithms or more efficient implementations for linear optimization (Herman and Chen 2008, Breedveld et al 2017), and the use of nonlinear optimization by many vendors of treatment planning systems (Trofimov et al 2012).

One of the algorithms for linear optimization of radiation therapy is ART3+O (Herman and Chen 2008, Chen et al 2010, 2012). This algorithm is based on the algorithms ART3 by Herman (1975) that was originally proposed for image reconstruction, and its successor ART3+ (Herman and Chen 2008). ART3 and ART3+ fall in the category of projection algorithms, which have a long history (Bauschke and Borwein 1996). Algorithm 1 shows the outline of ART3, slightly simplified for our needs. The algorithm takes as input a matrix A, a vector b, and iteratively changes an arbitrary starting point x until $Ax \leqslant b$ . In iteration k, it considers the ith row of A, where $i=1,2,\ldots,m,1,2,\ldots,m,1,2,$ etc. If the ith inequality is violated, x is reflected into the hyperplane A_ix = b. Figure 1 depicts three iterations of ART3, and shows that a feasible point is found after two reflections. If the interior of the feasible set is full dimensional, ART3 finds a feasible point in a finite number of steps (versus in the limit) (Herman 1975). The key advantages of ART3 are its low memory requirements, fast practical running time, and simplicity.

**Figure 1.** Three iterations of ART3. The feasible region is shaded while the constraints are dashed. The starting point x⁰ is reflected into the first constraint to get to x¹, then into the second constraint to get to x². Since x² is in the feasible region, the algorithm terminates.
Download figure:
Standard image High-resolution image

The performance of ART3 has been improved by temporarily removing nonviolated constraints from the inner loop (algorithm 2). This improved algorithm, ART3+, benefits from not having to check for a violation for those constraints that are not likely to be violated, which yields a significant speed-up (Herman and Chen 2008). Similar to ART3, ART3+ terminates in a finite number of steps if the feasible set is full dimensional.

Algorithm 1. The image reconstruction algorithm ART3 (Herman 1975).

Data: $A \in \mathbb{R}^{m \times n}$ $A \in \mathbb{R}^{m \times n}$ , $b \in \mathbb{R}^{m}$ $b \in \mathbb{R}^{m}$

Result: x such that $Ax \leqslant b$ $Ax \leqslant b$

Let the starting point $x \in \mathbb{R}^n$ $x \in \mathbb{R}^n$ be an arbitrary vector;

$0 \to k$ $0 \to k$ ;

while $Ax \nleq b$ $Ax \nleq b$ do

$1 + (k~{\rm mod}~m) \to i$ $1 + (k~{\rm mod}~m) \to i$ ;

$A_i x - b_i \to v_i$ $A_i x - b_i \to v_i$ ;

if

$v_i > 0$ then

$x - 2\frac{v_i}{||A_i||} A_i^T \to x$ $x - 2\frac{v_i}{||A_i||} A_i^T \to x$

end

$k + 1 \to k$ $k + 1 \to k$ ;

end

Algorithm 2. The algorithm ART3+ (Herman and Chen 2008).

Data: $A \in \mathbb{R}^{m \times n}$ $A \in \mathbb{R}^{m \times n}$ , $b \in \mathbb{R}^{m}$ $b \in \mathbb{R}^{m}$

Result: x such that $Ax \leqslant b$ $Ax \leqslant b$

Let the starting point $x \in \mathbb{R}^n$ $x \in \mathbb{R}^n$ be an arbitrary vector;

$0 \to k$ $0 \to k$ ;

while $Ax \nleq b$ $Ax \nleq b$ do

$\{1,2,\ldots,m\} \to S$ $\{1,2,\ldots,m\} \to S$ ;

while $\newcommand{\e}{{\rm e}} S \neq \emptyset$ $\newcommand{\e}{{\rm e}} S \neq \emptyset$ do

Let i be the next element in S;

$A_i x - b_i \to v_i$ $A_i x - b_i \to v_i$ ;

if

$v_i > 0$ then

$x - 2\frac{v_i}{||A_i||} A_i^T \to x$ $x - 2\frac{v_i}{||A_i||} A_i^T \to x$

else

$S\backslash\{i\} \to S$ $S\backslash\{i\} \to S$

end

Algorithm 3. The algorithm ART3+O (Chen et al 2010).

Data: $A \in \mathbb{R}^{m \times n}$ $A \in \mathbb{R}^{m \times n}$ , $b \in \mathbb{R}^{m}$ $b \in \mathbb{R}^{m}$ , $c \in \mathbb{R}^n$ $c \in \mathbb{R}^n$ , $L \in \mathbb{R}$ $L \in \mathbb{R}$ , $U \in \mathbb{R}$ $U \in \mathbb{R}$

Result: x that solves (1)

while $U-L > \varepsilon$ $U-L > \varepsilon$ do

$(L+U)/2 \to M$ $(L+U)/2 \to M$ ;

Call ART3+ to find x such that $\newcommand{\e}{{\rm e}} \left(\begin{array}{@{}c@{}}-c^T \\ A \\ -I\end{array}\right) x \leqslant \left(\begin{array}{@{}c@{}} -M \\ b \\ 0\end{array}\right)$ $\newcommand{\e}{{\rm e}} \left(\begin{array}{@{}c@{}}-c^T \\ A \\ -I\end{array}\right) x \leqslant \left(\begin{array}{@{}c@{}} -M \\ b \\ 0\end{array}\right)$ ;

if ART3+ found such an x then

$c^Tx \to L$ $c^Tx \to L$

else

$M \to U$ $M \to U$

end

The algorithm ART3+O is an optimization algorithm that uses ART3+ as its workhorse. Given a linear optimization problem (1) and an interval $[L,U]$ that is known to contain the optimal objective value, ART3+O uses bisection search to shrink the interval until it finds an $\varepsilon$ -optimal value of (1). A description is given in algorithm 3. In each iteration, ART3+O determines the midpoint of the interval $M=(L+U)/2$ and then establishes whether (1) has a feasible point with an objective value of at least M by calling ART3+. If ART3+ finds such a feasible point, the search continues on the interval [c^Tx,U], otherwise it continues on $[L,M]$ , and repeats until the width of the interval is less than $\varepsilon$ . The x^ART3+O that is found, is at most $\varepsilon$ from the optimum, meaning that $c^Tx^{ART3+O} \geqslant c^Tx - \varepsilon$ for all $x \geqslant 0$ such that $Ax \leqslant b$ .

A practical problem with ART3+O is that ART3+ runs indefinitely when there is no x that satisfies $Ax \leqslant b$ . Therefore, the algorithm needs to be stopped prematurely when it becomes unlikely that a feasible point will be found, e.g. after a large number of iterations, after a certain runtime, or when the iterates show a certain behavior (De Pierro and Iusem 1985, Censor and Tom 2003). This carries the risk of incorrectly concluding that an objective value of M is unachievable. An $\varepsilon$ -optimal solution can therefore not be guaranteed. In addition, the user is faced with a difficult trade-off: using a conservative stopping criterion means many CPU cycles are wasted on finding a solution that does not exist whereas a more aggressive criterion means that the final solution can be far from $\varepsilon$ -optimal. What makes matters worse is that there is no guidance to what constitutes a 'conservative' stopping criterion. As of today there is no known upper bound on the number of iterations ART3+ may take to find a feasible point.

This paper proposes an improvement to ART3+O so that it can accurately detect if an objective value of M is attainable. It eliminates the need for an arbitrary stopping criterion and guarantees an $\varepsilon$ -optimal solution. With this improvement, the algorithm still terminates in a finite number of steps.

2. Methods

The cornerstone of the proposed method is Farkas' lemma. Although this lemma can be stated in many different ways, our results can be derived from the following form:

Theorem 1 (Farkas' lemma). Exactly one of the following statements is true (Farkas 1902):

(i)
There exists an $x \in \mathbb{R}^n$ such that Fx = b and $x \geqslant 0$ .
(ii)
There exists a $y \in \mathbb{R}^m$ such that $F^Ty \geqslant 0$ and $b^Ty \leqslant -1$ .

This lemma is a statement of alternatives. If the first statement is true, the second statement is not true and vice versa.

The algorithm ART3+O calls ART3+ to find a point x in the set

$\begin{align} \newcommand{\e}{{\rm e}} \displaystyle \label{farkas:eq1} \{x : c^Tx \geqslant M, Ax \leqslant b, x \geqslant 0\}. \nonumber \end{align} \tag{ 3 }$

Such a point exists if and only if there exists $s_1 \in \mathbb{R}$ and $s_2 \in \mathbb{R}^m$ such that:

$\begin{align*} \newcommand{\e}{{\rm e}} \displaystyle \left(\begin{array}{@{}ccc@{}}-c^T & 1 & O \nonumber \\ A & 0 & I\end{array}\right) \left(\begin{array}{@{}c@{}}x \nonumber \\ s_1 \nonumber \\ s_2 \end{array}\right) = \left(\begin{array}{@{}c@{}}-M \nonumber \\ b\end{array}\right), \quad x \geqslant 0, \quad s_1 \geqslant 0, \quad s_2 \geqslant 0. \nonumber \end{align*}$

We use these equations for the first statement of Farkas' lemma. The equations in the second statement are:

$\begin{align} \newcommand{\e}{{\rm e}} \displaystyle \label{farkas:eq2} -c y_1 + A^T y_2 \geqslant 0, \quad -M y_1 + b^T y_2 \leqslant -1, \quad y_1 \geqslant 0, \quad y_2 \geqslant 0. \nonumber \end{align} \tag{ 4 }$

Therefore, the set (3) is empty if and only if there is a y that satisfies (4).

Currently, ART3+O cannot conclude with certainty that the set (3) is empty. It presumes that the set is empty when ART3+ cannot find a point in that set within the allocated runtime or number of iterations, but that presumption may be wrong.

Algorithm 4. The suggested improved algorithm.

Data: $A \in \mathbb{R}^{m \times n}$ $A \in \mathbb{R}^{m \times n}$ , $b \in \mathbb{R}^{m}$ $b \in \mathbb{R}^{m}$ , $c \in \mathbb{R}^n$ $c \in \mathbb{R}^n$ , $L \in \mathbb{R}$ $L \in \mathbb{R}$ , $U \in \mathbb{R}$ $U \in \mathbb{R}$

Result: x that solves (1)

while $U-L > \varepsilon$ $U-L > \varepsilon$ do

$(L+U)/2 \to M$ $(L+U)/2 \to M$ ;

Call two instances of ART3+ in parallel:

(i) to find x such that $\newcommand{\e}{{\rm e}} \left(\begin{array}{@{}c@{}}-c^T \\ A \\ -I\end{array}\right) x \leqslant \left(\begin{array}{@{}c@{}} -M \\ b \\ 0\end{array}\right)$ $\newcommand{\e}{{\rm e}} \left(\begin{array}{@{}c@{}}-c^T \\ A \\ -I\end{array}\right) x \leqslant \left(\begin{array}{@{}c@{}} -M \\ b \\ 0\end{array}\right)$

(ii) to find y such that $\newcommand{\e}{{\rm e}} \left(\begin{array}{@{}cc@{}}c & -A^T \\ -M & b^T \\ -1 & 0 \\ O & -I\end{array}\right) \left(\begin{array}{@{}c@{}} y_1 \\ y_2 \end{array}\right) \leqslant \left(\begin{array}{@{}c@{}} 0 \\ -1 \\ 0 \\ 0\end{array}\right)$ $\newcommand{\e}{{\rm e}} \left(\begin{array}{@{}cc@{}}c & -A^T \\ -M & b^T \\ -1 & 0 \\ O & -I\end{array}\right) \left(\begin{array}{@{}c@{}} y_1 \\ y_2 \end{array}\right) \leqslant \left(\begin{array}{@{}c@{}} 0 \\ -1 \\ 0 \\ 0\end{array}\right)$ ;

When one instance of ART3+ finds a solution, terminate the other instance;

if a feasible x was found then

$c^Tx \to L$ $c^Tx \to L$

else

$M \to U$ $M \to U$

end

We propose to run ART3+ both on the set (3) and on the inequalities (4) in parallel, until it finds a point in either set. When it finds a point that satisfies (4), that is mathematical proof that the set (3) is empty and that an objective value of M is unachievable. Therefore, bisection search can continue on the interval $[L,M]$ . This algorithm is outlined in algorithm 4.

ART3+O requires that the rows of the constraint matrix A can be accessed efficiently, while the improved algorithm also requires efficient access to the columns of A. If only a single copy of A is kept in memory, one of the two instances of ART3+ cannot have sequential data access, causing unnecessary slowness. In radiation therapy, this is an issue because the matrix D in (2) is often stored in sparse format, meaning that only the indices and values of nonzero elements get stored. For sparse formats, either the rows or the columns are efficiently accessible, but not both. Therefore, the D matrix needs to be transposed. Consequently, the memory requirements of the improved algorithm are roughly twice as high as those of ART3+ O. The difference in runtime is harder to predict. Although algorithm 4 runs two threads in parallel and therefore uses twice as much CPU power when a feasible point x is found, it is possible that the second thread terminates early enough when no feasible x exists to make up for this.

Since ART3+ terminates in a finite number of steps if the feasible set is full dimensional (Herman and Chen 2008), it trivially follows that algorithm 4 also terminates in a finite number of steps. The requirement of full dimensionality is not an actual restriction: techniques from the ellipsoid method can be used to slightly perturb the inequalities in a way that the feasible region becomes full dimensional if it is not empty, while it remains empty if the original feasible region is empty (Bland et al 1981). Nevertheless, we shall validate the full dimensionality assumption in our numerical results by subtracting a small number (10⁻³) from the right-hand side of the inequalities given to ART3+ by algorithm 4 and show that a point satisfying those inequalities exists.

We have taken the implementation of ART3+O from the in-house treatment planning system Astroid and modified it to make it accurately correspond to algorithm 3, with a stopping criterion of $2\cdot 10^7$ iterations and $\varepsilon=0.1$ (Chen et al 2010). To implement algorithm 4, we added multithreading capabilities and code to transpose a sparse matrix. As the starting point for ART3+, we took the zero vector in the first call, and the final point from the previous call in subsequent calls. This makes the implementation nondeterministic, because the thread that gets terminated without finding a point has run a nondeterministic number of iterations.

The method is tested on a small radiation therapy case with 227 pencil beams. The objective is to maximize the minimum dose in the 256 voxel PTV, while the maximum dose in a surrounding structure of 8406 voxels is constrained to 50 Gy. The relevant part of the D matrix has 288 610 nonzero elements ( $\approx$ $15\%$ ). The auxiliary variable necessary to model a minimum dose objective in the format of (1) was replaced with M to avoid unnecessary overhead. So, the optimization problem is $\max_{x \geqslant 0,t \geqslant 0}\{t : D_1 x \geqslant t, D_2 x \leqslant 50e\}$ , but the inequalities given to ART3+ in algorithm 4 are $\{D_1 x \geqslant Me, D_2 x \leqslant 50e, x \geqslant 0\}$ and $\{-D_1 y_{11} + D_2 y_{12} \leqslant 0, M e^T y_{11} - 50 e^T y_{12} \leqslant -1, y \geqslant 0 \}$ .

The method is also tested on a larger case with 5384 pencil beams and approximately $2 \cdot 10^6$ constraints that limit the mean, minimum or maximum dose, and an objective to minimize the maximum dose in a structure with $3 \cdot 10^5$ voxels. The objective was treated in the same way as the small case.

The method is tested on another 77 clinical test cases spanning 11 patients with different tumor sites. Each patient has a common set of constraints, but different objectives (at most 10 per patient). In addition to minimum and maximum dose objectives and constraints, some cases have mean dose constraints or a mean underdose objective. The D matrices vary in size between 12 MB and 2.5 GB. These cases will be run on the ERISone cluster (Partners Healthcare, Boston, MA, USA), which uses Intel E5-2690 v3 and Intel X5670 CPUs, with a time limit of 1 d of CPU-time (around 12 h of wall-clock time).

The first two cases are run on a machine with two Intel E5-2687W v3 CPUs. CPLEX 12.8, a state of the art solver for linear optimization, is used to obtain the optimal solution for both cases and to validate the full dimensionality assumption. Although the second case is large and CPLEX runs out of memory if it attempts to solve it, it can solve an equivalent reformulation. The case has one D matrix per beam, D¹ and D², with corresponding x¹ and x², and constraints are formulated on $D^1x^1$ , on $D^2x^2$ , and on $D^1 x^1 + D^2x^2$ . By adding auxiliary variables $d^1 = D^1 x^1$ and $d^2 = D^2 x^2$ , we avoid that rows of D^k get duplicated across constraints, which resolves the memory issue. To resolve the memory issue when trying to validate the full dimensionality assumption for the y -space, we take the feasibility problem for x and turn it into an optimization problem by adding the objective of maximizing $-\sum x_i$ . The dual variables of the inequality constraints then satisfy $c y_1 - A^T y_2 \leqslant -1$ , b^Ty < 0, and $y\geqslant 0$ by LP duality. We then slightly increase each element of y such that y > 0.

3. Results

CPLEX solves the small case in 0.8 s and shows that the optimal value is 32.71 Gy. We confirmed that the feasible set is full dimensional in the first instance of ART3+ as long as M is less than the optimal value, while it is full dimensional in the second instance of ART3+ when M is larger than the optimal value. ART3+O takes 12.5 s to obtain a solution with value 32.57. Table 1 shows the steps taken by algorithm 4 during its runtime of around 2.5 h. In steps 2, 4, 5 and 9, it was proven that the objective values 37.51, 34.38, 32.82 and 32.77 are unachievable, while in steps 1, 3, 6, 7 and 8 the objective value was gradually improved to 32.71. Although this final solution is guaranteed to be within 0.1 Gy of the optimum, it is in fact only 0.0015 Gy from the optimum. Looking at the number of iterations, there are two remarkable observations. The first is that it takes a considerable number of iterations to conclude that a value of M is unattainable. Even for M = 37.5 (Step 2), which is far from achievable, $25 \cdot 10^6$ iterations are necessary. The second is that close to optimality, even the regular ART3+ instance may take a large number of iterations. In Step 8, $29\cdot 10^9$ iterations are needed to obtain a solution for M = 32.71. This is orders of magnitude higher than the iteration limit of $2\cdot 10^7$ used by Chen et al (2010).

Table 1. Results of algorithm 4 on the small radiation therapy case. $[L,U]$ is the search interval of bisection search, Farkas indicates whether the first or second statement in Farkas' lemma was found to be true, and the last two columns show the runtime statistics of the instance of ART3+ that found a solution.

Step	L	U	Farkas	Time (s)	Iterations
1	0.00	50.00	1	0.1	62 356
2	25.01	50.00	2	1.9	24 948 327
3	25.01	37.51	1	0.0	332 498
4	31.26	37.51	2	11.1	147 789 618
5	31.26	34.38	2	3160.4	39 374 210 616
6	31.26	32.82	1	0.0	53 662
7	32.38	32.82	1	0.1	2477 787
8	32.60	32.82	1	1345.9	29 435 448 946
9	32.71	32.82	2	4618.7	57 541 614 242

CPLEX (barrier) solves the large case in just under 3.5 h with an optimal value of 50.8 Gy. ART3+O finds a value of 51.6 Gy in 847.4 s. Algorithm 0 takes 8 s to transpose the D matrix. In the two steps of bisection search, the second instance of ART3+ proves that M = 26.47 and M = 39.70 are unattainable in approximately 6 min and 6 h, respectively. We terminated bisection search in the third step of bisection search, because even after 100 h and $218 \cdot 10^9$ iterations, ART3+ could not prove that M = 46.32 is unattainable, despite that the full dimensionality assumption holds for this value of M.

For the test set of 77 cases, ART3+O does not find a feasible point within the iteration limit for half the patients, affecting 43 of the optimization runs, and incorrectly concludes that the problem is infeasible. We therefore re-ran ART3+O with an iteration limit of 10⁹ instead of $2\cdot 10^7$ , affecting the runtimes by a median factor of 26. Still, two out of ten patients are incorrectly classified as infeasible, affecting 13 optimization runs. The remaining 64 runs took between 8.5 s and 3:05 h (median: 35 min). The suboptimality is between 0 and 3.6 Gy (median: 0.2 Gy), and is less than 0.1 Gy in 13 out of 64 cases. Algorithm 4 finds a feasible point in all cases. For the 64 cases that ART3+O could solve, algorithm 4 finishes before the time limit in seven cases, but is always slower than ART3+O. The suboptimality ranges between 0 Gy and 12.5 Gy (median: 1.1 Gy). Algorithm 4 outperforms ART3 +O by at least 0.1 Gy in 14 cases, while the opposite is true in 41 cases. Algorithm 4 solves the 13 cases that ART3+O incorrectly declares infeasible, with the suboptimality ranging between 0 and 5.7 Gy (median 0.2 Gy).

4. Discussion

The results presented by Chen et al (2010) show that ART3+O can be used to quickly optimize a treatment plan. Its main weakness is the arbitrary iteration limit. In our numerical examples, we found out that this weakness is not purely theoretical, but that it has practical impact. The iteration limit suggested by Chen et al (2010) is too low, and even using a considerably higher number still leads to suboptimalities of up to 3.6 Gy. The theoretical advantage of the proposed alternative does not translate into a practical advantage. Despite the significantly higher CPU time (24 h versus 3:05 h), the final objective value is often worse.

The dual problem of (1) is $\min\{b^Ty : A^T y \geqslant c, y \geqslant 0\}$ . Alternative solvers for linear optimization, notably the simplex method and an interior point method, generate not just an optimal solution x^* to (1), but also an optimal dual solution y ^* as a byproduct. Due to strong duality, $c^Tx^* = b^Ty^*$ if the problem is feasible and bounded. For $M\leqslant c^Tx^*$ , x^* is a solution for the first instance of ART3+, while for $M>c^Tx^*$ , $y_1 = 1/(M-b^Ty^*)$ and $y_2 = y_1 y^*$ is a solution for the second instance of ART3+ in algorithm 4. That means that alternative solvers solve the same two problems as algorithm 4.

It is an open question as to why ART3+ often finds near optimal solutions in a moderate number of iterations, while it takes significantly more iterations to prove that a certain objective value is unattainable. A possible explanation is that although the primal can have a large number of constraints, many are correlated or redundant, such as constraints on the maximum dose in two neighboring voxels. Correlated constraints do probably not occur in the second instance of ART3+ in algorithm 4, because that would require two different x_j to deliver almost the same dose. The phenomenon that it is easy to find an x and hard to find a y is problem specific. Outside of the domain of radiation therapy there are problems for which the opposite is true. An artificial example can be constructed by taking the dual of (1) for a radiation therapy problem.

Our improvement to ART3+O can be applied to ART3+ as well. ART3+ was originally proposed for image reconstruction by Herman (1975) where it approximately solves a large system Ax = b by finding a solution to

$\begin{align} \newcommand{\e}{{\rm e}} \displaystyle \label{eq:art3}b-\varepsilon \leqslant Ax \leqslant b + \varepsilon. \nonumber \end{align} \tag{ 5 }$

Herman (1975) suggested to select $\varepsilon$ based on prior knowledge or experimentally, but there is a delicate balance. When $\varepsilon$ is chosen too small, there may be no solution, while a large value allows for solutions that do not accurately reconstruct the original distribution. It is therefore desirable to determine the smallest $\varepsilon$ for which a solution exists. By Farkas' lemma, there is either an x that satisfies (5), or a y that satisfies $A^T y_1 - A^T y_2 \geqslant 0$ , $(b+\varepsilon){}^Ty_1 - (b-\varepsilon){}^Ty_2 \leqslant -1$ , $y_1 \geqslant 0$ , $y_2 \geqslant 0$ . One could therefore run ART3+ in parallel on both sets of inequalities to establish if there is a solution to (5) for a given value of $\varepsilon$ .

It is well known that the order in which the projections are performed has a significant impact on the performance of a projection algorithm. Ideally, consecutive constraints are as orthogonal as possible (Guan and Gordon 1994). To test the impact on our results, we have reordered the constraints in a way that consecutive constraints are as orthogonal as possible. We did this greedily, starting with the nonnegativity constraints (which are fully orthogonal), and then adding the constraints one by one such that the added constraint was as orthogonal as possible to the previous one. Due to the computational cost of this ordering heuristic, we only tested it on the small case. To our surprise, the number of iterations increased for the second instance of ART3+, making the overall performance worse. An alternative ordering based on the golden ratio (Kohler 2004) did not improve the performance either.

The lack of a good stopping criterion has also been noted for projection methods in convex optimization (Gibali et al 2014). The conic variant of Farkas' lemma could be used for such problems. Let f _i be closed and convex functions and let $f_i^*$ denote their convex conjugate functions. Exactly one of the following sets is not empty:

$\begin{align} \newcommand{\e}{{\rm e}} \displaystyle 1. \; \{x \in \mathbb{R}^n : f_i(x) < 0 \quad i=1,\ldots,m\} \label{convexset1} \nonumber \end{align} \tag{ 6 }$

$\begin{align} \newcommand{\e}{{\rm e}} \displaystyle 2. \; \{\lambda \in \mathbb{R}^m_+, v^i \in \mathbb{R}^n : \sum_{i=1}^m v^i = 0, \sum_i \lambda_i f^*\left(\frac{v^i}{\lambda_i}\right) \leqslant -1 \}. \label{convexset2} \nonumber \end{align} \tag{ 7 }$

The derivation is given in the appendix. A necessary step toward full dimensionality is substituting out one of the vectors $v^i$ . By using the two sets (6) and (7), the methods in this paper extend to convex optimization.

5. Conclusion

ART3+O cannot guarantee $\varepsilon$ -optimality of the final solution due to an arbitrary stopping criterion. An improvement that provides such a guarantee was suggested, but despite its theoretical basis, its runtime is too high, especially in comparison with alternative solvers. If an optimality guarantee is not required, ART3+O remains a viable algorithm, but their users should be aware that a solution may be suboptimal.

Acknowledgment

Supported in part by NIH U19 Grant 5U19CA021239-38.

Appendix. Extension to convex optimization

Theorem A.1 (Conic variant of Farkas' lemma). Let K be a closed convex cone with dual cone K^* and let $F \in \mathbb{R}^{m \times n}$ . Exactly one of the following statements is true (Craven and Koliha 1977):

(i)
There exists an $x \in \mathbb{R}^n$ such that Fx = b and $x \in K$ .
(ii)
There exists a $y \in \mathbb{R}^m$ such that $F^Ty \in K^*$ and $b^Ty \leqslant -1$ .

The closure of the set (6) can be denoted as $\{(x,t) : t=1, (x,t) \in K = \cap_{i=1}^m K_i\}$ , with $K_i \,=$ $\{(x,t) : t f_i(x/t) \leqslant 0, t \geqslant 0\}$ and $0 f_i(x/0) = \lim_{t \downarrow 0} tf_i(x/t)$ . Note that K is indeed a closed convex cone. The x in Farkas' lemma is actually the vector $(x,t) \in \mathbb{R}^{n+1}$ , $F=(0,\ldots,0,1) \in \mathbb{R}^{1 \times (n+1)}$ , and the second set in Farkas' lemma is $\{y \in \mathbb{R} : (0,\ldots,0,y) \in K^*, y \leqslant -1\}$ . The condition $(0,\ldots,0,y) \in K^*$ is by definition equivalent to $ty \geqslant 0$ $\forall (x,t) \in \cap_{i=1}^m K_i$ . To eliminate the ' $\forall$ ' quantifier, we rewrite this definition as $\min_{x,t \geqslant 0} \{ty : t f_i(x/t) \leqslant 0 \; i=1,\ldots,m\} \geqslant 0$ , and then apply Lagrange duality to the minimization problem using the method by Roos et al (2018). The objective function of this problem is g₀(x,t) = ty and the constraint functions are $g_i(x,t) = t f_i (x/t)$ . The corresponding conjugate functions are $g_0^*(v^0,u^0) = 0$ if $v^0=0$ and u⁰ = y ( $\infty$ otherwise) and $g_i^*(v^i,u^i) = 0$ if $f^*(v^i) + u^i \leqslant 0$ ( $\infty$ otherwise). Therefore, by Roos et al (2018), the Lagrange dual is $\max_{\lambda \geqslant 0,v,u}\{0 : \sum_{i=1}^m v^i = 0, y+\sum_{i=1}^m u^i = 0, \lambda_i f^*(v^i / \lambda_i) + u^i \leqslant 0 \}$ . The second statement in Farkas lemma reduces to: there exists a $y \in \mathbb{R}$ , $\lambda \in \mathbb{R}^m_+$ , $v^i \in \mathbb{R}^n$ and $u^i \in \mathbb{R}$ such that $\sum_{i=1}^m v^i = 0$ , $y+\sum_{i=1}^m u^i = 0$ , $\lambda_i f^*(v^i / \lambda_i) + u^i \leqslant 0$ and $y \leqslant -1$ . Substituting out y and $\sum_i u_i$ , we get the set (7).