
Cheap contouring of costly functions: the Pilot Approximation Trajectory algorithm


Published 30 August 2012 © 2012 IOP Publishing Ltd
Citation: Janne M J Huttunen and Philip B Stark 2013 Comput. Sci. Discov. 5 015006. DOI: 10.1088/1749-4699/5/1/015006


Abstract

The Pilot Approximation Trajectory (PAT) contour algorithm can find the contour of a function accurately when it is not practical to evaluate the function on a grid dense enough to use a standard contour algorithm, for instance, when evaluating the function involves conducting a physical experiment or a computationally intensive simulation. PAT relies on an inexpensive pilot approximation to the function, such as interpolating from a sparse grid of inexact values, or solving a partial differential equation (PDE) numerically using a coarse discretization. For each level of interest, the location and 'trajectory' of an approximate contour of this pilot function are used to decide where to evaluate the original function to find points on its contour. Those points are joined by line segments to form the PAT approximation of the contour of the original function.

Approximating a contour numerically amounts to estimating a lower level set of the function, the set of points on which the function does not exceed the contour level. The area of the symmetric difference between the true lower level set and the estimated lower level set measures the accuracy of the contour. PAT measures its own accuracy by finding an upper confidence bound for this area.

In examples, PAT can estimate a contour more accurately than standard algorithms, using far fewer function evaluations than standard algorithms require. We illustrate PAT by constructing a confidence set for viscosity and thermal conductivity of a flowing gas from simulated noisy temperature measurements, a problem in which each evaluation of the function to be contoured requires solving a different set of coupled nonlinear PDEs.


1. Introduction

A contour, level curve, or isoline of a real-valued function g of two variables is the set of points in the domain Δ of the function on which g takes one particular 'level' τ:

γg,τ ≡ {x∈Δ : g(x) = τ}.

Contours are useful in many contexts, including visualizing geographical and meteorological data.

Typical contouring algorithms start with the values of g on a dense, regular grid. They approximate values of g between grid points by interpolation, usually linear interpolation (Cottafava and Moli 1969, McLain 1974, Aramini 1980). Work on efficiency in contouring algorithms primarily addresses large datasets; see, for example, Agarwal et al (2008) or van Kreveld et al (1997). For algorithms to construct contours in higher dimensions (iso-surfaces), see for example (Bhaniramka et al 2004, Schreiner et al 2006). Contouring algorithms are closely related to algorithms for constructing level sets {x∈Δ:g(x) ⩾ τ}. Sethian (1999), Willett and Nowak (2005) and Scott and Davenport (2007) discuss efficient algorithms for approximating level sets.

We address a different bottleneck: finding approximate contours when it is very expensive to evaluate g. If g is expensive to evaluate and varies nonlinearly—for example, if each evaluation of g requires performing a physical experiment, running a computationally intensive simulation (as in climate modeling), or solving coupled nonlinear partial differential equations (PDEs) numerically—it can be impractical to evaluate g on a grid dense enough to approximate contours accurately using standard methods. To reiterate: we are not addressing contouring the solution to a single coupled nonlinear PDE problem on a two-dimensional domain. Rather, we are addressing the contouring of a function of two variables that for each value of a two-dimensional argument could require solving a different set of coupled nonlinear PDEs, or some other expensive calculation. The computational domain for the PDEs (typically a spatio-temporal domain) has no connection to the contour grid (a grid of parameter values).

Bryan et al (2006) and Bryan and Schneider (2008) describe an active learning algorithm for identifying contours of a function. Their algorithm uses statistical criteria to decide where to evaluate the function. In this paper, we describe a different approach: we use an approximation of the function as a starting point for an iterative search to locate the contour to any desired accuracy. We check the accuracy of the approximate contour by random sampling. The check does not require any assumptions about the function or the accuracy of the initial approximation (e.g., it does not require any regularity conditions on g).

In applications, particular contours such as the zero contour can be especially interesting. Examples are given below. In some such cases, an accurate approximate contour can be found using relatively few function evaluations by starting with an 'educated guess' about where the contour is—a pilot approximation—and then refining that approximation adaptively.

We present an algorithm that exploits this tactic, Pilot Approximation Trajectory (PAT). PAT contours at one level at a time. It takes advantage of the fact that g can be evaluated anywhere in Δ—albeit at a cost. By judiciously selecting where to evaluate g and avoiding evaluating g far from the level of interest, PAT requires fewer evaluations of g than standard contouring methods to attain the same accuracy—or better.

PAT starts by building a cheap pilot approximation ĝ of g. The pilot approximation might be constructed by interpolating values of g on a sparse grid, for instance using splines. If evaluating g at a point requires solving a PDE numerically, ĝ might be the numerical solution based on a coarser discretization. In the latter case, note that the PDE discretization is in a different space from the contour grid: we are addressing problems where each evaluation of g requires solving a different PDE. The approximation ĝ involves solving each of those PDEs less accurately, not evaluating the solution to a single PDE at fewer points x. The pilot approximation is used to predict the location and trajectory (local orientation) of the contour, which are then refined by evaluating g at additional points. PAT treats g as a 'black box,' so it can estimate contours of complicated functions, including nondifferentiable functions involving real or simulated experiments.

To compare the performance of PAT to standard contouring algorithms requires quantifying the error of approximate contours. A contour divides the domain Δ into a set where g has values above the level τ and a set where the value of g is less than or equal to τ. The latter is called the lower level set of g (at level τ):

Λτ ≡ {x∈Δ : g(x) ⩽ τ}.

One can quantify accuracy of approximate contours based on how well they perform that classification—on how much the actual and estimated lower level sets differ, i.e., how well the approximate contour classifies points in Δ.

An approximate contour can make two kinds of classification errors: it can include values of g that are above the level, and it can fail to include values of g that are at or below the level. Let Λ̂τ be the estimated lower level set of g at level τ according to the approximate contour. The first kind of error occurs on the set Λ̂τ∖Λτ and the second kind of error occurs on the set Λτ∖Λ̂τ. The union of these two sets, the points in the domain Δ that the approximate contour misclassifies, is the symmetric difference between Λ̂τ and Λτ:

Λ̂τ △ Λτ ≡ (Λ̂τ∖Λτ) ∪ (Λτ∖Λ̂τ).

The area of this symmetric difference quantifies the error of the approximate contour. We compare PAT and standard contouring on the basis of their computational burden and on this measure of their accuracy in tests where the true contour is known. In practice, the true contour generally is not known. But PAT can use random sampling to find an upper confidence bound on the area of Λ̂τ △ Λτ, assessing its own accuracy through additional evaluations of g.

The symmetric difference is not the only reasonable definition of the error of an approximate contour, even among measures based on the lower level set. For instance, if misclassifying some points in the domain has more serious consequences than misclassifying others, the fraction of the area of the domain on which the approximate contour errs might not be the most relevant measure of accuracy. One could use weighted areas to account for differing importance, for instance, integrals with respect to a measure other than the Lebesgue measure.

Measuring accuracy by the area of the symmetric difference between the true and approximate lower level sets is meaningful in applications. For instance, in one approach to uncertainty quantification, the ensemble of models method, Higdon (2010) relies on dividing the set of possible parameter values (models) into those that agree 'adequately' with the data and those that do not. Agreement is measured using a misfit function. The set of models that agree adequately with the data is a lower level set of the misfit function. In some problems, a lower level set of the misfit is a confidence set for the model that generated the data. Our motivating application is to find such a confidence set for a two-dimensional parameter from indirect, noisy measurements—an inverse problem. This is related to the example that motivates the work of Bryan et al (2006).

The paper is organized as follows. In the next subsection we give an example of a simple problem that motivates this work. Section 2 presents a basic version of PAT and a refinement to treat delicate cases, such as saddle points. It also discusses constructing confidence bounds on the fraction of the domain that the PAT contour correctly classifies as above or below the contour level. Section 3 applies PAT to two examples: contouring a function with a saddle point and finding a confidence set for viscosity and thermal conductivity from simulated data in an airflow inverse problem. Section 4 summarizes our findings.

1.1. A model problem: the heat equation

We consider a simple problem in which PAT is helpful: estimating the coefficients of a constant-coefficient steady state heat equation. The governing equation is

−a∇²u + bu = h,    (4)

where a and b are the coefficients and h is a source term. This problem is two-dimensional in that the PDE depends on two coefficients; the domain of the PDE is three-dimensional. For example, the coefficients a and b might be parameters related to heat conductivity and perfusion; see Pennes (1948). We seek a 95% simultaneous confidence set for the unknown coefficients a and b based on observations y = (y1,...,yn)T from u,

y = G(u) + ε,    (5)

where ε is an n-vector representing observational noise and G = (G1,...,Gn)T represents the relationship between the observations and the underlying PDE. For example, G can be a linear interpolation operator that samples the solution u of (4) at given measurement locations, or it can be the negative temperature coefficient (NTC) resistor model. We assume that the components of ε are independent, identically distributed (iid) Gaussian variables with mean 0 and variance 1.

We will be contouring a function of x = (a,b), the vector of unknown parameters. Each value of x is a possible value for the coefficients in the heat equation, not a spatial location within the domain of the equation. Equation (4) has a unique solution u for all positive coefficients x = (a,b) > 0. The solution can be approximated numerically, for example, using a finite-element method. Define the function f such that f(x) = G(u). That is, f maps a coefficient vector x = (a,b) into the n noise-free measurements according to the finite-element solution of (4) for coefficients x = (a,b). To evaluate f at a single point x = (a,b) requires finding the finite-element solution to equation (4) for the coefficients x = (a,b) and then applying the measurement model G to that finite-element solution.
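To make the structure of the forward map concrete, here is a minimal Python sketch using a one-dimensional finite-difference analogue of (4) in place of the paper's three-dimensional finite-element solver. The names (`solve_heat_1d`, `forward`, `misfit`), the boundary conditions, and the choice of G as linear interpolation at sensor locations are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def solve_heat_1d(a, b, h, m=200):
    """Finite differences for -a u'' + b u = h on [0, 1] with u(0) = u(1) = 0.
    A 1D stand-in for the PDE (4); the paper solves a 3D problem by FEM."""
    z = np.linspace(0.0, 1.0, m + 1)
    dz = z[1] - z[0]
    n = m - 1                                   # number of interior nodes
    A = (np.diag(np.full(n, 2.0 * a / dz**2 + b))
         + np.diag(np.full(n - 1, -a / dz**2), 1)
         + np.diag(np.full(n - 1, -a / dz**2), -1))
    u = np.zeros(m + 1)
    u[1:-1] = np.linalg.solve(A, h(z[1:-1]))    # boundary values stay zero
    return z, u

def forward(x, meas_pts, h=lambda z: np.ones_like(z)):
    """f(x): map coefficients x = (a, b) to the noise-free measurements G(u),
    with G taken to be linear interpolation at the sensor locations."""
    a, b = x
    z, u = solve_heat_1d(a, b, h)
    return np.interp(meas_pts, z, u)

def misfit(x, y, meas_pts):
    """g(x) = ||y - f(x)||^2; each evaluation costs one PDE solve."""
    return float(np.sum((y - forward(x, meas_pts)) ** 2))
```

With simulated data y = forward(x_true, sensors) + noise, the confidence set described next is the set of x whose misfit does not exceed the cutoff χ; its boundary is the contour PAT is designed to find cheaply.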

The forward problem that maps a parameter vector to the observations thus can be written

y = f(x) + ε.

We want to find a confidence set for x with confidence level 95%. This is a set C(y) for which

P{C(y) ∋ x} ⩾ 0.95,

whatever the true value x = (a,b) happens to be. An example of a 95% confidence set is

C(y) ≡ {x : ‖y − f(x)‖² ⩽ χ},

where the constant χ is chosen such that

P{‖ε‖² ⩽ χ} = 0.95,

that is, χ is the 95th percentile of the chi-square distribution with n degrees of freedom.

The appendix gives more explanation.

The boundary of the confidence set C(y) is the level-χ contour of the function g(x) = ‖y − f(x)‖². Evaluating g for a given parameter pair x = (a,b) can be expensive, especially if the problem is modeled in three spatial dimensions and the computational mesh contains a large number of nodes. Regular contouring approaches might require evaluating g on a grid of 128 × 128 different parameter values, for instance. That would mean running the finite-element solver 128² times, once for each of the 128² parameter values. Clearly, it would be useful to be able to find the boundary of the confidence set with fewer evaluations of g; that is the problem PAT addresses.
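Since the components of ε are iid standard normal, ‖ε‖² has a chi-square distribution with n degrees of freedom, so χ is just its 0.95 quantile. A small sketch that estimates the cutoff by Monte Carlo using only numpy (scipy.stats.chi2.ppf(0.95, n) would give the exact value):

```python
import numpy as np

def chi_cutoff(n, level=0.95, draws=200_000, seed=0):
    """Estimate chi with P(||eps||^2 <= chi) = level for eps ~ N(0, I_n),
    i.e. the `level` quantile of the chi-square distribution with n
    degrees of freedom, by straightforward Monte Carlo."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal((draws, n))
    return float(np.quantile(np.sum(eps * eps, axis=1), level))
```

For n = 10 this returns a value near 18.3, matching the tabulated 95th percentile of the chi-square distribution with 10 degrees of freedom.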

There are at least two ways to construct a pilot approximation for PAT:

  • (i)  
    Linear interpolation of g on a sparse grid. Evaluate g on a small grid of points (say, a 10 × 10 grid) in Δ. Divide each grid element into two triangles, forming a triangular tessellation of Δ. Set the pilot approximation ĝ to be the piecewise linear function equal to g at the nodes, linear within each triangle. (Instead of linear interpolation, one might use MARS (Friedman 1991) or a Gaussian process model (O'Hagan and Kingman 1978, Sacks et al 1989, Kennedy and O'Hagan 2001).)
  • (ii)  
    Solve the PDE at a low level of approximation for a dense grid of values of x. Computing g involves the finite-element solution of a PDE on a computational mesh. We can use the same finite-element solver on a coarser computational mesh to evaluate ĝ. For example, calculating g might involve a mesh with 128³ nodes, and calculating ĝ might involve a mesh with 16³ nodes. In order to evaluate g(x), we use the mesh with 128³ nodes to solve (4) with coefficients x = (a,b), apply the observation sampling G to the solution, and calculate g(x), the sum of squared differences between the data and the sampled solution. To evaluate the approximation ĝ(x), we proceed in exactly the same way, except that we numerically solve the PDE on the coarser mesh with 16³ nodes.

The first approach is more generic and can be used with all kinds of problems. However, the second approach is more natural in problems arising from PDEs, and allows the approximation ĝ to be constructed without evaluating g at all.
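As a concrete sketch of option (i), the following Python builds a pilot from values of g on a sparse regular grid. It uses bilinear interpolation on rectangles rather than the piecewise-linear interpolation on triangles described above, and the names are ours, so treat it as an illustration of the idea rather than the paper's construction:

```python
import numpy as np

def make_pilot(g, xs, ys):
    """Evaluate the costly g once per node of a sparse grid (xs x ys), then
    return a cheap interpolant g_hat. Bilinear on rectangles here, in place
    of the triangle-based piecewise-linear interpolant described in the text."""
    G = np.array([[g(x, y) for y in ys] for x in xs])  # the costly part, done once

    def g_hat(x, y):
        # locate the grid cell containing (x, y), clamping to the grid
        i = int(np.clip(np.searchsorted(xs, x) - 1, 0, len(xs) - 2))
        j = int(np.clip(np.searchsorted(ys, y) - 1, 0, len(ys) - 2))
        tx = (x - xs[i]) / (xs[i + 1] - xs[i])
        ty = (y - ys[j]) / (ys[j + 1] - ys[j])
        return ((1 - tx) * (1 - ty) * G[i, j] + tx * (1 - ty) * G[i + 1, j]
                + (1 - tx) * ty * G[i, j + 1] + tx * ty * G[i + 1, j + 1])

    return g_hat
```

After this one-time cost (here 100 evaluations of g for a 10 × 10 grid), g_hat can be evaluated essentially for free wherever PAT needs it.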

2. The Pilot Approximation Trajectory algorithm

PAT is related to common contouring algorithms like those built into high-level languages such as MATLAB and R. Section 2.1 sketches those algorithms. Section 2.2 describes the basic PAT algorithm. Sections 2.3 and 2.4 present refinements.

Standard algorithms and PAT both have the same main steps:

  • (i)  
    Partition the domain into rectangular cells.
  • (ii)  
    Find a cell edge crossed by a piece (a maximally connected subset) of the contour that has not yet been traced.
  • (iii)  
    Find a maximal series of edges of adjacent cells crossed by that piece of the contour.
  • (iv)  
    Find (or estimate) points at which the piece of the contour intersects each of those edges.
  • (v)  
    Connect those points with line segments (trace the piece).
  • (vi)  
    Check whether there are additional pieces of the contour. If so, return to the second step; if not, stop.

The largest differences are in how these steps are performed: how an untraced piece of the contour is found, how the edges crossed by each piece are found, how the (approximate) points where the piece intersects those edges are found, whether points on the contour in the interior of cells are found and used to construct the approximate contour, and how the check for additional pieces is performed. PAT also has extra 'quality control' steps to look for pieces of the contour that might have been missed and to estimate the error of the approximation. This step reliably assesses the accuracy of the approximate lower level set even if the assumptions of PAT do not hold.

2.1. A simple contour algorithm

Suppose that g is a continuous function of two real variables defined on a domain Δ ⊂ ℝ². We seek the contour

γg,τ ≡ {x∈Δ : g(x) = τ}

for a given level τ. An approximation of γg,τ is denoted γ̂g,τ. Without loss of generality, we assume that τ = 0, since we can always contour g − τ instead of g. We suppress τ from the notation unless τ ≠ 0. The contour γg consists of a union of maximally connected pieces, indexed by i.

Standard algorithms for approximating γg start by dividing Δ into N smaller rectangles (cells) {Δk}, k = 1,...,N. They also rely on knowing the value of g on the grid defined by the vertices of the cells. They approximate g at points not on the grid by interpolation, typically linear interpolation. Such algorithms rely on assumptions like the following:

  • (i)  
    The function g is continuous.
  • (ii)  
    The area of γg is zero.
  • (iii)  
    Each piece of γg is either a closed curve or both of its endpoints are on the boundary of Δ.
  • (iv)  
    The function g is equal to the level τ at most once on any edge of any cell.
  • (v)  
    No piece of γg is entirely confined to the interior of a cell.

Assumptions (ii) and (iii) imply that g does not have a local extremum at which the value of g is 0. Assumptions (iv) and (v) ensure that the cells are small enough to find all the pieces of the contour. Aramini (1980) proposes approaches to obviate the need for assumption (iv) by adaptively subdividing cells; see also Suffern (1990). The method of Aramini (1980) requires knowing the approximate magnitude of the Laplacian of g, but we are avoiding the assumption that g is differentiable. Instead, we assume that the cells are sufficiently small that γg crosses no edge more than once. The PAT confidence bound on its own error—the area of the symmetric difference between the real and approximate lower level sets—does not depend on any of these assumptions.

Figure 1 sketches how a typical contouring algorithm works. First, the algorithm compares values of g at adjacent vertices until it finds an edge that the contour crosses. When it finds such an edge, the algorithm uses linear interpolation to estimate the point at which the contour intersects that edge. Then the algorithm moves to an adjacent cell, finds an edge of that cell crossed by the contour, and uses linear interpolation to estimate where the contour intersects this second edge. This continues until the algorithm either returns to the cell found in the first step, or reaches the boundary of the domain.


Figure 1. How a typical contour algorithm works. The true contour γg is the gray curve. The approximate contour is the black polygon. The algorithm first finds a cell (Δ2 in this case) with an edge that the contour intersects (black edges). The algorithm picks one of those edges and approximates where that piece of the contour intersects the edge (black dots) by linear interpolation. The algorithm then moves from edge to edge of the cell to find an adjacent cell with an edge that intersects the contour (here, Δ6). This continues until the algorithm returns to the cell it started from or reaches the boundary of the domain (in both directions). The intersection points are connected by line segments to trace that piece of the contour. Then the algorithm checks unvisited cells to see whether the contour has other pieces. If so, the procedure repeats until there are no untraced pieces.


If the algorithm returns to the cell found at the first step, this piece of the contour is a closed curve; the points where the contour was estimated to intersect the edges of cells are connected by straight lines (traced) to form a closed polygon that approximates the corresponding piece of γg. On the other hand, if the algorithm reaches the boundary of the domain Δ before returning to the first cell, assumption (iii) guarantees that this piece of the contour intersects the boundary of Δ twice. To find the other point where the contour intersects the boundary of Δ, the algorithm returns to the first point found and moves in the opposite direction until it reaches the boundary. Then the points at which this piece of the contour were estimated to intersect cell edges are joined by straight lines (traced) to form a piecewise linear curve that approximates the corresponding piece of γg.

The contour might consist of disjoint pieces, so when the algorithm finishes tracing one piece, it checks 'unvisited' cells to see whether the contour has another piece. If so, that piece is traced as described above. This process continues until every cell has been visited. If no edge is crossed by a contour, assumption (v) guarantees that the contour is the empty set.
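The two primitives such an algorithm needs—deciding which cell edges the (zero-level) contour crosses, and estimating the crossing points by linear interpolation—can be sketched in a few lines of Python; the function names are ours, not from any particular contouring library:

```python
def edge_crossing(p1, v1, p2, v2):
    """Linear-interpolation estimate of where the zero contour crosses the
    edge from p1 to p2, given g(p1) = v1 and g(p2) = v2 with opposite signs."""
    t = v1 / (v1 - v2)               # fraction of the way from p1 to p2
    return (p1[0] + t * (p2[0] - p1[0]), p1[1] + t * (p2[1] - p1[1]))

def crossed_edges(cell, values):
    """Edges of a rectangular cell crossed by the zero contour.
    `cell` holds the 4 corners in order, `values` the values of g there."""
    out = []
    for k in range(4):
        a, b = values[k], values[(k + 1) % 4]
        if a == 0.0 or a * b < 0.0:   # sign change along this edge
            out.append((cell[k], cell[(k + 1) % 4]))
    return out
```

Under assumption (iv), `crossed_edges` finds at most one crossing per edge, and the tracing loop described above repeatedly applies `edge_crossing` while walking from cell to adjacent cell.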

2.2. Contouring costly functions: Pilot Approximation Trajectory

As mentioned above, PAT is a modification of contouring algorithms like those just described. PAT assumes that g can be evaluated anywhere in Δ. PAT is designed for situations where evaluating g is expensive, so it seeks to keep the number of evaluations small but still approximate γg accurately.

PAT uses a pilot approximation ĝ to determine where to evaluate g. For instance, if g involves an expensive simulation or physical experiment, ĝ might be a spline function fitted to values of g at a small number of points in Δ. If computing g requires solving a PDE using a finite-element method, ĝ might be based on a finite-element model using a coarse discretization.

In general, the contours of g and ĝ intersect different cells. To find a cell intersected by γg, PAT starts at a point on γĝ, the contour of ĝ. It searches a line perpendicular to γĝ for a cell intersected by the contour of g. (It should find such a cell eventually unless γg and γĝ are nearly orthogonal—which would imply that ĝ is a poor approximation to g. In that case, PAT advances to the next edge intersected by γĝ and tries again.) Once such a cell is found, PAT uses the 'trajectory' (local orientation) of the contour of ĝ to predict that of the contour of g. For instance, if the contour of ĝ intersects the left and top edges of the cell in which the search started, PAT starts by assuming that the contour of g intersects the left and top edges of the cell just found by the search. If that assumption turns out to be false, PAT searches the other edges of the cell crossed by the contour of g to find an edge intersected by γg. If the assumption is true, PAT evaluates g fewer times, but if the assumption is false, PAT still works.
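The jump from the pilot contour to the true contour is a one-dimensional root search along the perpendicular line. A hedged sketch in Python—it brackets a sign change and bisects, whereas the paper's implementation uses Müller's method, and the search half-length is an arbitrary assumed parameter:

```python
import numpy as np

def root_on_normal(g, p_a, p_b, half_len=0.5, tol=1e-10):
    """Search for a root of g on the line perpendicular to the segment
    p_a-p_b, through its midpoint: the PAT step that jumps from the pilot
    contour to the true contour. Bracketing + bisection stand in for the
    Mueller's-method search of the paper's implementation."""
    p_a, p_b = np.asarray(p_a, float), np.asarray(p_b, float)
    mid = 0.5 * (p_a + p_b)
    t = p_b - p_a
    n = np.array([-t[1], t[0]]) / np.hypot(*t)     # unit normal to the segment
    f = lambda s: g(*(mid + s * n))
    ss = np.linspace(-half_len, half_len, 101)     # walk outward from mid
    vals = [f(s) for s in ss]
    for k in range(len(ss) - 1):
        if vals[k] == 0.0:
            return mid + ss[k] * n
        if vals[k] * vals[k + 1] < 0.0:            # bracketed a sign change
            lo, hi = ss[k], ss[k + 1]
            while hi - lo > tol:                   # bisection
                m = 0.5 * (lo + hi)
                if f(lo) * f(m) <= 0.0:
                    hi = m
                else:
                    lo = m
            return mid + 0.5 * (lo + hi) * n
    return None            # no root found: PAT moves on to the next edge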

Like standard contouring algorithms, PAT divides the domain into cells {Δk}, k = 1,...,N. PAT requires both g and ĝ to satisfy assumptions (i)–(v). To simplify the exposition, we assume in this section that the contours of g and ĝ have the same number of disjoint pieces; we relax this assumption in section 2.4. And we assume in this section that γg and γĝ each intersect at most two edges of any cell; we relax this assumption in section 2.3.

Below are the steps in the basic PAT algorithm. The cells intersected by the ith piece of the approximate contour γĝ are denoted {Δ̂ij'}, j' = 0,...,J'i. Cells intersected by the ith piece of γg are denoted {Δij}, j = 0,1,...,Ji.

  • 0.  
    Construct the pilot approximation ĝ of g. Divide the domain Δ into N cells {Δk}, k = 1,...,N. Set k = i = 1.
  • 1.  
    Mark cell Δk as visited.
  • 2.  
    • (a)  
      If k = N and Δk has no edge for which ĝ(x1)ĝ(x2) ⩽ 0, where x1 and x2 are the vertices of the edge, exit.
    • (b)  
      Otherwise if k < N and Δk has no edge for which ĝ(x1)ĝ(x2) ⩽ 0, increment k and return to step 1.
    • (c)  
      Otherwise Δk has an edge with ĝ(x1)ĝ(x2) ⩽ 0, so γĝ intersects Δk. The assumptions guarantee that it has at least two such edges. Denote two of them ê and ê'. Set Δ̂i0 = Δk.
  • 3.  
    Find an edge intersected by γg as follows:
    • (a)  
      Use linear interpolation to estimate where γĝ intersects the edges ê and ê'. Denote those points by x̂ and x̂'. If ĝ has a zero at a vertex of the cell, we can choose that vertex to be one of the points.
    • (b)  
      Find the line perpendicular to the line segment between the points x̂ and x̂' that goes through the midpoint of that segment; see figure 2. Search for a root of g on this line that is not on a piece of the contour already traced (our implementation of PAT uses Müller's method (Müller 1956); other methods might be more efficient). If no new root is found on this line, increment k and return to step 1. If a root is found, denote it xi0. The point xi0 is on the piece of γg we will trace next.
    • (c)  
      Find the cell containing xi0 and denote it Δi0.
  • 4.  
    Follow and trace the piece of γg just found: set j = j' = 0. Then:
    • (a)  
      Predict which edge γg crosses next. The prediction is denoted ec. If γĝ is in the same cell (i.e., γĝ has also entered the cell Δij+1 through the edge eij or, on the first step, started from the same cell), set ec to be the edge that γĝ crosses next. Otherwise, let ec be the edge next crossed by the zero contour of ĝ − ĝ(xij), the pilot approximation shifted by its value at the last contour point of g, xij.
    • (b)  
      If edges ec and eij are adjacent, they share a vertex at which g is already known. In that case, evaluate g at the other vertex of ec to check whether γg crosses ec. If it does, set eij+1 = ec.
    • (c)  
      If ec is parallel to eij, evaluate g at one vertex of ec (e.g. the vertex with the smaller absolute value of ĝ). Then g is known at both vertices of one edge perpendicular to eij; check whether γg crosses that edge. If so, that edge is eij+1. If not, evaluate g at the remaining vertex of Δij+1 to find the edge eij+1 that γg intersects.
    • (d)  
      Find the root xij+1 of g on edge eij+1.
    • (e)  
      If γĝ also crosses edge eij+1, trace the portion of γĝ between the edge and the cell Δij+1 the same way the standard algorithm does. Update the index j' and mark the cells intersected by that portion of γĝ as visited.
    • (f)  
      Move to the cell that is across edge eij+1 and denote it by Δij+1. Set j ← j + 1 and repeat these steps until either the initial cell Δi0 or the boundary is reached. If a boundary is reached, return to Δi0 and trace in the other direction, as the standard contour algorithm does. If the end of γĝ has not been reached, trace γĝ to its end and mark all cells on its path as visited. Increment i.
  • 5.  
    If there are cells not yet visited, increment k until Δk is not a visited cell and continue from step 1.
  • 6.  
    For each i, connect the points {xi0,xi1,...,xiJi} with straight line segments to form the approximation of the ith piece of γg.

Figure 2. Contouring a costly function g using PAT. The true contour γg is the dark gray curve and its PAT approximation is the black polygon. The light gray curve is γĝ, the approximate contour of the pilot approximation ĝ. PAT starts in a cell with an edge intersected by γĝ and estimates where γĝ intersects the edges of that cell (dark gray dots) using linear interpolation, just like standard algorithms. Then PAT searches for a root of g (the point x10) on the line perpendicular to the segment between these two intersection points. Tracing of γg starts from the cell that contains this zero (here Δ8 = Δ10). PAT uses the 'trajectory' of γĝ to make a guess ec (dark gray lines on edges) of the next edge e21 that γg crosses. Then PAT evaluates g at vertices of the cell (gray dots) as necessary to determine whether ec or a different edge is in fact e21. PAT determines the point x21 at which γg intersects e21 and moves to the next cell γg enters. This continues until every piece of the contour has been traced.


To approximate contours at additional levels, the pilot approximation ĝ could be reused, or it could be improved by incorporating all the points at which g has been evaluated.
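Step 3(b) mentions Müller's method for the one-dimensional root searches. For reference, here is a compact Python sketch of the classical iteration—fit a parabola through the three most recent iterates and step to its nearer root; this is textbook Müller (1956), not the paper's code:

```python
import cmath

def muller(f, x0, x1, x2, tol=1e-12, maxit=50):
    """Mueller's method: fit a quadratic through the three latest iterates
    and take the root of the quadratic nearest x2 as the next iterate.
    Complex arithmetic lets the parabola have a negative discriminant."""
    for _ in range(maxit):
        f0, f1, f2 = f(x0), f(x1), f(x2)
        h0, h1 = x1 - x0, x2 - x1
        d0, d1 = (f1 - f0) / h0, (f2 - f1) / h1
        a = (d1 - d0) / (h1 + h0)
        b = a * h1 + d1
        disc = cmath.sqrt(b * b - 4.0 * a * f2)
        # pick the denominator larger in magnitude for numerical stability
        denom = b + disc if abs(b + disc) > abs(b - disc) else b - disc
        x3 = x2 - 2.0 * f2 / denom
        if abs(x3 - x2) < tol:
            break
        x0, x1, x2 = x1, x2, x3
    return x3.real if abs(x3.imag) < tol else x3
```

Unlike bisection, Müller's method does not need a bracketing sign change to start, which suits the search along the perpendicular line where no bracket is known in advance.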

2.3. Saddle points

In the previous section we assumed that the contour intersects at most two edges of any cell. This may not hold if (a) the cell contains a saddle point (the contour crosses itself or two pieces of the contour intersect the same cell) or (b) there is a local minimum or maximum in the cell (so that the same piece of the contour intersects the cell four times, or at three edges and a vertex).

There have been attempts to distinguish these situations automatically, for example by using linear interpolation to decide which edges to connect (Aramini 1980) or by evaluating the function at the center of the cell and comparing that value to the values at vertices (Aramini 1980). Instead, we follow the contour in small steps from one edge to another edge. We believe that this approach errs less frequently in practice, but we have not characterized the conditions under which that is true.

Suppose PAT is at step 4(a) and it turns out that γg crosses all four edges of the cell. Then the following approach is used to determine the next edge eij+1 and the point xij+1. The idea is to follow the contour from xij to another edge. One way to do this is to move a short distance along the tangent to γg and then move perpendicular to that tangent to get back to γg, and repeat these steps until we cross an edge. However, since g is not necessarily differentiable, the tangent to γg might not exist. Hence, we modify this prediction–correction approach.

We construct a step direction as follows. Pick a small value δ > 0 (for instance, 10% of the length of edge eij). Find the unit normal n to edge eij. Set k ← 0, d0 ← δn, and y0 ← xij. While yk∈Δij+1:

  • 1.  
    Set ỹk+1 ← yk + dk.
  • 2.  
    On the line through ỹk+1 perpendicular to dk, find the closest root of g. Set yk+1 equal to that root.
  • 3.  
    Set dk+1 = δ(yk+1 − yk) and let k ← k + 1.

This approach is illustrated in figure 3.


Figure 3. Tracking the contour by prediction and correction to find edge eij+1 when a cell contains a saddle point.


The edge eij+1 is chosen to be the edge crossed by the line segment between yk and yk+1. The point xij+1 is chosen to be the intersection point on this line segment. The points y1,...,yk can be included in the series of points connected to form the approximation to the ith piece of γg.
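The prediction-correction loop above can be sketched in Python. Several details here are our illustrative choices rather than the paper's: the corrector uses bracketing and bisection along the perpendicular line, the step update renormalizes the direction to length δ, and `inside_cell` is supplied by the caller:

```python
import numpy as np

def track_through_cell(g, x_start, normal, delta, inside_cell,
                       search=0.5, tol=1e-10):
    """Prediction-correction contour tracking (section 2.3): from x_start,
    step distance delta along the current direction, then pull the predicted
    point back onto the zero contour of g along the perpendicular line."""

    def correct(p, d):
        # root of g nearest p on the line through p perpendicular to d
        m = np.array([-d[1], d[0]]) / np.linalg.norm(d)
        f = lambda s: g(*(p + s * m))
        best = None
        ss = np.linspace(-search, search, 201)
        for k in range(len(ss) - 1):
            if f(ss[k]) * f(ss[k + 1]) <= 0.0:     # bracketed a sign change
                lo, hi = ss[k], ss[k + 1]
                while hi - lo > tol:               # bisection
                    mid = 0.5 * (lo + hi)
                    lo, hi = (lo, mid) if f(lo) * f(mid) <= 0.0 else (mid, hi)
                s = 0.5 * (lo + hi)
                if best is None or abs(s) < abs(best):
                    best = s                       # keep the closest root
        return None if best is None else p + best * m

    d = delta * np.asarray(normal, dtype=float)    # d0 = delta * n
    y = np.asarray(x_start, dtype=float)           # y0 = x_ij
    path = [y]
    while inside_cell(y):
        y_next = correct(y + d, d)                 # predict, then correct
        if y_next is None:
            break                                  # lost the contour; give up
        d = delta * (y_next - y) / np.linalg.norm(y_next - y)
        y = y_next
        path.append(y)
    return path
```

Because the corrector only needs sign changes of g, the tracking works for nondifferentiable g, which is why the text avoids the tangent-based formulation.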

2.4. Missing pieces of the contour

So far, we have assumed that γg and γĝ have the same number of pieces. If γĝ has more pieces than γg, PAT will not succeed in finding a new piece of γg when tracing the extra pieces of γĝ, potentially wasting a large number of evaluations of g—but this is otherwise harmless. On the other hand, if γg has more pieces than γĝ, PAT might not find them all. This can matter in applications. For instance, missing a piece of γg can cause a misfit-based confidence set to have a confidence level less than its nominal confidence level, by erroneously excluding parameter values that belong in the set. This section discusses how to augment PAT to make it more likely to find all the pieces of γg and how to measure statistically the area of the symmetric difference between the lower level set of g and the estimated lower level set of g, as discussed in section 1.

2.4.1. Heuristic approaches.

In general, we cannot be certain that PAT has found every piece of γg. However, some heuristic rules can help locate pieces that might otherwise be missed.

Regions where γĝ is far from γg raise suspicion. If γg has disjoint pieces that are close to each other and ĝ is very smooth, γĝ may have fewer pieces than γg. We might discover the extra pieces of γg by probing such 'gaps' between γg and γĝ, for example, by evaluating g at the centroid of the gap to see whether g has the sign implied by the PAT contour. If it does not, we can search a line segment between that point and the nearest piece of γĝ to find a root of g and start tracing a new piece of γg there. Our implementation starts by trying to take one endpoint of the segment to be a point within the gap at which g has already been evaluated, close to the centroid of the gap. If g does not have a different sign at the centroid, we also evaluate g at the grid point inside the gap at which the value of ĝ is largest or smallest.

If it is inexpensive to find extreme points of the pilot approximation, we might check whether it has extreme points that are not inside any piece of its contour, or two or more extreme points inside a single piece, and evaluate g at those points. Checking extreme points relies on the heuristic that if the pilot approximation is a good approximation to g, its extreme points might be close to those of g. If the pilot approximation has a local extremum with a value close to zero, but does not change sign near the extremum, g nonetheless might change sign nearby.

If g has the same sign everywhere it has been evaluated, so PAT has not found any points on the contour, we might use numerical optimization to try to find a point in Δ at which g has the opposite sign. An iterative method could be started from the point with the smallest value of |g| among those observed.

2.4.2. A priori knowledge about g.

In some cases we may know how many pieces γg should have. For example, if g is quasi-convex, its lower level set {x:g(x) < 0} is convex; if g is quasi-concave, its upper level set {x:g(x) > 0} is convex. In either case γg consists of at most one piece, and the algorithm can stop as soon as one piece of γg has been completely traced.

2.4.3. A priori knowledge about the pilot approximation.

If we have a bound on the error of the pilot approximation g̃ as an approximation to g, we can use it to check whether PAT has missed pieces of γg. For example, if we knew a function M(x) for which

|g(x) − g̃(x)| ≤ M(x) for all x in Δ,

it would suffice to evaluate g at grid points xj for which |g̃(xj) − τ| ≤ M(xj) (perhaps starting with points for which |g̃(xj) − τ| is smallest) and check the sign of g − τ at those points.

2.4.4. Random sampling and contour error.

We can use random sampling both to look for missing pieces and to collect statistical evidence about the accuracy of the PAT contour as an approximation to γg. Recall from section 1 that the difference between the true lower level set of g at level τ, Λτ, and the estimated lower level set of g is the set on which the approximate contour errs in classifying points in Δ. As discussed in the introduction, the approximate contour makes classification errors on two sets: points where the true value of g is strictly positive but the approximate contour claims that g is nonpositive, and points where the true value of g is nonpositive but the approximate contour claims that g is positive. The area of the union of those sets, the symmetric difference of Λτ and the estimated lower level set, is a measure of the accuracy of the approximate contour. Normalizing that area by the total area of Δ gives the fraction of the area of the domain Δ misclassified by the approximate contour.

To keep things simple, we consider how to estimate the fraction p of the area of Δ that is misclassified by the PAT contour, that is, the relative area of the symmetric difference. The chance that a point selected at random uniformly from Δ is misclassified by the PAT contour is p. If a misclassified point is near the PAT contour, the contour may simply be slightly out of place. If a misclassified point is far from the PAT contour, that suggests that a piece of γg is missing. To check, we can search for a root of g on a line segment between that point and the nearest piece of the PAT contour. If we find such a root, it becomes a starting point to trace a piece of γg that had been missed.

On the other hand, suppose we draw n points at random, independently, uniformly from Δ, and the sign of g at every point in the sample agrees with its sign according to the PAT contour: no point in the sample is in the symmetric difference. Then an event has occurred that has probability (1 − p)^n. We can use this to find a 1 − α upper confidence bound on p: for this event to have probability greater than α, p must be less than 1 − α^(1/n). For instance, for n = 50, we would have 95% confidence that p < 5.8%; for n = 100, we would have 95% confidence that p < 3%.

More generally, if n independent draws find m misclassified points, a 1 − α upper confidence bound for the relative area of the symmetric difference, the part of the domain Δ misclassified by the PAT contour, is the smallest p for which

C(n,0)(1 − p)^n + C(n,1)p(1 − p)^(n−1) + ⋯ + C(n,m)p^m(1 − p)^(n−m) ≤ α,

where C(n,k) denotes the binomial coefficient; this is the standard Clopper–Pearson upper confidence bound. For m = 0 it reduces to p = 1 − α^(1/n), as above.
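This bound can be computed by inverting the binomial tail probability by bisection; a minimal sketch (the helper name is ours, not from the article):

```python
from math import comb

def misclass_upper_bound(m, n, alpha=0.05, tol=1e-10):
    """Smallest p with P(Binomial(n, p) <= m) <= alpha: the 1 - alpha
    Clopper-Pearson upper confidence bound on the misclassified fraction."""
    def tail(p):  # P(X <= m) for X ~ Binomial(n, p); decreasing in p
        return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(m + 1))
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if tail(mid) > alpha:
            lo = mid   # mid is too small: the tail probability still exceeds alpha
        else:
            hi = mid
    return hi

ub_none = misclass_upper_bound(0, 100)  # no misclassified points among 100, about 3%
ub_one = misclass_upper_bound(1, 100)   # one misclassified point among 100
```

With m = 0 the routine reproduces the closed form 1 − α^(1/n) above.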

Evaluating g at a random sample of points in Δ allows us to have confidence in the accuracy of the PAT contour, despite the fact that the assumptions behind PAT might be wrong—at the expense of n extra evaluations of g.

In some applications, one kind of error is more important than the other. For instance, erroneously including extra points in a confidence set makes the set conservative, while erroneously excluding points from a confidence set is misleadingly optimistic. Moreover, the 'cost' of misclassifying points might be higher for some points than for others. For instance, erroneously excluding from a confidence set of climate models a model in which global temperature rises 10 K in 20 years might be a more serious error than erroneously excluding a model in which global temperature rises 0.5 K in 20 years. Such differences can be taken into account by using weighted areas (integration against a nonuniform measure instead of Lebesgue measure) and carrying out the above sampling using corresponding nonuniform distributions.

3. Tests of Pilot Approximation Trajectory

In this section we apply PAT to two test cases: finding a contour of a two-dimensional function with a saddle point, at the level of the saddle point, and constructing a confidence set in an inverse problem involving air flow.

3.1. Contouring a function with a saddle point

Let g be a sum of two two-dimensional Gaussian bumps, with centers μ± and a common shape determined by matrices R and C. The matrix R rotates the coordinate system 45° clockwise. The domain for the contour plot is Δ ≡ [−1,1]². The function g is plotted in figure 4.

Figure 4. A function g with a saddle point, used to test PAT, and its pilot approximation.

This function has a saddle point at the origin. To test the ability of PAT to contour saddle points, we contour at level τ = g(0,0), the value of g at the saddle point.

The pilot approximation was chosen to be the piecewise linear approximation of g using an equispaced 9 × 9 grid in Δ (see description in section 1) and it is plotted in figure 4.

The PAT contour found using a 32 × 32 grid is plotted in figure 5. For comparison, figure 5 shows a contour computed using the built-in contouring algorithm in R with data on a 1024 × 1024 grid and on the 32 × 32 grid. The 1024 × 1024 grid is dense enough that the approximate contour is essentially the true contour of g. Figure 6 is a close-up of the contour near the saddle point.

Figure 5. Results for example 1: the PAT contour of g (dots connected with black lines) and a standard contour of the pilot approximation (crosses connected with black lines) on a 32 × 32 grid. Contours of g and of the pilot approximation computed using a standard algorithm on a 1024 × 1024 grid are plotted as solid light gray curves. A standard contour of g on a 32 × 32 grid is plotted as dotted dark gray curves.
Figure 6. Close-up of the contour near the saddle point (the origin). The approximate PAT contour of g is plotted as dots connected with black lines; the standard contour is plotted as a solid light gray line (1024 × 1024 grid) and a dotted dark gray line (32 × 32 grid).

The PAT contour is almost indistinguishable from the standard contour on a 1024 × 1024 grid. It is comparable to the standard contour on a 32 × 32 grid except near the saddle point, where the PAT contour is more accurate. PAT correctly finds that the contour crosses itself, even though the contour of the pilot approximation does not. Moreover, PAT is more accurate than the standard contouring algorithm on the 32 × 32 grid, in part because PAT locates zeros of g with a root finder rather than approximating them by linear interpolation. Indeed, if we treat the contour on the 1024 × 1024 grid as ground truth, the standard contouring on a 32 × 32 grid misclassifies 0.38% of the domain, while PAT misclassifies only 0.12% of the domain (the area of the symmetric difference was computed using the gpclib library in R). Repeated random samples of 100 points generally found none that were misclassified by the PAT contour: no sample point was in the symmetric difference. The 95% upper confidence bound on the fraction of Δ misclassified by the PAT contour based on a sample of size 100 that shows no misclassifications is about 3%, rather larger than the true percentage, 0.12%. Occasionally, a random sample of 100 points revealed one point misclassified by the PAT contour; the corresponding upper 95% confidence bound is 4.7%. By sampling more than 100 points, these upper confidence bounds could be reduced. However, the bounds decrease roughly like the reciprocal of the sample size. For instance, to reduce the 95% upper confidence bound from 3% to 0.2% would require evaluating g at least 1500 times, solely to reduce the sampling variability in the accuracy assessment rather than to improve the contour.

Computing the PAT contour required evaluating g 370 times (the pilot approximation was evaluated 1109 times). To build the pilot approximation, g was evaluated 9² = 81 times, so g was evaluated 451 times in all. For comparison, computing the contour on a 32 × 32 grid using a standard contouring algorithm requires 32² = 1024 evaluations of g. In this example, our algorithm requires 56% fewer evaluations of g. Including the 100 evaluations of g at randomly selected points, the total number of evaluations is still 46% less than the standard contouring algorithm requires, and it gives a more accurate contour and an internal measure of accuracy, which the standard method lacks. (Of course, the same sampling approach could be used to assess the accuracy of the standard method, at the cost of an additional 100 evaluations of g for parity.) The number of evaluations of g that PAT requires depends on the root-finding algorithm and its tuning constants; we used the algorithm of Müller (1956).
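Müller's method fits a parabola through the three most recent iterates and steps to the parabola's root nearest the newest iterate. A minimal sketch of the root finder (function names, starting points and tolerances are ours, not from the article):

```python
import cmath

def muller(f, x0, x1, x2, tol=1e-12, maxit=100):
    """Muller's method: repeatedly fit a parabola through three iterates and
    step to its nearest root; intermediate arithmetic may be complex."""
    for _ in range(maxit):
        f0, f1, f2 = f(x0), f(x1), f(x2)
        h1, h2 = x1 - x0, x2 - x1
        d1, d2 = (f1 - f0) / h1, (f2 - f1) / h2
        a = (d2 - d1) / (h2 + h1)          # leading coefficient of the parabola
        b = a * h2 + d2                    # slope of the parabola at x2
        disc = cmath.sqrt(b * b - 4 * a * f2)
        denom = b + disc if abs(b + disc) >= abs(b - disc) else b - disc
        if denom == 0:
            break
        dx = -2 * f2 / denom               # numerically stable quadratic-formula step
        x0, x1, x2 = x1, x2, x2 + dx
        if abs(dx) < tol:
            break
    return x2.real if isinstance(x2, complex) else x2

# locate the level-tau crossing of a toy g along one coordinate: a root of g - tau
g = lambda x: x * x
tau = 2.0
root = muller(lambda x: g(x) - tau, 0.0, 1.0, 2.0)  # converges to sqrt(2)
```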

Missing contour pieces.

To test the ability of PAT to find a contour accurately even when the topology of the contour of the pilot approximation is wrong, we shifted the level τ for which the contour is constructed by 0.005 (i.e. τ = g(0,0) + 0.005). This makes the contour γg,τ consist of two closed curves. We also shifted the pilot approximation by 0.045 so that its contour is a single closed curve that encircles both pieces of the contour γg,τ; thus the contour γg,τ has more pieces than the contour of the pilot approximation.

The method described in section 2.4 finds the missing piece. That method checks for whole-cell 'gaps' between the PAT contour and the approximate contour of the pilot approximation. In this example, the missing piece of the PAT contour is inside the approximate contour, so the gaps are outside the PAT contour. According to the traced contour, g should be below τ near, but outside, the PAT contour, so the algorithm will detect that a piece of the contour is missing if it finds a point x outside the traced piece for which g(x) > τ. To look for such a point, it evaluates g within the gap at the grid point where the pilot approximation is largest, since, if the pilot approximation is a good approximation of g, we expect g to be large there, too. If g > τ at that point, the algorithm searches the line segment between that point and the closest point at which g has been already evaluated (a vertex of the cell intersected by the contour) to find a point where g = τ. It then traces the (new) piece of the contour that passes through that root of g − τ.
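The segment search in this gap-probing step is one-dimensional, and can be done by bisection once we have a point where g exceeds τ and a point where it does not. A sketch under those assumptions (names are ours):

```python
def segment_crossing(g, tau, p_out, p_in, tol=1e-10, maxit=200):
    """Bisect along the segment from p_out (where g > tau) to p_in (where
    g <= tau) for a point where g crosses the level tau."""
    (ax, ay), (bx, by) = p_out, p_in
    point = lambda t: (ax + t * (bx - ax), ay + t * (by - ay))
    lo, hi = 0.0, 1.0  # invariant: g(point(lo)) > tau and g(point(hi)) <= tau
    for _ in range(maxit):
        mid = 0.5 * (lo + hi)
        if g(*point(mid)) > tau:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return point(0.5 * (lo + hi))

# toy check: the unit circle g = 1 is crossed along the x-axis at (1, 0)
crossing = segment_crossing(lambda x, y: x * x + y * y, 1.0, (2.0, 0.0), (0.0, 0.0))
```

In PAT the crossing point found this way would seed the tracing of the new contour piece; in practice the bisection could be replaced by the same Müller root finder used elsewhere.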

The PAT contour on a 32 × 32 grid is shown in figure 7. Computing the PAT contour required 423 evaluations of g: 8² = 64 to find the pilot approximation and 359 to trace the contour. If we consider the contour on the 1024 × 1024 grid to be the ground truth, the standard contour on a 32 × 32 grid misclassifies 0.33% of the domain, while PAT misclassifies only 0.15% of the domain. Again, PAT gives a more accurate contour than the standard method, using far fewer evaluations of g.

Figure 7. PAT contour for a shifted version of example 1. In this example, the true contour has two pieces but the contour of the pilot approximation has only one. The algorithm identifies the cells marked in black as a 'gap' between the approximate contour of the pilot approximation and the PAT contour of g. The black line segment joins the grid point at which the pilot approximation is largest and the closest point at which g has already been evaluated. The algorithm searches that segment for a root of g − τ. The second piece of the PAT contour is traced starting at the root of g − τ on the black line segment. The solid and dotted gray curves are as described in figure 5.

Workload scaling.

To investigate empirically how the number of function evaluations PAT requires scales, we used PAT to contour the saddle function on grids ranging from 20 × 20 to 200 × 200. Table 1 gives the results. PAT's computational advantage over the standard method increases with the dimension of the grid. The number of evaluations of g for the standard method scales like the number of points in the grid, n², while the number of evaluations of g for PAT scales approximately as O(n). The accuracy of PAT is substantially better than that of standard methods on coarse grids, and slightly better even for fine grids.

Table 1. Workload scaling for PAT and standard contouring for various grid dimensions, for the test function in figure 4. 'Saved' is 1 minus the ratio of evaluations of g for PAT to evaluations of g for the standard method. 'Relative error' is the ratio of the area of the symmetric difference of the true and approximate lower level sets for the standard contour to the area of that difference for the PAT contour. Numbers larger than 1.0 indicate that PAT is more accurate than the standard method.

Grid dimension   Evaluations of g (PAT)   Evaluations of g (standard)   Saved   Relative error
20×20            266+81                   400                           13%     2.43
30×30            355+81                   900                           51%     3.20
50×50            512+81                   2500                          76%     1.49
100×100          910+81                   10 000                        90%     1.07
200×200          1814+81                  40 000                        95%     1.06
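The 'Saved' column follows directly from the evaluation counts in table 1; a quick arithmetic check (counts transcribed from the table, which truncates percentages to whole numbers):

```python
# Evaluation counts from table 1: (grid side n, PAT evaluations of g
# including the 81 pilot evaluations, standard-method evaluations)
rows = [
    (20, 266 + 81, 400),
    (30, 355 + 81, 900),
    (50, 512 + 81, 2500),
    (100, 910 + 81, 10_000),
    (200, 1814 + 81, 40_000),
]
# the standard method evaluates g at every grid point, so its count is n * n
assert all(std == n * n for n, _, std in rows)
# 'Saved' = 1 minus the ratio of PAT to standard evaluations, truncated to a whole percent
saved = [int(100 * (1 - pat / std)) for _, pat, std in rows]  # [13, 51, 76, 90, 95]
```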

4. Test case 2: confidence sets for viscosity and thermal conductivity of airflow

An ideal homogeneous gas flows from left to right in a two-dimensional pipe, sketched in figure 8. The gas is heated at the input boundary on the left with a known temperature profile. There is no heat source within the pipe. Once the flow is in a steady state, the temperature of the gas is measured at 101 points shown in figure 8; the measurements have iid additive Gaussian noise with an expected value of 0 and variance of 1. From the boundary condition and the temperature measurements, we want to estimate the viscosity and the thermal conductivity of the gas and to find the uncertainties of the estimates using a confidence set, as described in the appendix.

Figure 8. The computational domain for the gas flow example, a two-dimensional pipe whose diameter doubles abruptly. An ideal gas flows in through Γin on the left and out through Γout on the right. The inner wall of the pipe is the boundary Γwall (not labeled). Boundary conditions are imposed on temperature and velocity at Γin, there is no flow across the pipe wall and the velocity through Γout is purely horizontal. Steady-state temperature is measured with independent, additive, zero-mean Gaussian noise at the 101 points indicated by crosses.

We model the flow using Elmer finite-element software (IT Center for Science (CSC) 2011). The diameter of the pipe has a step, which causes the flow to detach from the wall. If the viscosity is sufficiently small (i.e., if the Reynolds number is large), the flow may become turbulent, a possibility we ignore in this example.

Mathematical models.

The steady-state temperature distribution T(·) of the gas is modeled by the heat equation. Since there are no internal sources, this equation is

ρC v·∇T = ∇·(κ∇T),

where C is heat capacity, v is the velocity field and κ is thermal conductivity. The velocity field is modeled by the Navier–Stokes equation

ρ(v·∇)v = ∇·(2με(v)) − ∇p + ρh,

where ρ is density, μ is viscosity, p is pressure, h is gravitational force and ε(v) is the infinitesimal strain tensor, ε(v) = (∇v + (∇v)ᵀ)/2. The velocity and temperature boundary conditions at the input boundary vary quadratically with height. The velocity is zero at the walls and purely horizontal at the output boundary. The temperature at the walls is fixed, and the flow satisfies the zero Neumann boundary condition at Γout.

We solved the heat equation and the Navier–Stokes equation as coupled PDEs using Elmer. The function f and its pilot approximation (see the notation in the appendix) are as follows. The function f maps (x1,x2) to HT, where T is the temperature distribution for a fixed viscosity of 10^(−x1) N s m⁻² and a fixed thermal conductivity of 10⁻²·x2 W (m K)⁻¹, and H is the matrix that interpolates linearly between points on the computational mesh to calculate T at the measurement points. The domain of f is Δ = [3,4] × [2,3], which we verified experimentally to contain every point in the confidence set. A logarithmic transformation of viscosity improves numerical stability, since the effect of viscosity on velocity and temperature is exponential. We scaled the x2-axis to make the axes equally long, to improve computational efficiency. The pilot approximation of f is formed in the same way using a very sparse mesh.

Results.

We used PAT to approximate the contour corresponding to the boundary of the 95% χ2 confidence set (see the appendix). The function g was defined using (A.2) with p = 2 (χ2 confidence sets). The pilot approximation of g is obtained similarly by replacing f with its pilot approximation in equation (A.2).

The computational mesh used to evaluate f contained 330 elements and 1099 nodes. For the pilot approximation, the mesh contained 108 elements and 389 nodes. Elmer becomes unstable in this problem if there are significantly fewer elements than that. The material parameters used in the computations are shown in table 2. PAT contours were computed using a grid of 30 × 30 cells (31 × 31 grid points).

Table 2. Material properties for gas-flow simulations.

Parameter                  Value
Viscosity μ                1/5500 N s m⁻²
Thermal conductivity κ     2.55 × 10⁻² W (m K)⁻¹
Heat capacity C            1010 J (kg K)⁻¹
Specific heat ratio        1.4
Reference pressure         10⁵ Pa

The simulated data were computed using Elmer with a dense mesh of 563 elements and 1830 nodes. Independent, identically distributed zero-mean Gaussian noise with standard deviation 2.79 K (2% of the elevation in the temperature) was added to the simulated values.

To assess accuracy of the PAT confidence sets, we also computed the contour using the standard algorithm in R on a 256 × 256 grid. The PAT contours and reference contours are shown in figure 9. The PAT contours are almost indistinguishable from contours computed by standard methods on the dense grid. Differences are visible only in sharp corners where the cell size is too large for a straight line segment to turn as rapidly as the actual contour.

Figure 9. The χ2 confidence set. The black dots connected with lines are the PAT contour of the actual misfit function. The crosses are the contour of the pilot approximation for PAT. The solid gray lines are contours computed using a standard algorithm on a 256 × 256 grid.

Computing the PAT contour for the χ2 confidence set required evaluating the function f 243 times and the pilot approximation 1023 times. The number of evaluations of f was about a quarter of 31² = 961, the minimum number of evaluations a standard contouring method would use to compute the contour on a 31 × 31 grid. Moreover, the PAT contour is more accurate than the standard contouring method on a 31 × 31 grid: the relative area of the symmetric difference for the PAT confidence set is 0.06% while the relative area for a standard contour on the same grid is 0.15%.

5. Conclusions

The PAT algorithm can find one or more contours of a function g using fewer evaluations of g than traditional contouring methods, which can entail substantial savings if g is expensive to evaluate. PAT uses an inexpensive pilot approximation of g to predict the location of the contour. This can reduce the number of evaluations of g required to contour g to a given level of accuracy. PAT uses an iterative method to find points on the contour, and takes advantage of the fact that g can be evaluated anywhere, not just on the vertices of a regular grid.

The computational advantage of PAT increases with the dimension of the grid used in the standard method. In examples, the number of function evaluations that PAT requires scales roughly as the square root of the number of grid points, while conventional methods scale linearly with the number of grid points. The savings can be enormous. In our experience, changing the accuracy of the pilot approximation does not affect the workload much unless it changes the number of disjoint pieces that the contour of the pilot approximation has.

Two numerical examples illustrate the utility and efficiency of PAT. The first example is to contour a function that has a saddle point near the contour level. The second example is to find a χ2 confidence set for an unknown two-dimensional parameter in a nonlinear inverse problem. In both examples, PAT contours are more accurate than standard contours and require only a fraction as many function evaluations. PAT is particularly useful to contour functions that are expensive to evaluate, for example, when evaluating the function requires conducting a physical experiment or running complex coupled PDE solvers.

Since PAT uses values of g at cell vertices to determine whether the contour crosses an edge, PAT cannot tell whether an edge is intersected more than once. Reducing cell size ameliorates this issue, but increases the number of evaluations of g. A more sophisticated approach might split cells adaptively depending on the local smoothness of g. Aramini (1980) proposes adapting grid size locally using the Laplacian of g or the value of g at the centers of cells. Similar techniques might be used with PAT. For PAT, a criterion for splitting cells might be based on the pilot approximation (e.g., tracing the contour of the pilot approximation on a dense grid to see whether it crosses any edge more than once).

As presented here, PAT is for functions of two variables. PAT could be extended to functions of three variables using an approach similar to Marching cubes (Lorensen and Cline 1987; see also Bhaniramka et al 2004, Schreiner et al 2006). How to extend PAT to dimensions higher than three is an open question.

Acknowledgments

JMJH was supported by the Osk. Huttunen Foundation, the Jenny & Artturi Wihuri Foundation and the Finnish Cultural Foundation, and the Academy of Finland (application number 213476, the Finnish Programme for Centres of Excellence in Research 2006–2011). PBS received support from Lawrence Livermore National Laboratory grant B585264.

Appendix: Confidence sets based on χ2 misfit

This section sketches a standard approach to constructing a confidence set for a collection of parameters in an inverse problem. The approach, sometimes called strict bounds, is not optimal but it is conservative. For more detail, see, for instance, Evans and Stark (2002).

Consider estimating the parameter x0 from observations Y that are related to it through

Y = f(x0) + ε,

where f is a known function, ε is stochastic observational error and x0 is an unknown element of a known measurable set X of possible parameter values.

The function f maps each x ∈ X into an n-vector (f1(x),...,fn(x))ᵀ. Computing f might involve Monte Carlo simulation or the numerical solution of a PDE by finite-element methods, but we ignore uncertainty in calculating f(x0) for known x0. See Stark (1992, 2008) or Kaipio and Somersalo (2005) for a treatment of systematic errors in inverse problems, such as errors that result from approximating f.

The additive error ε is a random n-vector (ε1,...,εn)ᵀ. The error ε has an expected value of zero and known, finite, positive-definite covariance matrix Σ. Let σk be the standard deviation of the kth component of ε. The parameter x0 is fixed; the only random element in the problem is the error ε.

Suppose C(·) is a measurable mapping from the data space ℝⁿ into subsets of the parameter space X. If C satisfies

P{C(Y) ∋ x0} ≥ 1 − α, whatever be x0 ∈ X,    (A.1)

then C is a 1 − α confidence procedure. Inequality (A.1) says that the coverage probability of C(Y) is at least 1 − α, whatever be x0 ∈ X. If we observe the datum Y = y, then C(y) is a 1 − α confidence set for x0.

Before the data are observed, C(Y) is not a fixed set (it is random), and it is meaningful to talk about the chance that C(Y) will end up containing x0. In repeated experiments, the fraction of 95% confidence sets that include the true parameter value x0 converges in probability to 95%, by the Law of Large Numbers. Once the data have been observed, we talk about confidence level rather than coverage probability: nothing random is left. For instance, the assertion

P{x0 ∈ C(y)} ≥ 95%

does not make sense, because x0 is a fixed parameter and C(y) is a fixed set. Either x0 is in C(y) or it is not; there is no probability.

We consider confidence sets derived from misfit functions: the confidence set contains all parameter values for which the predicted data agree adequately with the observed data. Agreement is measured in a norm. 'Adequately' is calibrated to make the coverage probability 1 − α.

Let ∥·∥p be a p-norm on ℝⁿ with weights derived from Σ. For instance, for p = 2 we might define

∥v∥2 ≡ (v1²/σ1² + ⋯ + vn²/σn²)^(1/2),

or, for p = ∞,

∥v∥∞ ≡ max_k |vk|/σk.

Let χp,Σ,α be the smallest value of c for which

P{∥ε∥p ≤ c} ≥ 1 − α.

Then

P{∥Y − f(x0)∥p ≤ χp,Σ,α} ≥ 1 − α, whatever be x0 ∈ X.

So, if we define

C(Y) ≡ {x ∈ X : ∥Y − f(x)∥p ≤ χp,Σ,α},

then C(Y) is a 1 − α confidence set for x0; see, for instance, Stark (1992) for examples and extensions. The case p = 2 is sometimes called a χ2 confidence set; the case p = ∞ is sometimes called a maximum modulus confidence set.
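For a diagonal Σ, Σ-weighted misfit norms of the kind used here (we assume the common forms: the root-sum of squared standardized residual components for p = 2, and the largest standardized component for p = ∞) can be computed in a few lines; the function names are ours:

```python
from math import sqrt

def misfit_2(v, sigma):
    """Sigma-weighted 2-norm: sqrt of the sum of squared standardized components."""
    return sqrt(sum((vk / sk) ** 2 for vk, sk in zip(v, sigma)))

def misfit_inf(v, sigma):
    """Sigma-weighted sup-norm: largest standardized component in absolute value."""
    return max(abs(vk) / sk for vk, sk in zip(v, sigma))

# residual (3, -4) with unit standard deviations: 2-norm 5, sup-norm 4
r2 = misfit_2((3.0, -4.0), (1.0, 1.0))
rinf = misfit_inf((3.0, -4.0), (1.0, 1.0))
```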

As a concrete example, suppose the n components of ε are iid zero-mean Gaussian random variables with common variance σ². Then Σ = σ²I, where I is the n by n identity matrix, and χ2,Σ,α is the square root of the 1 − α percentage point of the χ2 distribution with n degrees of freedom.
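In practice the χ2 percentage point would come from a statistical library (e.g. scipy.stats.chi2.ppf in Python or qchisq in R); as a self-contained check it can also be estimated by Monte Carlo, simulating sums of squared standard normals. A sketch (sample sizes, seed and names are ours):

```python
import random

def chi2_quantile_mc(df, q, n_samples=20000, seed=1):
    """Monte Carlo estimate of the q-quantile of the chi-square distribution
    with df degrees of freedom, via sums of squared standard normals."""
    rng = random.Random(seed)
    sums = sorted(
        sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))
        for _ in range(n_samples)
    )
    return sums[int(q * n_samples)]

# 95th percentage point of chi-square with 101 degrees of freedom
# (the example has 101 temperature measurements); the exact value is about 125
quantile = chi2_quantile_mc(101, 0.95)
```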

For a fixed choice of p and α and a fixed y, define the real-valued function g on the parameter set X:

g(x) ≡ ∥y − f(x)∥p − χp,Σ,α.    (A.2)

A 1 − α confidence set for x0 based on the observation Y = y is thus

C(y) = {x ∈ X : g(x) ≤ 0}.

This is the lower level set of g at level τ = 0. Continuity of g follows from continuity of the p-norm and the assumed continuity of f. Hence, the confidence set (which typically is not simply connected) is closed relative to X, and the contour γg,0 = {x : g(x) = 0} is the boundary of the confidence set.

Footnotes

  • We do not address how to select the appropriate cell size, but the pilot approximation may be helpful. See, e.g., Aramini (1980) for methods of selecting cell size.
