
Sparsity-promoting and edge-preserving maximum a posteriori estimators in non-parametric Bayesian inverse problems

Sergios Agapiou, Martin Burger, Masoumeh Dashti and Tapio Helin

Published 20 February 2018 © 2018 IOP Publishing Ltd
Citation: Sergios Agapiou et al 2018 Inverse Problems 34 045002. DOI: 10.1088/1361-6420/aaacac


Abstract

We consider the inverse problem of recovering an unknown functional parameter $u$ in a separable Banach space, from a noisy observation vector $y$ of its image through a known, possibly non-linear, map $\mathcal{G}$. We adopt a Bayesian approach to the problem and consider Besov space priors (see Lassas et al (2009 Inverse Problems Imaging 3 87–122)), which are well-known for their edge-preserving and sparsity-promoting properties and have recently attracted wide attention especially in the medical imaging community.

Our key result is to show that in this non-parametric setup the maximum a posteriori (MAP) estimates are characterized by the minimizers of a generalized Onsager–Machlup functional of the posterior. This is done independently for the so-called weak and strong MAP estimates, which, as we show, coincide in our context. In addition, we prove a form of weak consistency for the MAP estimators in the infinitely informative data limit. Our results are remarkable for two reasons: first, the prior distribution is non-Gaussian and does not meet the smoothness conditions required in previous research on non-parametric MAP estimates; second, the result analytically justifies existing uses of the MAP estimate in finite but high-dimensional discretizations of Bayesian inverse problems with the considered Besov priors.


1. Introduction

We consider the inverse problem of recovering an unknown functional parameter $u\in X$ from a noisy and indirect observation $y\in Y$. We work in a framework in which $X$ is an infinite-dimensional separable Banach space, while $Y=\mathbb{R}^J$. In particular, we consider the additive noise model

$$y = \mathcal{G}(u) + \xi, \tag{1}$$

where $\xi$ is mean-zero Gaussian observational noise with a positive definite covariance matrix $\Sigma\in\mathbb{R}^{J\times J}$, $\xi\sim N(0,\Sigma)$. Here the possibly nonlinear operator $\mathcal{G}:X\to Y$ describes the system response, connecting the observation $y$ to the unknown parameter $u$. More specifically, $\mathcal{G}$ captures both the forward model and the observation mechanism and is assumed to be known.
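To fix ideas, the following minimal sketch simulates the observation model (1) for a discretized unknown. It is a sketch only: the averaging operator `G`, the dimensions and the noise level are illustrative assumptions, not part of the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def G(u):
    # Hypothetical forward map: three local averages of u, standing in for a
    # known (possibly nonlinear) system response; only J = 3 data are observed.
    return np.array([u[:20].mean(), u[20:40].mean(), u[40:].mean()])

u_true = np.concatenate([np.zeros(20), np.ones(20), np.zeros(20)])  # blocky unknown
Sigma = 0.01 * np.eye(3)                                            # noise covariance
xi = rng.multivariate_normal(np.zeros(3), Sigma)                    # xi ~ N(0, Sigma)
y = G(u_true) + xi                                                  # observation model (1)
```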

Inverse problems are mathematically characterized by being ill-posed: the lack of sufficient information in the observation prohibits the unique or stable reconstruction of the unknown. This can either be due to an inherent loss of information in the forward model, or due to the incomplete and noisy observation. To address ill-posedness, regularization techniques are employed in which the available information in the observation is augmented using a priori available knowledge on the properties of candidate solutions.

We adopt a Bayesian approach to the regularization of inverse problems, which has in recent years attracted enormous attention in the inverse problems and imaging literature: see the early work [29], the books [37, 69] and the more recent works [16, 46, 66] and references therein. In this approach, prior information is encoded in the prior distribution $\mu_0$ on the unknown $u$, and the Bayesian methodology is used to formally obtain the posterior distribution $\mu^y$ on $u\vert y$, in the form

$$\frac{{\rm d}\mu^y}{{\rm d}\mu_0}(u) = \frac{1}{Z(y)}\exp\left(-\Phi(u; y)\right), \tag{2}$$

where $\Phi(\cdot\,; y):X\to\mathbb{R}$ is the negative log-likelihood (made explicit in (17) below) and $Z(y)=\int_X\exp\left(-\Phi(u; y)\right)\mu_0({\rm d}u)$ is the normalizing constant.

Our work is driven by two types of a priori information: on the one hand, we aim to recover unknown functions with a blocky structure, as is typically the case in image processing [23, 55] and medical imaging applications [22]. On the other hand, we are interested in prior models which promote sparse solutions, that is high-dimensional solutions that can be represented by a small number of coefficients in an appropriate expansion. To achieve these effects, we utilize so-called Besov space priors [17, 47], which are well-known for their edge-preserving and sparsity-promoting properties, see e.g. [9, 35, 44, 50, 58]. At the core of these priors is the fact that they employ a wavelet basis and use $\ell^1$-type regularization on the corresponding coefficients, an idea rooted in the extensive statistical literature developed by Donoho, Johnstone, Elad, Candes and others, see for example [13, 14, 20, 24, 25, 36, 52, 64], which is also the central idea in the field of compressive sensing. With these goals in mind, our work relates to classical $L^1$-type regularization methods such as penalized least squares with total variation penalty [60, 65]. Other Bayesian approaches which promote sparse and blocky solutions include the hierarchical methods extensively studied by Calvetti and Somersalo, see for example [4, 10–12].
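The mechanism by which $\ell^1$-type regularization of wavelet coefficients promotes sparsity can be seen in one line: the proximal map of the $\ell^1$ penalty is coordinate-wise soft thresholding, which sets small coefficients exactly to zero. The following minimal sketch illustrates this in the simplest possible denoising setting, where the coefficients are observed directly; the sizes and the threshold value are illustrative assumptions.

```python
import numpy as np

def soft_threshold(c, t):
    # Coordinate-wise solution of min_x 0.5*(x - c)^2 + t*|x|; coefficients
    # with |c| <= t are set exactly to zero, which is the source of sparsity.
    return np.sign(c) * np.maximum(np.abs(c) - t, 0.0)

rng = np.random.default_rng(1)
coeffs = np.zeros(100)
coeffs[[3, 17, 42]] = [5.0, -3.0, 2.0]            # sparse "true" coefficients
noisy = coeffs + 0.3 * rng.standard_normal(100)   # noisy coefficient observations
denoised = soft_threshold(noisy, 0.9)
print(np.count_nonzero(noisy), np.count_nonzero(denoised))  # 100 versus ~3
```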

In practice, the implementation of the Bayesian approach to an inverse problem typically requires a high-dimensional discretization of the unknown and large computational resources. From a computational perspective, it is hence imperative that the probability models and estimators related to finite-dimensional discretizations of the problem scale well as the discretization is refined. In particular, there is a fundamental need to understand whether specialized prior information used frequently in applications, for example of the type described above, leads to well-defined non-parametric probability models and whether the related finite-dimensional estimators have well-behaved limits; this motivates the study of Bayesian inverse problems in the infinite-dimensional function-space setting. A body of literature on this topic has emerged in recent years, centered around two theories:

  • (a)  
    the discretization invariance theory, see [46–48] and references therein, which aims to ensure that, when investing more resources in increasing the discretization level, the prior remains faithful to the intended information on the unknown function and the posterior converges to a well-defined limit which is an improved representation of reality;
  • (b)  
    the well-posedness of the posterior theory found in [16, 66], which ensures that the posterior is well defined in the infinite-dimensional limit and robust with respect to perturbations of the data as well as approximations of the forward model.

Our work studies Bayesian inversion in function spaces especially from the perspective of point estimators. Namely, we study and give a rigorous meaning to maximum a posteriori (MAP) estimates for Bayesian inverse problems with certain Besov priors in the infinite-dimensional function-space setting. A MAP estimate is understood here as the mode of the posterior probability measure; hence our results require a careful definition of a mode in the infinite-dimensional setting. The main challenge is then to establish a connection between the topological definition of a MAP estimate (mode of the posterior) and an explicit variational problem. Succeeding in doing so opens up the possibility of studying the behaviour of MAP estimators in certain situations. In particular, we are able to prove a weak form of consistency of the MAP estimator in the infinitely informative data limit.

1.1. Non-Gaussian prior information and the need for MAP estimators

A major challenge in Bayesian statistics is the extraction of information from the posterior distribution. In Gaussian-conjugate settings, such as linear inverse problems, on the one hand there are explicit formulae for the posterior mean which can be used as an estimator of the unknown, and on the other hand one can (in principle) sample directly from a discretized version of the posterior [1, 49, 53, 54, 56]. Nevertheless, draws from Gaussian priors do not vary sufficiently sharply on their domain to have a blocky structure, nor do they give rise to sparse estimators.

For this purpose, the so-called TV prior has been used widely in the applied literature, e.g. in medical imaging [45, 63]. Drawing intuition from the classical regularization literature, the TV prior has a formal density of the form

$$\pi(u) \propto \exp\left(-\alpha\Vert u\Vert_{BV}\right),$$

where $\alpha>0$ is a (hyper)parameter. Here, the norm of bounded variation, $\Vert u\Vert_{BV}$, can be formally thought of as the $L^1$-norm of the derivative of $u$. However, the numerical implementation of Bayesian inversion with a TV prior misbehaves as the discretization level of the unknown increases; in particular, the TV prior is not discretization invariant [44, 47, 48]. For example, depending on the choice of the parameter $\alpha$ as a function of the discretization level, the posterior mean either diverges or converges to an estimate corresponding to a Brownian bridge prior. For alternative approaches to edge-preserving non-parametric Bayesian inversion, see [26, 31, 33].

We consider the family of Besov priors which were proposed and shown to be discretization invariant in [47]. A well-posedness theory of the posterior was developed in [17]. These priors are defined by a wavelet expansion with random coefficients, motivated by a formal density of the form

$$\pi(u) \propto \exp\left(-\Vert u\Vert_{B^s_p}^p\right), \tag{3}$$

where $\Vert\cdot\Vert_{B_p^s}=\Vert\cdot\Vert_{B_{pp}^s}$ is the Besov space norm with regularity parameter $s$ and integrability parameter $p$. For $p=2$ the Besov space $B_2^s$ corresponds to the Sobolev space $H^s$ of functions with $s$ square-integrable derivatives, and the corresponding family of priors is Gaussian with Sobolev-type smoothness parametrized by $s$. We are especially interested in the case $p=1$, which is highly relevant for edge-preserving and sparsity-promoting Bayesian inversion [44]. For $s=1$ this is due to the close resemblance between the way $\Vert u\Vert_{B_1^1}$ and $\Vert u\Vert_{BV}$ act, both involving the $L^1$-norm of a (generalized) derivative of $u$, see [44, section 2]. We define rigorously the class of Besov priors for $p=1$ and $s\in\mathbb{R}$, called $B^s_1$-Besov priors, in section 3 below.

To probe the posterior in the non-conjugate context of linear or nonlinear inverse problems with Besov priors, one typically resorts to Markov chain Monte Carlo (MCMC) methods. Unfortunately, in practice standard MCMC algorithms become prohibitively expensive for large scale inverse problems, consider e.g. photo-acoustic tomography [57, 70]. This is due to the heavy computational effort required for solving the forward problem which is needed for computing the acceptance probability at each step of the chain.

In such situations maximum a posteriori estimates are computationally attractive as they only require solving a single optimization problem. Furthermore, it was shown in [44] that for the $B^1_1$-Besov prior defined via wavelet bases, at finite discretization levels the resulting MAP estimators are sparse and preserve the locations of the edges in the unknown. This remains true as the discretization is refined, and our work validates that, unlike in the TV prior case, the discretized MAP estimators converge to the MAP estimators of the limiting infinite-dimensional posterior distribution.

1.2. Review of the main results

Our results draw inspiration from previous papers by the authors [18, 32] (see also [26]), where concepts that we will call strong and weak MAP estimates were coined. We quote both definitions in section 2 below.

As mentioned earlier, we can formally think of a MAP estimator as a mode of the posterior. In finite-dimensional contexts, especially when working with continuous probability distributions, the definition of a mode is straightforward: it is a maximizer of the probability density function. If the prior has probability density function of the form

$$\pi_0(u) \propto \exp\left(-W(u)\right),$$

for a suitable positive function $W:X\to [0,\infty)$, the density of the posterior is

$$\pi^y(u) \propto \exp\left(-I(u)\right),$$

where

$$I(u) = \Phi(u; y) + W(u). \tag{4}$$

In this case, a MAP estimator can be interpreted as a classical estimator of the unknown, arising from Tikhonov regularization with penalty term given by the negative log-density of the prior [28].
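For instance, with the Gaussian likelihood of (1) and an i.i.d. Laplace prior on $u\in\mathbb{R}^d$, so that $W(u)=\alpha\Vert u\Vert_{\ell^1}$ for some $\alpha>0$, the MAP estimate is the familiar $\ell^1$-regularized least-squares solution

$$u_{\rm MAP} \in \mathop{\rm arg\,min}_{u\in\mathbb{R}^d}\left\{\frac{1}{2}\left\vert\Sigma^{-1/2}\left(y-\mathcal{G}(u)\right)\right\vert^2 + \alpha\Vert u\Vert_{\ell^1}\right\}.$$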

In the infinite-dimensional setting things are less straightforward due to the lack of a uniform reference measure. An intuitive approach to defining a mode in a function space $X$ is as follows: compute the measure of balls with any center $u\in X$ and a fixed radius $\epsilon>0$, and proceed by letting $\epsilon$ tend to zero. A mode $\hat u$ is a center point maximizing these small ball probabilities asymptotically (as $\epsilon$ decreases) in a specific sense. What distinguishes a strong mode from a weak mode is exactly how 'asymptotic maximality' is perceived:

  • (a)  
    in the strong mode case, we look for the maximum probability among all centres in $X$ ;
  • (b)  
    in the weak mode case, we look for a centre of a ball with the property that it has maximum probability among all shifts of the ball by elements of a dense subspace $E\subset X$ .

Natural choices of $E$ turn out to be spaces of zero probability, thus giving one interpretation for the term 'weak'. Note that weak and strong modes coincide for $E=X$ . Each of the two notions of mode gives rise to a notion of MAP estimator termed strong and weak MAP estimators, respectively.

Let $B_\epsilon(z)\subset X$ denote the open ball of radius $\epsilon$, centered at $z\in X$. If we can find a functional $I$ defined on an appropriate dense subspace $F\subset X$, such that

$$\lim_{\epsilon\to 0}\frac{\mu(B_\epsilon(z_2))}{\mu(B_\epsilon(z_1))} = \exp\left(I(z_1)-I(z_2)\right), \qquad z_1,z_2\in F, \tag{5}$$

when $\mu=\mu^y$, then for any fixed $z_1\in F$, a $z_2\in F$ maximising the limit, or equivalently minimising $I$, is a potential MAP estimator. For weak MAP estimators, if the space $E\subset F$ over which we shift the ball centres can be chosen sufficiently regular so that

$$\lim_{\epsilon\to 0}\frac{\mu(B_\epsilon(u-h))}{\mu(B_\epsilon(u))} \quad\text{exists for all } u\in X,\ h\in E, \tag{6}$$

again for $\mu=\mu^y$, then it is straightforward to establish the equivalence of weak MAP estimators and the minimisers of $I$. In the case of strong MAP estimators one needs to work considerably harder, the difficulty stemming from the 'smallness' of the subspace $F\subset X$ with respect to the prior, and hence also the posterior.

If a functional $I$ satisfying (5) exists, it is called the (generalised) Onsager–Machlup functional [18, 27, 32, 34]. As for the limit in (6), it is identified with $R^\mu_h$, the Radon–Nikodym derivative of the shift of μ by $h\in E$ with respect to μ itself, provided this derivative has a continuous representative, see [32, lemma 2] (quoted in lemma 2.3 below). In finite dimensions, both the existence of the Onsager–Machlup functional and the statement in (6) follow from the Lebesgue differentiation theorem under very mild conditions on Φ and the density of the prior. In the Banach-space setting, establishing (5) and (6) for the posterior boils down to showing similar results for the prior, given that the posterior is obtained by a suitably regular transformation of the prior.
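As a one-dimensional illustration, let μ have a continuous, strictly positive density $\pi$ on $\mathbb{R}$. Lebesgue differentiation gives $\mu(B_\epsilon(z)) = 2\epsilon\,\pi(z) + o(\epsilon)$, so that

$$\lim_{\epsilon\to 0}\frac{\mu(B_\epsilon(z_2))}{\mu(B_\epsilon(z_1))} = \frac{\pi(z_2)}{\pi(z_1)} = \exp\left(I(z_1)-I(z_2)\right), \qquad I := -\log\pi,$$

i.e. the negative log-density is the Onsager–Machlup functional, here with $F=E=\mathbb{R}$.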

Strong MAP estimators were defined and studied in [18], in the context of nonlinear Bayesian inverse problems with Gaussian priors. In this case, an expression for the limit in (5) was readily available from the Gaussian literature for $F$ being the Cameron–Martin space of $\mu_0$ . The identification of minimisers of the resulting Onsager–Machlup functional with strong MAP estimators, however, required a considerable amount of work, in particular many new estimates involving small ball probabilities under the prior.

Weak MAP estimators were defined and studied in [32], in the context of linear Bayesian inverse problems with a general class of priors. The authors used the tools from the differentiation and quasi-invariance theory of measures, developed by Fomin and Skorohod [6], to connect the zero points of $\beta_h^\mu$ , the logarithmic derivative of a measure μ in the direction $h$ , to the minimisers of the Onsager–Machlup functional. An essential assumption that makes this possible is the continuity of $\beta_h^\mu$ over $X$ for sufficiently regular $h$ . The authors considered as examples Besov priors with integrability parameter $p>1$ and a conditionally Gaussian hierarchical prior with a hyper-prior on the mean.

In both [18] and [32], the Onsager–Machlup functional was shown to be a Tikhonov-type functional as in (4), where $W$ is the formal negative log-density of the prior, hence MAP estimators are identified with the corresponding Tikhonov approximations in the studied contexts.

In this work, we are interested in defining and studying both the strong and the weak MAP estimates for generally nonlinear inverse problems with $B^s_1$-Besov priors, that is for Besov priors with integrability parameter $p=1$. Since the prior has formal density as in (3), we expect, and indeed show, that MAP estimates are identified with minimizers of the Tikhonov-type functional

$$I(u) = \Phi(u; y) + \Vert u\Vert_{B^s_1}. \tag{7}$$

In particular, the weak and strong MAP estimates coincide. For the considered prior, the general theory developed in [32] for weak MAP estimators does not apply, because the logarithmic derivative of the $B^s_1$-Besov priors is inherently discontinuous. We will show that the continuity of the Radon–Nikodym derivative $R^\mu_h$, for $h$ in a suitable subspace $E\subset X$, is sufficient to obtain the result for weak MAP estimators. For the $B^s_1$-Besov priors, we prove the continuity of $R^\mu_h$ for shifts $h\in E\subseteq B^r_1(\mathbb{T}^d)$ for any $r>s$, establish appropriate small ball probability ratio asymptotics, and show the validity of (5) over $F=B^s_1(\mathbb{T}^d)$. It is then straightforward to prove the identification of weak MAP estimators with the minimisers of the functional $I$ in (7). Furthermore, we generalise the program of [18] to the present non-Gaussian case, developing some properties of maximizers of small ball probabilities along the way, and show that strong MAP estimators are identified with the minimisers of $I$ in (7). We also consider the theory of local weak modes separately, relying on the literature on Fomin differentiability.

One natural question arising from our work is under what conditions strong and weak MAP estimators coincide. This question is investigated in the very recent work [51], where the authors give general conditions for the equivalence of the two notions. The proofs of our results presented below aptly illustrate that, in practice, it is preferable to work with weak MAP estimators.

1.3. Consistency of MAP estimators

Under the frequentist assumption of data generated from a fixed underlying true $u^\dagger$, it is desirable to verify that in the infinitely informative data limit, the Bayesian posterior distribution $\mu^y$ contracts optimally to a Dirac distribution centered on $u^\dagger$. In recent years, there have been many studies on the rates of posterior contraction in the context of Bayesian inverse problems. The case of linear inverse problems with Gaussian and conditionally Gaussian priors is now well understood [1, 2, 39, 41–43, 68], and a theory for linear inverse problems with non-Gaussian priors is also being developed [40, 59]. For nonlinear inverse problems the asymptotic performance of the posterior is not yet fully understood, with some partial contributions being [56, 71, 72].

Note that for general nonlinear problems, especially with finite dimensional data as in the present paper, one cannot expect to recover the underlying truth $u^\dagger$ in the infinitely informative data limit. Instead the aim is to recover a $u^\ast\in X$ such that $ \newcommand{\G}{\mathcal{G}} \G(u^\ast)=\G(u^\dagger)$ . Posterior contraction rates for Bayesian inverse problems with Besov priors are studied in ongoing work of a subset of the authors. A form of weak consistency of the strong MAP estimator in the presence of repeated independent observations, for general nonlinear inverse problems with Gaussian priors, was shown in [18]. In the present paper, we prove a similar result for the strong (hence the weak) MAP estimator obtained using the Besov prior for $p=1$ .

1.4. Notation

Throughout the paper we assume that $X$ is a separable Banach space equipped with the Borel σ-algebra. All probability measures are assumed to be Borel measures. The Euclidean norm in $\mathbb{R}^J$ is denoted by $\vert\cdot\vert$ to distinguish it from the norm of a general Banach space $X$, denoted by $\Vert\cdot\Vert_X$. We write $f\propto g$ for two functions $f,g:X\to\mathbb{R}$ if there exists a universal constant $c\in\mathbb{R}$ such that $f = cg$ as functions.

Definition 1.1. A measure μ is called quasi-invariant along $h$, if the translated measure $\mu_h(\cdot) := \mu(\cdot-h)$ is absolutely continuous with respect to μ. We define

$$Q(\mu) := \left\{h\in X \mid \mu \text{ is quasi-invariant along } h\right\},$$

which is readily verified to be a linear subspace.

Notation 1.2. Let $h\in Q(\mu)$ . We denote the Radon–Nikodym derivative of $\mu_h$ with respect to μ by $R_h^\mu \in L^1(\mu)$ .

1.5. Organization of the paper

This paper is organized as follows: in section 2 we discuss the definition of modes for probability measures on separable Banach spaces. We introduce novel localized versions of the modes studied in previous work and discuss briefly how different modes can be characterized for log- or quasi-concave measures. The Besov priors are introduced and discussed in section 3, including the key results relating to the Radon–Nikodym derivative of the Besov prior in section 3.2. Section 4 covers the Bayesian inverse problem setup and our main results on the identification of weak and strong MAP estimates as the minimizers of a certain variational problem. Moreover, the weak consistency result is given in section 4.2. In section 5 we discuss the logarithmic derivative of the posterior and its use in characterizing the MAP estimates. Finally, all proofs are postponed to section 6.

2. Modes of measures on Banach spaces

In section 2.1 we introduce the two existing notions of maximum a posteriori estimator (modes of the posterior measure) proposed in [18, 32] in the context of measures on infinite-dimensional spaces. We also define two new notions of local modes and hence local MAP estimates. In section 2.2, we focus on log-concave measures, study the structure of the set of modes, and give conditions for local modes to be global.

2.1. Weak and strong, global and local

The following definition of a mode, introduced in [18], grows out of the idea that highest small ball probabilities are obtained asymptotically at the mode.

Definition 2.1. Let $M^\epsilon = \sup_{u\in X} \mu(B_\epsilon(u))$. We call a point $\hat u\in X$ a mode of the measure μ, if it satisfies

$$\lim_{\epsilon\to 0}\frac{\mu(B_\epsilon(\hat u))}{M^\epsilon} = 1.$$

A mode of the posterior measure $\mu^y$ in (2) is called a maximum a posteriori (MAP) estimate.

Below we occasionally use the terms strong mode and strong MAP estimator for the concepts introduced in definition 2.1 in order to distinguish them from the following weaker notion of a mode (similarly, the weak mode or weak MAP) introduced in [32].

Definition 2.2. Let $E$ be a dense subspace of $X$. We call a point $\hat u\in X$, $\hat u\in{\rm supp}(\mu)$, a weak mode of μ if

$$\limsup_{\epsilon\to 0}\frac{\mu(B_\epsilon(\hat u - h))}{\mu(B_\epsilon(\hat u))} \leqslant 1 \tag{8}$$

for all $h\in E$. A weak mode of the posterior measure $\mu^y$ in (2) is called a weak maximum a posteriori (wMAP) estimate.

Notice that the definition of a weak mode depends on the choice of the subspace $E$. Therefore, in some contexts it may be more appropriate to speak of $E$-weak modes. In the following, however, we will suppress this dependence, since the key question for our study is whether such a space exists.

The notions of weak and strong mode are related as follows: any strong mode is a weak mode for the choice $E=X$ [32, lemma 3], which is straightforward to see by simply estimating $\mu(B_\epsilon(\hat u - h)) \leqslant M^\epsilon$. The key motivation to study the weak definition is the case where small ball asymptotics are not available explicitly, or are only available in some subspace of translations $h$. It is then of interest to choose $E$ so that an expression for the limit on the left-hand side of (8) exists pointwise. This typically leads to choices of $E$ which have zero probability with respect to μ. The following lemma, which is an immediate generalization of [32, lemma 2], provides further guidance for this choice.

Lemma 2.3. Assume that μ is quasi-invariant along the vector $h$. Let $A\in\mathcal{B}(X)$ be convex, bounded and symmetric and define $A^\epsilon:=\epsilon A$. Suppose $R_h^\mu$ has a continuous representative $\tilde R_h^\mu \in C(X)$, i.e. $R_h^\mu - \tilde R_h^\mu = 0$ in $L^1(\mu)$. Then it holds that

$$\lim_{\epsilon\to 0}\frac{\mu(A^\epsilon + u - h)}{\mu(A^\epsilon + u)} = \tilde R_h^\mu(u)$$

for any $u \in X$.

Remark 2.4. According to lemma 2.3, it is desirable to consider a subspace $E$ such that $R_h^\mu$ is continuous for $h\in E$. A sufficient condition for the continuity of $R_h^\mu$ was given in [32]: namely, the so-called logarithmic derivative of μ along $h$ needs to be continuous and exponentially integrable with respect to μ. For the sparsity-promoting measure that we consider in this paper, the logarithmic derivative is inherently discontinuous (section 5). We are, however, able to show the continuity of $R_h^\mu$ over an appropriate subspace $E$ in section 3.2 by using its explicit expression.

Both strong and weak mode can also be associated with a natural localization described by the following definitions:

Definition 2.5 (Local modes). Let $\hat u \in X$ be such that $\hat u \in {\rm supp} (\mu)$ .

  • (1)  
    We call $\hat u$ a local mode of the measure μ, if there exists a $\delta>0$ such that the quantity $M^\epsilon_\delta = \sup_{u\in B_\delta(\hat u)} \mu(B_\epsilon(u))$ satisfies
    $$\lim_{\epsilon\to 0}\frac{\mu(B_\epsilon(\hat u))}{M^\epsilon_\delta} = 1.$$
  • (2)  
    We call $\hat u$ a local weak mode of μ if there exists $\delta>0$ such that
    $$\limsup_{\epsilon\to 0}\frac{\mu(B_\epsilon(\hat u - h))}{\mu(B_\epsilon(\hat u))} \leqslant 1 \tag{9}$$
    for all $h\in B^X_\delta(0) \cap E$.

Local modes represent an analogue of local maxima of the probability density function in finite dimensions. In the setting of [32], local wMAPs coincide with the zero points of the logarithmic derivative of the posterior. In particular, regularization techniques in non-linear inverse problems are often known to give rise to local maxima and, therefore, the local wMAP can provide a statistical interpretation for these points.

2.2. Modes for log-concave measures

This work studies the Besov prior, which is a prototypical example of a log-concave measure. Below we show some general properties regarding the modes of this class of measures. Most of these ideas extend naturally to a larger class, the quasi-concave measures.

Definition 2.6. A probability measure μ on $(X, \mathcal{B}(X))$ is called logarithmically-concave or log-concave, if

$$\mu\left(\kappa A + (1-\kappa)B\right) \geqslant \mu(A)^\kappa\,\mu(B)^{1-\kappa}, \qquad \kappa\in[0,1],$$

for all $A,B\in \mathcal{B}(X)$. Moreover, μ is called quasi-concave if

$$\mu\left(\kappa A + (1-\kappa)B\right) \geqslant \min\left\{\mu(A),\,\mu(B)\right\}, \qquad \kappa\in[0,1],$$

for all $A,B\in \mathcal{B}(X)$.

It is straightforward to verify that any log-concave measure is also quasi-concave. An immediate consequence of quasi-concavity is the well-known Anderson inequality [3, 8].

Proposition 2.7. Let μ be a symmetric quasi-concave measure on $X$. For any symmetric and convex set $A\subset X$ we have

$$\mu(A+u) \leqslant \mu(A) \qquad \text{for all } u\in X.$$
The next result follows from the Anderson inequality.

Proposition 2.8. Suppose that the measure μ on $X$ is symmetric around $u$ and quasi-concave. Then $u$ is a strong mode of μ.

Let us next consider briefly the structure of the set of strong modes of a quasi-concave measure μ. When working in finite dimensions, $X=\mathbb{R}^d$, a probability density function $f$ with respect to the Lebesgue measure is called quasi-concave if for all $x,y\in\mathbb{R}^d$ and all $\lambda\in[0,1]$ we have

$$f\left(\lambda x + (1-\lambda)y\right) \geqslant \min\left\{f(x),\,f(y)\right\}.$$
Clearly a quasi-concave probability density function has a convex set of global modes. For a reference on convexity and unimodality in finite dimensions see [21]. We show that a similar result holds in infinite dimensions for our definition of strong mode. The weak mode case is covered in [32].

Proposition 2.9. Suppose that the measure μ in $X$ is quasi-concave (but not necessarily symmetric). Then the set of strong modes is convex.

It turns out that log-concavity is a sufficient condition for global and local modes to coincide.

Theorem 2.10. Suppose μ is log-concave and $\hat u$ is a local mode. Then $\hat u$ is also a global mode. Similarly, if $\hat u$ is a local weak mode then it is also a global weak mode.

3. Besov priors with $ \boldsymbol{p=1} $

The family of Besov priors was introduced in [47] and studied in [17, 32]. In section 3.1 we recall the definition and some useful properties of Besov priors with integrability parameter $p=1$ and regularity parameter $s>0$, termed $B^s_1$-Besov priors, on which we focus in this work. We also present some straightforward convexity properties of $B^s_1$-Besov priors. The main results of this section are listed in section 3.2, where we compute the Radon–Nikodym derivative $R_h^\mu$ for $B^s_1$-Besov priors, determine the space $E$ in which $h$ needs to live in order for the corresponding Radon–Nikodym derivative $R_h^\mu$ to be continuous, and finally show that $I_0(u)=\Vert u\Vert _{B^s_1}$ is the Onsager–Machlup functional for $B^s_1$-Besov measures.

3.1. Definition and basic properties

We work with periodic functions on a $d$-dimensional torus, $\mathbb{T}^d$. We first define the periodic Besov spaces $B^s_{pq}(\mathbb{T}^d)$, where $s\in\mathbb{R}$ parametrises smoothness and $p,q\geqslant 1$ are integrability parameters. We concentrate on the case $p=q$ and write $B^s_p=B^s_{pp}$. To define the Besov spaces, we let $\{\psi_\ell\}_{\ell=1}^\infty$ be an orthonormal wavelet basis for $L^2(\mathbb{T}^d)$, where we have utilized a global indexing. We can then characterise $B^s_p(\mathbb{T}^d)$ using the given basis in the following way: the function $f:\mathbb{T}^d\to\mathbb{R}$ defined by the series expansion

$$f = \sum_{\ell=1}^\infty f_\ell\,\psi_\ell \tag{10}$$

belongs to $B^s_p(\mathbb{T}^d)$, if and only if the norm

$$\Vert f\Vert_{B^s_p} = \left(\sum_{\ell=1}^\infty \ell^{\frac{ps}{d}+\frac{p}{2}-1}\vert f_\ell\vert^p\right)^{1/p} \tag{11}$$

is finite. Throughout, we assume that the basis is $r$-regular with $r$ large enough to constitute a basis for a Besov space with smoothness $s$ [19].

We now follow the construction in [47] to define periodic Besov priors corresponding to $p=1$ , using series expansions in the above wavelet basis with random coefficients. Notice also the work [30] on defining Besov priors for functions on the full space $ \newcommand{\R}{{{\mathbb R}}} \R^d$ .

Definition 3.1. Let $(X_\ell)_{\ell=1}^\infty$ be independent identically distributed real-valued random variables with the probability density function

$$\pi_X(x) = \frac{1}{2}\exp\left(-\vert x\vert\right), \qquad x\in\mathbb{R}. \tag{12}$$

Let $U$ be the random function

$$U = \sum_{\ell=1}^\infty \ell^{-\left(\frac{s}{d}-\frac{1}{2}\right)} X_\ell\,\psi_\ell.$$

Then we say that $U$ is distributed according to a $B^s_1$-Besov prior.
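The following minimal sketch draws a truncation of the coefficient sequence of definition 3.1 and evaluates the corresponding truncated $B^t_1$ norm; the truncation level and the values of $s$, $d$ and $t$ are illustrative assumptions.

```python
import numpy as np

def sample_besov_coeffs(n_coeffs, s, d, rng):
    # Coefficients ell^{-(s/d - 1/2)} X_ell with X_ell i.i.d. Laplace
    # (density (1/2) exp(-|x|), cf. (12)), truncated to n_coeffs terms.
    ell = np.arange(1, n_coeffs + 1)
    X = rng.laplace(loc=0.0, scale=1.0, size=n_coeffs)
    return ell ** (-(s / d - 0.5)) * X

def besov_norm_p1(coeffs, t, d):
    # Truncated ||u||_{B^t_1} = sum_ell ell^{t/d - 1/2} |u_ell|, cf. (11) with p = 1.
    ell = np.arange(1, len(coeffs) + 1)
    return np.sum(ell ** (t / d - 0.5) * np.abs(coeffs))

rng = np.random.default_rng(2)
u = sample_besov_coeffs(10_000, s=2.0, d=1, rng=rng)
print(besov_norm_p1(u, t=0.5, d=1))  # finite a.s. for t < s - d, cf. lemma 3.2
```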

The next lemma determines the smoothness of functions drawn from the $B^s_1$ -Besov prior and shows the existence of certain exponential moments.

Lemma 3.2 ([47, lemma 2]). Let $U$ be as in definition 3.1 and let $t<s-d$. Then it holds that

  • (i)  
    $ \newcommand{\norm}[1]{\left\Vert #1 \right\Vert} \norm{U}_{B^t_{1}} < \infty$ ,   almost surely, and
  • (ii)  
    $ \newcommand{\norm}[1]{\left\Vert #1 \right\Vert} \newcommand{\expec}{\mathbb{E}} \newcommand{\e}{{\rm e}} \expec \exp(\frac 12 \norm{U}_{B^t_{1}}) < \infty$ .

Notation 3.3. We denote by $\rho_\ell$ the probability measure of the random variable $\ell^{-(s/d-1/2)}X_\ell$ on $\mathbb{R}$. We identify the random function $U$ in definition 3.1 with the product measure of the coefficients $(\ell^{-(s/d-1/2)}X_\ell)_\ell$ on $(\mathbb{R}^\infty,\mathcal{B}(\mathbb{R}^\infty))$, which we denote by $\lambda=\bigotimes_{\ell=1}^\infty \rho_\ell$.

We next consider the convexity of the $B^s_1$ -Besov prior.

Lemma 3.4. For any $s>0$ , the $B^s_1$ -Besov measure λ is logarithmically concave.

An immediate consequence of the last lemma is that, by proposition 2.7, the $B^s_1$-Besov prior satisfies Anderson's inequality. Notice that lemma 3.4 (and hence proposition 2.7) also holds for $B^s_p$-Besov measures with $p>1$, as defined in [47].

3.2. Radon–Nikodym derivative $ {R^\mu_h} $ and small ball probabilities

Recall the definitions of quasi-invariance for a measure μ and of the subspace $Q(\mu)$ of directions in which μ is quasi-invariant. In general the structure of $Q$ is not known and there are even examples of measures for which $Q$ fails to be locally convex [6, exercise 5.5.2]. The space $Q$ is known to be a Hilbert space for certain families of measures, for example for α-stable measures with $\alpha\geqslant 1$ and for countable products of a single distribution with finite Fisher information, see [6, theorem 5.2.1] and [62] (note that the $B^s_1$-Besov prior λ is not an α-stable measure). Using similar techniques to [62], namely the Kakutani–Hellinger theory, we now show that $Q$ is a Hilbert space also for the $B^s_1$-Besov measure λ and calculate the Radon–Nikodym derivative $R_h^\lambda$ between λ and the shifted measure $\lambda_h$ for $h\in Q(\lambda)$.
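To orient the reader, we sketch the coordinate computation behind the next lemma (the full argument is given in section 6). For the Laplace distribution a direct calculation gives the Hellinger affinity

$$\int_{\mathbb{R}}\sqrt{\pi_X(x)\,\pi_X(x-h)}\,{\rm d}x = \left(1+\frac{\vert h\vert}{2}\right){\rm e}^{-\vert h\vert/2} = 1-\frac{h^2}{8}+O(\vert h\vert^3),$$

so by Kakutani's theorem the product measure λ and its shift $\lambda_h$ are equivalent precisely when $\sum_{\ell=1}^\infty(\alpha_\ell h_\ell)^2<\infty$, where $\alpha_\ell=\ell^{s/d-1/2}$ is the inverse scale of the $\ell$th coefficient; this sum is exactly $\Vert h\Vert^2_{B_2^{s-d/2}}$.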

Lemma 3.5. For the $B^s_1$-Besov measure we have $Q(\lambda)=B_2^{s-\frac{d}{2}}(\mathbb{T}^d)$. For $h\in Q(\lambda)$ we have

$$R_h^\lambda(u) = \exp\left(\sum_{\ell=1}^\infty\left(-\alpha_\ell\vert u_\ell - h_\ell\vert + \alpha_\ell\vert u_\ell\vert\right)\right)$$

in $L^1(\mathbb{R}^\infty,\lambda)$, where $\alpha_\ell=\ell^{s/d-1/2}$.

We next provide a more detailed view of spaces of shifts $h$ for which $ \newcommand{\la}{\langle} \newcommand{\gl}{\lambda} R_h^\gl$ has a continuous representative, which we denote by $ \newcommand{\la}{\langle} \newcommand{\gl}{\lambda} \newcommand{\tr}{\tilde R} \tr_h^\gl(u)$ . This is a crucial result for our study of weak MAP estimators, see lemma 2.3, remark 2.4 and the discussion in section 1.2.

Lemma 3.6. Let $h=\sum_{\ell\in\mathbb{N}}h_\ell\psi_\ell\in B^{r}_1(\mathbb{T}^d)$ with $r>s$. Then

$$\tilde R_h^\lambda(u) = \exp\left(\sum_{\ell=1}^\infty\left(-\alpha_\ell\vert h_\ell-u_\ell\vert +\alpha_\ell\vert u_\ell\vert\right)\right)$$

is continuous with respect to $u=\sum_{\ell\in\mathbb{N}}u_\ell\psi_\ell\in B^t_1(\mathbb{T}^d)$ for any $t<s-d$.

Note that in the expression for the Radon–Nikodym derivative $\tilde R_h^\lambda(u)$, the shift $h\in E$ and the less regular $u\in X$ are coupled component-wise (in the wavelet basis defining the Besov measure), and hence establishing the continuity of $\tilde R_h^\lambda(u)$ with respect to $u$ is not straightforward. See section 6 for the proof of the above lemma.

We record the following immediate corollary of the last lemma and lemma 2.3.

Corollary 3.7. Let $h\in B^r_1(\mathbb{T}^d)$, $r>s$. Then it holds that

$$\lim_{\epsilon\to 0}\frac{\lambda(B_\epsilon(u-h))}{\lambda(B_\epsilon(u))} = \exp\left(\sum_{\ell=1}^\infty\left(-\alpha_\ell\vert h_\ell-u_\ell\vert+\alpha_\ell\vert u_\ell\vert\right)\right) \tag{13}$$

for any $u\in B_1^t$.

Remark 3.8. Let $ \newcommand{\T}{\mathbb{T}} X=B^t_1(\T^d)$ for $t<s-d$ . By remark 2.4 and noting that the Besov space $ \newcommand{\T}{\mathbb{T}} \newcommand{\gs}{r} B^\gs_1(\T^d)$ is dense in $ \newcommand{\T}{\mathbb{T}} B^t_1(\T^d)$ for any $ \newcommand{\gs}{r} \gs>t$ , the last corollary shows that it is natural to choose $ \newcommand{\T}{\mathbb{T}} \newcommand{\gs}{r} E=B^\gs_1(\T^d)$ for any $ \newcommand{\gs}{r} \gs>s$ in the definition of wMAP estimate. It also shows that the origin is the unique weak mode and, therefore, by proposition 2.8 also the unique strong mode.

We also remark that each term of the series in the right-hand side of (13) consists of the difference of two terms, whose respective series are not convergent in general for $ \newcommand{\T}{\mathbb{T}} u\in B^t_1(\T^d)$ and $ \newcommand{\T}{\mathbb{T}} \newcommand{\gs}{r} h\in B^\gs_1(\T^d)$ . However, they coincide with the difference of the norms in $ \newcommand{\T}{\mathbb{T}} B^s_1(\T^d)$ if $h$ and $u$ are both elements of $ \newcommand{\T}{\mathbb{T}} B^s_1(\T^d)$ . Based on this view we close this section with an important building block for the study of the MAP estimate. It extends the last corollary to $ \newcommand{\T}{\mathbb{T}} h\in B^s_1(\T^d)$ , for balls centered at the origin.

Theorem 3.9. Suppose that $h\in B^s_1(\mathbb{T}^d)$ and $t<s-d$. Let $A\in\mathcal{B}(B^t_1(\mathbb{T}^d))$ denote a convex, zero-centered symmetric and bounded set. Then

$$\lim_{\epsilon\to 0}\frac{\lambda(A^\epsilon + h)}{\lambda(A^\epsilon)} = \exp\left(-\Vert h\Vert_{B^s_1}\right).$$
It follows immediately from the above theorem that for $z_1,z_2\in B^s_1(\mathbb{T}^d)$,

$$\lim_{\epsilon\to 0}\frac{\lambda(A^\epsilon + z_1)}{\lambda(A^\epsilon + z_2)} = \exp\left(\Vert z_2\Vert_{B^s_1} - \Vert z_1\Vert_{B^s_1}\right), \tag{14}$$

giving the Onsager–Machlup functional of λ. The space $B^s_1(\mathbb{T}^d)$ here is the largest space on which the Onsager–Machlup functional is defined. This is the space $F$ of the discussion in section 1.2 in the case of Besov priors.

It is also worth noting the difference between (13) and (14). The centres of the balls in (13) are in $X=B^t_1(\mathbb{T}^d)$; the continuity of the right-hand side in $u$ is due to the sufficient regularity of the shift $h$. In (14), the centres of the balls are more regular than in (13), but the shift is less regular in general, i.e. only in $B^s_1(\mathbb{T}^d)$.

4. Characterization of MAP estimates and their weak consistency

In this section, building on what we have shown on properties of $B^s_1$ -Besov priors, we first show the existence of weak and strong MAP estimates in Bayesian inverse problems with such priors, and identify both with the minimisers of the Onsager–Machlup functional. Using this characterization, we then prove a weak consistency result for MAP estimators.

4.1. Identification of MAP estimates for Bayesian inverse problems

We consider the inverse problem of estimating a function $ \newcommand{\T}{\mathbb{T}} u \in X=B^t_1(\T^d)$ from a noisy and indirect observation $ \newcommand{\R}{{{\mathbb R}}} y\in \R^J$ , modelled as

$$y = \mathcal{G}(u) + \xi. \tag{15}$$

Here $ \newcommand{\R}{{{\mathbb R}}} \newcommand{\T}{\mathbb{T}} \newcommand{\G}{\mathcal{G}} \G : B^t_1(\T^d) \to \R^J$ is a locally Lipschitz continuous, possibly non-linear operator and ξ is Gaussian observational noise in $ \newcommand{\R}{{{\mathbb R}}} \R^J$ , $\xi\sim N(0,\Sigma)$ for a positive definite covariance matrix $ \newcommand{\R}{{{\mathbb R}}} \Sigma\in \R^{J\times J}$ .

We assume that $u$ is distributed according to the $B^s_1$ -Besov measure λ defined in section 3, so that $ \newcommand{\T}{\mathbb{T}} \newcommand{\la}{\langle} \lambda(B^t_1(\T^d))=1$ for $t<s-d$ . Under the assumption of local Lipschitz continuity of $ \newcommand{\G}{\mathcal{G}} \G$ , it follows that almost surely with respect to $y$ the posterior distribution $\mu^y$ on $u\vert y$ , has the following Radon–Nikodym derivative with respect to the prior λ:

$$\frac{{\rm d}\mu^y}{{\rm d}\lambda}(u) = \frac{1}{Z(y)}\exp\left(-\Phi(u, y)\right), \tag{16}$$

where

$$\Phi(u, y) = \frac{1}{2}\left\vert\Sigma^{-1/2}\left(y - \mathcal{G}(u)\right)\right\vert^2, \tag{17}$$

and $Z(y)$ is the normalization constant. Indeed, local Lipschitz continuity of $\mathcal{G}$ implies measurability of $\Phi(\cdot, y)$ with respect to λ; together with the non-negativity of Φ, it also gives that $Z(y)$ is finite and positive. For details see [16].

Note that the majority of the results presented below hold for more general data models. Indeed, the explicit form of the potential Φ is used only for the consistency results contained in section 4.2.

Following the intuition described in section 1.2, we define the Tikhonov-type functional

$$I(u; y) = \Phi(u, y) + \Vert u\Vert_{B^s_1}. \tag{18}$$

The existence of minimizers for $I$ is classical; however, we include the proof for completeness.

Lemma 4.1. The functional $I(\cdot; y)$ in equation (18) has a minimizer $ \newcommand{\T}{\mathbb{T}} \hat u \in B^s_1(\T^d)$ .
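For a discretized linear forward map (an assumption made here purely for illustration; the paper allows nonlinear $\mathcal{G}$), a minimizer of a truncated version of (18) can be computed by proximal gradient iteration (ISTA), whose proximal step is the weighted soft thresholding induced by the $B^s_1$ norm. A minimal sketch:

```python
import numpy as np

def besov_map_ista(A, y, Sigma_inv, s, d, n_iter=500):
    # Minimizes 0.5*(y - A u)^T Sigma_inv (y - A u) + sum_ell alpha_ell |u_ell|
    # over coefficient vectors u, with alpha_ell = ell^{s/d - 1/2}: a
    # finite-dimensional discretization of the functional I in (18).
    J, N = A.shape
    alpha = np.arange(1, N + 1) ** (s / d - 0.5)
    H = A.T @ Sigma_inv @ A
    b = A.T @ Sigma_inv @ y
    tau = 1.0 / np.linalg.norm(H, 2)     # step size 1/L, L the gradient Lipschitz constant
    u = np.zeros(N)
    for _ in range(n_iter):
        v = u - tau * (H @ u - b)        # gradient step on the quadratic data misfit
        u = np.sign(v) * np.maximum(np.abs(v) - tau * alpha, 0.0)  # prox of the penalty
    return u
```

Because the proximal step zeroes all coefficients below a level-dependent threshold, the computed estimate is sparse in the wavelet basis, in line with the finite-dimensional results of [44].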

The underpinnings of our main results in this section, theorems 4.3 and 4.6, are lemma 3.6 and theorem 3.9. In lemma 3.6, for a $B^s_1$-Besov prior, we showed the existence of a subspace over which the limit of the translated small ball probability ratios has a continuous representative, and in theorem 3.9 we established the Onsager–Machlup functional for such prior measures. We now note that, by the local Lipschitz continuity of Φ, it follows directly that the posterior inherits the above properties of the prior:

Proposition 4.2. Let $A\in\mathcal{B}(B^t_1(\mathbb{T}^d))$ be convex, bounded and symmetric and define $A^\epsilon:=\epsilon A$.

  • (i)  
    For any $h\in B^r_1(\mathbb{T}^d)$ with $r>s$ the mapping
    $$u\mapsto\lim_{\epsilon\to 0}\frac{\mu^y(A^\epsilon + u - h)}{\mu^y(A^\epsilon + u)}$$
    is a continuous function of $u\in B^t_1(\mathbb{T}^d)$.
  • (ii)  
    The Tikhonov-type functional defined in (18) is the generalized Onsager–Machlup functional for the posterior $\mu^y$ in (16). That is, for any $z_1, z_2\in B^s_1(\mathbb{T}^d)$,
    $$\lim_{\epsilon\to 0}\frac{\mu^y(B_\epsilon(z_1))}{\mu^y(B_\epsilon(z_2))} = \exp\left(I(z_2; y) - I(z_1; y)\right).$$

The theorems below show that the weak and strong MAP estimates of the posterior $\mu^y$ in (16) are identified with the minimisers of the functional $I$. In particular, the weak and strong MAP estimates coincide for the inverse problems considered here.

Theorem 4.3. An element $ \newcommand{\T}{\mathbb{T}} u \in B^s_1(\T^d)$ minimizes $I(\cdot; y)$ if and only if it is a weak MAP estimate for the posterior measure $ \newcommand{\p}{\partial} \newcommand{\post}{\mu^y} \post$ in (16).

We note that the last result implies the existence of weak MAP estimates by lemma 4.1. We next show the existence of strong MAP estimators, and that any strong MAP estimate is a minimiser of $I$ . The following result also shows that the modes can be approximated arbitrarily closely in $B^t_1$ by the centres of balls of fixed sufficiently small radius with maximal probability.

Proposition 4.4. Consider the measure $\mu^y$ given by (16) and (17) with the $B^s_1$ -Besov prior $ \newcommand{\la}{\langle} \newcommand{\gl}{\lambda} \gl$ and $ \newcommand{\R}{{{\mathbb R}}} \newcommand{\T}{\mathbb{T}} \newcommand{\G}{\mathcal{G}} \G:B^t_1(\T^d)\to \R^J$ locally Lipschitz for $t<s-d$ .

  • (i)  
    For any $ \newcommand{\gd}{\delta} \gd>0$ there exists $ \newcommand{\T}{\mathbb{T}} \newcommand{\gd}{\delta} z^\gd\in B^t_1(\T^d)$ satisfying $ \newcommand{\gd}{\delta} z^\gd=\arg\max_{z\in X}\mu^y(A^\gd+z)$ , where $A^\delta:=\delta A$ with $A$ a convex, symmetric and bounded set in $ \newcommand{\T}{\mathbb{T}} B^t_1(\T^d)$ .
  • (ii)  
    There is a $ \newcommand{\T}{\mathbb{T}} \bar z\in B^s_1(\T^d)$ and a subsequence of $ \newcommand{\gd}{\delta} \{z^\gd\}_{\gd>0}$ which converges to $\bar{z}$ strongly in $ \newcommand{\T}{\mathbb{T}} B^t_1(\T^d)$ .
  • (iii)  
    The limit $\bar z$ is a strong MAP estimator and a minimizer of $I(u;y)$ in equation (18).

Corollary 4.5. Under the conditions of proposition 4.4, the mapping $ \newcommand{\map}{u_{MAP}} \newcommand{\gd}{\delta} u\mapsto \mu^y(A^\gd+u)$ is continuous in $B^t_1(\mathbb{T}^d)$ .

In the following theorem we prove that any minimizer of $I$ is a strong MAP estimate and, hence, show the identification of strong MAP estimates and minimizers of $I$ using part (iii) of proposition 4.4.

Theorem 4.6. Suppose that conditions of proposition 4.4 hold. Then the strong MAP estimators of $\mu^y$ are characterised by the minimizers of the Onsager–Machlup functional $I$ given in (18).

The proof of theorem 4.3 concerning weak MAP estimates is relatively straightforward and relies on lemma 3.6, i.e. the ability to work in the subspace $B^r_1(\mathbb{T}^d)$, where the Radon–Nikodym derivative $R_h^{\mu^y}$ has a continuous representative. The proof of proposition 4.4 related to strong MAP estimates is more involved and requires a series of results developing asymptotic estimates for the small ball probability ratios (lemmas 6.2–6.4 in section 6.3).

The difference in difficulty of the proofs related to the two notions of MAP estimates highlights the flexibility of weak MAP estimators. It seems that explicit calculations are typically required for the proof in the case of strong MAP estimates. For practical purposes, it is very interesting to find general conditions under which the two MAP estimate concepts coincide.

Remark 4.7. Proposition 4.3.8 in [6] shows that if Φ is convex in $u$ , then since by lemma 3.4 the $B^s_1$ -Besov prior λ is logarithmically-concave, the posterior $\mu^y$ is also logarithmically-concave and hence quasi-concave. In that case proposition 2.9 shows that the set of modes is convex. The convexity of Φ depends on the forward operator $ \newcommand{\G}{\mathcal{G}} \G$ : if for example $ \newcommand{\G}{\mathcal{G}} \G$ is linear then Φ is convex, however in the general nonlinear case Φ may be non-convex.

4.2. Weak consistency of the strong MAP

We consider the frequentist setup, in which

$$y_j = \mathcal{G}(u^\dagger) + \xi_j, \qquad \xi_j\overset{\rm iid}{\sim} N(0,\Sigma), \qquad j=1,\dots,n,$$

for a fixed underlying value of the unknown functional parameter $u^\dagger\in X$. As before, we assume that $\mathcal{G}:X\to\mathbb{R}^J$ is locally Lipschitz and $\Sigma\in\mathbb{R}^{J\times J}$ is a positive definite matrix.

For this set of data and with a $B^s_1$ -Besov prior, $ \newcommand{\la}{\langle} \newcommand{\gl}{\lambda} \gl$ , the posterior measure satisfies

$$\frac{{\rm d}\mu^{y_1,y_2,\dots,y_n}}{{\rm d}\lambda}(u) = \frac{1}{Z_n(y_1,\dots,y_n)}\exp\left(-\sum_{j=1}^n\Phi(u, y_j)\right). \tag{19}$$

Here also the local Lipschitz continuity of $\mathcal{G}$ implies the well-definedness of $\mu^{y_1,y_2,\dots,y_n}$ [16]. Proposition 4.4 then implies that the strong MAP estimator of the above posterior measure is a minimizer of

$$I_n(u) = \frac{1}{n}\sum_{j=1}^n\Phi(u, y_j) + \frac{1}{n}\Vert u\Vert_{B^s_1}. \tag{20}$$
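For the quadratic potential (17) the sample mean $\bar y_n = \frac{1}{n}\sum_{j=1}^n y_j$ carries all the information in the data: completing the square gives

$$\frac{1}{n}\sum_{j=1}^n\Phi(u, y_j) = \frac{1}{2}\left\vert\Sigma^{-1/2}\left(\bar y_n-\mathcal{G}(u)\right)\right\vert^2 + \frac{1}{2n}\sum_{j=1}^n\left\vert\Sigma^{-1/2}\left(y_j-\bar y_n\right)\right\vert^2,$$

where the second term does not depend on $u$. Minimizing $I_n$ is therefore a Tikhonov problem with data $\bar y_n$, effective noise covariance $\Sigma/n$ and regularization weight $1/n$, both of which vanish as $n\to\infty$; this is the sense in which the data become infinitely informative.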

Theorem 4.8. Suppose that $\mathcal{G}:B^t_1(\mathbb{T}^d)\to\mathbb{R}^J$ is locally Lipschitz and $u^\dagger\in B^s_1(\mathbb{T}^d)$. For each $n\in\mathbb{N}$, let $u_n$ denote a minimizer of $I_n$ given in (20). Then there exist $u^*\in B^s_1(\mathbb{T}^d)$ and a subsequence of $\{u_n\}$ which converges to $u^*$ in $B^{\tilde s}_1(\mathbb{T}^d)$ almost surely, for any $\tilde s<s$. For any such $u^*$ we have $\mathcal{G}(u^*)=\mathcal{G}(u^\dagger)$.

If $ \newcommand{\utr}{u^{\dagger}} \utr$ lives only in $ \newcommand{\T}{\mathbb{T}} X=B^t_1(\T^d)$ , and not necessarily in $ \newcommand{\T}{\mathbb{T}} B^s_1(\T^d)$ , we can only get the convergence of $ \newcommand{\G}{\mathcal{G}} \{\G(u_n)\}$ :

Corollary 4.9. Let $ \newcommand{\G}{\mathcal{G}} \G$ and $u_n$ , $ \newcommand{\N}{{{\mathbb N}}} n\in\N$ , satisfy the assumptions of theorem 4.8 and suppose that $ \newcommand{\T}{\mathbb{T}} \newcommand{\utr}{u^{\dagger}} \utr\in B^t_1(\T^d)$ . Then $ \newcommand{\N}{{{\mathbb N}}} \newcommand{\G}{\mathcal{G}} \{\G(u_n)\}_{n\in\N}$ converges to $ \newcommand{\G}{\mathcal{G}} \newcommand{\utr}{u^{\dagger}} \G(\utr)$ in probability.

Remark 4.10. Theorem 4.8 states that the true solution is identified in the range of $ \newcommand{\G}{\mathcal{G}} \G$ , which is the natural objective also in regularization theory [28]. We remark the following:

  • (i)  
    The full identification of $ \newcommand{\utr}{u^{\dagger}} \utr$ is dependent on further properties of $ \newcommand{\G}{\mathcal{G}} \G$ , e.g. injectivity of $ \newcommand{\G}{\mathcal{G}} \G$ would immediately yield $ \newcommand{\utr}{u^{\dagger}} u^* = \utr$ .
  • (ii)  
    An inspection of the proof of theorem 4.8 shows that for an injective operator $ \newcommand{\G}{\mathcal{G}} \G$ , we have convergence in probability of the full sequence $ \newcommand{\N}{{{\mathbb N}}} \{u_n\}_{n\in \N}$ to $ \newcommand{\utr}{u^{\dagger}} \utr$ . Indeed, for such a $ \newcommand{\G}{\mathcal{G}} \G$ , assuming that the full sequence does not converge in probability contradicts equation (41).
  • (iii)  
    It is an immediate consequence of corollary 4.9 that there exists a subsequence of $\{\mathcal{G}(u_n)\}_{n\in\mathbb{N}}$ converging to $\mathcal{G}(u^\dagger)$ almost surely.

5. Connections to logarithmic derivative

In this section we discuss the logarithmic derivative of the posterior measure. We mainly revisit known results (see e.g. [6]) and also derive the logarithmic derivative of the posterior $\mu^y$ given in (16). The intuition behind the logarithmic derivative is that it roughly corresponds to the Gâteaux derivative of the posterior potential $I$. If the logarithmic derivative is smooth, then its zero points can determine the weak MAP estimates, as shown in [32]. The Besov $B^s_1$-prior does not meet this criterion due to the discontinuity of its logarithmic derivative at the origin, as we show in theorem 5.8. The logarithmic derivative also determines the Radon–Nikodym derivative, as recorded in proposition 5.3 below. Moreover, it can be used as the basis of Newton-type algorithms for estimating the weak MAP when an explicit form of the potential $I$ is not easily accessible, see e.g. Cauchy priors in [67].

Definition 5.1. A measure μ on $X$ is called Fomin differentiable along the vector $h$ if, for every set $\mathcal{A}\in\mathcal{B}(X)$, there exists a finite limit

$$d_h\mu(\mathcal{A}) = \lim_{t\to 0}\frac{\mu(\mathcal{A}+th)-\mu(\mathcal{A})}{t}. \tag{21}$$

It is well-known that if μ is Fomin differentiable along $h$ then the limit $d_h\mu$ is a countably additive signed measure on ${{\mathcal B}}(X)$ and has bounded variation [6]. Moreover, $d_h \mu$ is absolutely continuous with respect to μ.

We denote the domain of differentiability by

$$D(\mu) = \left\{h\in X \mid \mu \text{ is Fomin differentiable along } h\right\}. \tag{22}$$

Definition 5.2. The Radon–Nikodym density of the measure $d_h\mu$ with respect to μ is denoted by $\beta^\mu_h$ and is called the logarithmic derivative of μ along $h$ .

Proposition 5.3 ([6, proposition 6.4.1]). Suppose μ is a Radon measure on a locally convex space $X$ and is Fomin differentiable along a vector $h\in X$. If it holds that $\exp(\epsilon\vert\beta^\mu_h(\cdot)\vert)\in L^1(\mu)$ for some $\epsilon>0$, then μ is quasi-invariant along $h$ and the Radon–Nikodym density $R_h^\mu$ of $\mu_h$ with respect to μ satisfies the equality

$$R_h^\mu(u) = \exp\left(-\int_0^1\beta^\mu_h(u-sh)\,{\rm d}s\right). \tag{23}$$

Remark 5.4. Recall that, as discussed in remark 2.4, it is desirable to choose $E$ in the definition of wMAP to be a subspace $E\subset X$ such that $R_h^\mu$, $h\in E$, has a continuous representative $\tilde R_h^\mu$. Therefore, the integral $\int_0^1\beta^\mu_h(u-sh)\,{\rm d}s$ in (23) has a measurable representative, which is continuous outside the set $\{u \in X \mid \tilde R_h^\mu(u)=0\}$. Moreover, a weak mode $\hat u$ of μ can be equivalently defined by the condition

$$\tilde R^\mu_h(\hat u) \leqslant 1$$

for all $h\in E$.

The construction of the Besov prior in definition 3.1 is a prototypical example of a product measure. By setting $\lambda_\ell = \frac{1}{a_\ell}\pi_X\left(\frac{x}{a_\ell}\right){\rm d}x$ for $a_\ell = \ell^{-\left(\frac{s}{d}-\frac{1}{2}\right)}$ we can define the probability law of the Besov prior on $(\mathbb{R}^\infty,\mathcal{B}(\mathbb{R}^\infty))$ by $\lambda = \otimes_{\ell=1}^\infty\lambda_\ell$. For product measures, the Fomin differentiability calculus reduces to finite-dimensional projections in a straightforward manner.

Lemma 5.5 ([6, proposition 3.4.1(iii)]). Let μ be a probability measure on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$. Then μ is Fomin differentiable along $h\neq 0$ if and only if it has an absolutely continuous density $\pi_\mu$ whose derivative satisfies $\pi_\mu'\in L^1(\mathbb{R})$. In this case, $d_1\mu = \pi_\mu'\,{\rm d}x$.

The Fomin differentiability of a product measure $\mu = \otimes_{n=1}^\infty \mu_n$ on the space $X = \prod_{n=1}^\infty X_n$ equipped with the product topology is characterized by the following result:

Proposition 5.6 ([6, proposition 4.1.1]). Suppose that $\beta^{\mu_n}_{h_n}$ is the logarithmic derivative of $\mu_n$ in the direction $h_n\in X_n$. The following claims are equivalent:

  • (i)  
    μ is differentiable along $h = (h_j)_{j=1}^\infty \in X$;
  • (ii)  
    the series $\sum_{n=1}^\infty \beta_{h_n}^{\mu_n}$ converges in the norm of $L^1(\mu)$; in this case $\beta^\mu_h = \sum_{n=1}^\infty \beta^{\mu_n}_{h_n}$.

Let us introduce the following subspace

$$H(\mu) = \left\{h\in D(\mu)\;\middle\vert\;\beta^\mu_h\in L^2(\mu)\right\},$$

which has a natural Hilbert space structure [6, section 5]. Surprisingly, for a large class of product measures $H(\mu)$ coincides with $D(\mu)$. The following proposition follows from [7, corollary 2] and [6, example 5.2.3].

Proposition 5.7. Suppose $\mu = \pi(x)\,{\rm d}x$ is a Borel probability measure on the real line such that $\pi$ is absolutely continuous with finite Fisher information,

$$\int_{\mathbb{R}}\frac{\vert\pi'(x)\vert^2}{\pi(x)}\,{\rm d}x < \infty.$$

If we set $\mu_n(A) = \mu(A/a_n)$, where $a_n>0$, and $\nu = \otimes_{n=1}^\infty \mu_n$, then it follows that

$$D(\nu) = H(\nu) = \left\{h\in\mathbb{R}^\infty\;\middle\vert\;\sum_{n=1}^\infty\frac{h_n^2}{a_n^2}<\infty\right\}.$$
Let us record the following direct consequence of proposition 5.3: if $t>0$ then

$$R^\mu_{th}(u) = \exp\left(-t\int_0^1\beta^\mu_h(u-sth)\,{\rm d}s\right) \tag{24}$$

in $L^1(\mu)$. It is rather easy to see that $\lambda_1$ (and consequently $\lambda_\ell$ for any $\ell\in\mathbb{N}$) is Fomin differentiable, since

$$\lim_{t\to 0}\frac{\lambda_1(\mathcal{A}+t)-\lambda_1(\mathcal{A})}{t} = -\int_{\mathcal{A}}{\rm sgn}(x)\,\lambda_1({\rm d}x)$$

for any $\mathcal{A}\in\mathcal{B}(\mathbb{R})$. In more generality, this follows from lemma 5.5 since the density function is absolutely continuous.

Theorem 5.8. Let λ be the $B^s_1$-Besov measure given in definition 3.1. The set of differentiability is given by $D(\lambda) = B^{s-\frac{d}{2}}_2(\mathbb{T}^d)$ and for any $h = \sum_{\ell=1}^\infty h_\ell\,\psi_\ell\in D(\lambda)$ we have

$$\beta^\lambda_h(u) = \sum_{\ell=1}^\infty h_\ell\,\beta^{\rho_\ell}(u_\ell),$$

where

$$\beta^{\rho_\ell}(x) = -\alpha_\ell\,{\rm sgn}(x)$$

is the logarithmic derivative of $\rho_\ell$ (see notation 3.3).

Notice that the previous theorem directly implies that for any $h\in B_1^s(\mathbb{T}^d)$ the logarithmic derivative is bounded: $\vert\beta^\lambda_h(u)\vert \leqslant C\Vert h\Vert_{B_1^s}$ λ-almost surely.
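To make the discontinuity concrete, the following sketch evaluates a truncation of the series from theorem 5.8 in the form recorded above: flipping the sign of a single coefficient $u_\ell$ across zero changes the value by $2\alpha_\ell h_\ell$, however small $\vert u_\ell\vert$ is. All sizes and values are illustrative assumptions.

```python
import numpy as np

def log_derivative(u, h, s, d):
    # Truncated beta^lambda_h(u) = -sum_ell alpha_ell h_ell sgn(u_ell) with
    # alpha_ell = ell^{s/d - 1/2}; discontinuous wherever a coefficient
    # u_ell crosses zero, since sgn jumps there.
    ell = np.arange(1, len(u) + 1)
    alpha = ell ** (s / d - 0.5)
    return -np.sum(alpha * h * np.sign(u))

h = np.array([1.0, 0.5, 0.25])
print(log_derivative(np.array([+0.01, 1.0, 1.0]), h, s=2, d=1))
print(log_derivative(np.array([-0.01, 1.0, 1.0]), h, s=2, d=1))  # jump of 2*alpha_1*h_1
```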

For the posterior distribution $\mu^y$ we can derive the logarithmic derivative using properties of the prior and of the functional Φ. The following result follows directly from [6, proposition 3.3.12] (see also [26, theorem 5.7]).

Theorem 5.9. Suppose that $\Phi : B^t_1(\mathbb{T}^d) \to \mathbb{R}$, with $t<s-d$, is bounded from below and possesses a uniformly bounded derivative. Then we have $\beta^{\mu^y}_h(u) = -\partial_h \Phi(u) + \beta^\lambda_h(u)$ for any $h\in D(\lambda)$.
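In finite but high dimensional discretizations, this is precisely the structure exploited when the MAP estimate is computed by minimizing the Onsager–Machlup functional $I(u;y)=\Phi(u)+\Vert u\Vert_{B^s_1}$. The following minimal sketch is our illustration, not the paper's method: it assumes a linear forward map, identity noise covariance, and illustrative names and parameter values throughout. It minimizes a discretized functional of this form with iterative soft thresholding, which handles the weighted $\ell^1$ Besov term through its proximal map.

    import numpy as np

    def besov_map_ista(G, y, alpha, n_iter=500):
        # Minimize 0.5*||G u - y||^2 + sum_l alpha_l |u_l| by ISTA.
        # alpha_l = l^{s/d - 1/2} are the Besov weights.
        step = 1.0 / np.linalg.norm(G, 2) ** 2      # 1/L, L = Lipschitz constant of the gradient
        u = np.zeros(G.shape[1])
        for _ in range(n_iter):
            grad = G.T @ (G @ u - y)                # gradient of the quadratic misfit
            v = u - step * grad
            # proximal map of the weighted l^1 term: componentwise soft thresholding
            u = np.sign(v) * np.maximum(np.abs(v) - step * alpha, 0.0)
        return u

    rng = np.random.default_rng(0)
    n = 256
    G = rng.standard_normal((128, n)) / np.sqrt(n)  # toy forward map
    alpha = 0.01 * np.arange(1, n + 1) ** 1.0       # s = 1.5, d = 1, scaled for illustration
    u_map = besov_map_ista(G, rng.standard_normal(128), alpha)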

6. Proofs

6.1. Proofs of results in section 2

Proof of proposition 2.8. Without loss of generality assume that μ is symmetric around the origin and show that the origin is a strong mode. For any $u\in X$ the Anderson inequality (proposition 2.7) implies $\mu(B_\epsilon(u)) \leqslant \mu(B_\epsilon(0))$, and so $\mu(B_\epsilon(0))=\sup_{u\in X}\mu(B_\epsilon(u))$ and the origin is a strong mode of μ. □

Proof of proposition 2.9. Suppose $u_1, u_2$ are strong modes. For $\kappa\in(0,1)$ we show that $\hat{u}=\kappa u_1+(1-\kappa)u_2$ is also a strong mode. For $\epsilon>0$ define $M_\epsilon=\sup_{u\in X}\mu(B_\epsilon(u))$. By quasi-concavity and the identity

$B_\epsilon(\kappa u_1+(1-\kappa)u_2)=\kappa B_\epsilon(u_1)+(1-\kappa)B_\epsilon(u_2)$ (25)

we have that

so that, since $u_1, u_2$ are strong modes, we get

Since for all $\epsilon>0$ we have $\mu(B_\epsilon(\hat{u}))/M_\epsilon\leqslant 1$, we get that $\hat{u}$ is a strong mode. □

Proof of theorem 2.10. Let us consider the identity (25) with values $u_1 = \hat u - h$ and $u_2 = \hat u$ . By applying log-concavity we have

and, consequently,

Equation (26)

for any $0\leqslant \kappa \leqslant 1$.

Now suppose $\hat u \in X$ is a local mode but not a global mode. Then there exists $\delta>0$ such that $\hat u$ is a local mode in the neighborhood $B_{\delta}(\hat u)$ but not in $B_{\delta +1}(\hat u)$. That is, in the larger neighborhood $B_{\delta +1}(\hat u)$ there exist $\eta > 0$ and a sequence $\{\epsilon_j\}_{j=1}^\infty$ such that

Now let us choose a sequence $\{u_j\}_{j=1}^\infty \subset B_{\delta+1}(\hat u)$ such that

Then it follows that

Since $\hat u$ is a local mode there exists $\tilde \epsilon > 0$ such that for any $\epsilon < \tilde \epsilon$ we have

for $\tilde \delta = \frac \delta{2(1+\delta)}$ and some $\eta > 0$. In fact, for this choice of $\tilde \delta$ we have $\hat u - \tilde \delta (\hat u-u_j) \in B_\delta(\hat u)$, and it follows by (26) for any $\epsilon_j < \tilde \epsilon$ that

This yields a contradiction and proves the claim for strong MAP estimates.

Suppose now that $\hat u \in X$ is a local wMAP but not a global wMAP. Assume, as above, that $B_\delta(\hat u)$ is the maximal neighborhood in which $\hat u$ is a local wMAP. Then there exists an element $h\in E$ with $h\notin B_\delta(0)$ such that

for $\tilde \delta = \frac{\delta }{2\Vert h\Vert_X}$, since $\hat u - \tilde \delta h \in B_\delta(\hat u)$. Again we see that inequality (26) yields a contradiction. This completes the proof. □

6.2. Proofs of results in section 3

Proof of lemma 3.4. Let $\lambda^N=\bigotimes_{\ell=1}^{N}\rho_\ell$. Then it is straightforward to check that $\lambda^N$ converges weakly to λ as $N\to\infty$. By [8, theorem 2.2], for λ to be logarithmically concave it suffices to show that the measures $\lambda^N$ on $\mathbb{R}^N$ are logarithmically concave. Note that the measures $\lambda^N$ have a density, denoted $\pi_N$, with respect to the Lebesgue measure on $\mathbb{R}^N$, given by

for all $x=(x_1,\dots,x_N)\in\mathbb{R}^N$, where $\alpha_\ell=\ell^{\frac{s}d-\frac12}$ are the coefficients in the expansion defining the $B^s_1$-Besov measure. Since the density $\pi_N$ is a logarithmically concave function, by [5, theorem 1.8.4] $\lambda^N$ is logarithmically concave and the result follows. □
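For concreteness (a worked check on our part, assuming the Laplace-type normalization of notation 3.3, so that $\rho_\ell$ has density proportional to ${\rm e}^{-\alpha_\ell\vert x_\ell\vert}$), the negative logarithm of the product density is, up to an additive constant,

$-\log \pi_N(x) = \sum_{\ell=1}^N \alpha_\ell \vert x_\ell\vert + {\rm const},$

which is a convex function of $x$, so $\pi_N$ is indeed logarithmically concave.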

Proof of lemma 3.5. We use the Kakutani–Hellinger theory [15, chapter 2]. We start by calculating the Hellinger integrals $H(\rho_{h,\ell},\rho_{\ell})$, where $\rho_{h,\ell}(\cdot):=\rho_{\ell}(\cdot-h_\ell)$:

$H(\rho_{h,\ell},\rho_{\ell}) = \int_{\mathbb{R}} \sqrt{\rho_{\ell}(x-h_\ell)\,\rho_{\ell}(x)}\,{\rm d}x = {\rm e}^{-\frac{\alpha_\ell}2 \vert h_\ell\vert }\left(1+\frac{\alpha_\ell}2 \vert h_\ell\vert \right).$
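To sketch this computation (our worked derivation, assuming $\rho_\ell$ has density $\frac{\alpha_\ell}{2}{\rm e}^{-\alpha_\ell\vert x\vert}$): for $h_\ell>0$, splitting the integral at $0$ and at $h_\ell$ gives

$\int_{\mathbb{R}} \frac{\alpha_\ell}{2}\, {\rm e}^{-\frac{\alpha_\ell}2 (\vert x-h_\ell\vert +\vert x\vert )}\,{\rm d}x = \frac12 {\rm e}^{-\frac{\alpha_\ell}2 h_\ell} + \frac{\alpha_\ell h_\ell}2 {\rm e}^{-\frac{\alpha_\ell}2 h_\ell} + \frac12 {\rm e}^{-\frac{\alpha_\ell}2 h_\ell} = {\rm e}^{-\frac{\alpha_\ell}2 h_\ell}\left(1+\frac{\alpha_\ell}2 h_\ell\right),$

where the three terms correspond to the regions $x<0$, $0<x<h_\ell$ and $x>h_\ell$; the case $h_\ell<0$ follows by symmetry.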
By [15, lemma 2.5], we have

where $H_N=\prod_{{\ell}=1}^N {\rm e}^{-\frac{\alpha_\ell}2{\vert h_{\ell}\vert }}\left(1+\frac{\alpha_\ell}2{\vert h_{\ell}\vert }\right)\in (0,1]$. By taking the negative logarithm we get

By [15, theorem 2.7] the set $Q(\lambda)$ coincides with the set of $h$ such that $-\log\left(H(\lambda_h,\lambda)\right)<\infty$.

Taylor's theorem implies the upper and lower bounds

Using the lower bound, we get that

which implies that a sufficient condition for the equivalence of $\lambda_h$ and λ is

Equation (27)

Using the upper bound, and letting $x_{\ell}=\frac{\alpha_\ell}2\vert h_{\ell}\vert$, we get that

If $x_{\ell}$ is unbounded, then the sum on the right-hand side is infinite and we have that $\lambda_h$ and λ are singular (note that if $x_{\ell}$ is unbounded then obviously condition (27) does not hold). If $x_{\ell}$ is bounded, $x_{\ell}\leqslant M$, then

therefore condition (27) is also necessary for the equivalence of the measures $\lambda_h$ and λ.

Note that condition (27) is equivalent to $h\in B^{s-\frac{d}2}_2(\mathbb{T}^d)$. Observing that

for all $x\in \mathbb{R}$ and $\ell \in\mathbb{N}$, the claimed expression for $\frac{{\rm d}\lambda_h}{{\rm d}\lambda}$ follows from [15, theorem 2.7]. □

Proof of lemma 3.6. We have

Equation (28)

if $\vert u_\ell\vert \leqslant \vert h_\ell\vert$, and

Equation (29)

if $\vert u_\ell\vert > \vert h_\ell\vert$. Fix $\eta>0$ and consider $v\in B^t_1(\mathbb{T}^d)$ such that $\Vert v-u\Vert _{B^t_1}< \eta$. Let us next define the index sets

Clearly the $A_j$ are disjoint and $\mathbb{N} = \cup_{j=1}^4 A_j$. We utilise the index sets $A_j$ to rewrite

Equation (30)

where

We continue by studying the terms $\mathcal{I}_j$ separately. Let $0<\delta<1$ be a value to be fixed later. For $\mathcal{I}_1$ we have, by the Hölder inequality,

for $\delta_1= \frac{\delta}{1-\delta}$, where to bound the second parenthesis in the second to last line we have used that $\vert v_\ell-u_\ell\vert \leqslant 2\vert h_\ell\vert$ for any $\ell\in A_1$. For $\mathcal{I}_2$ we have

We note that since $\vert v_\ell\vert >\vert h_\ell\vert \geqslant \vert u_\ell\vert$, the following two inequalities hold:

As a direct consequence we have

for any $0\leqslant \delta\leqslant 1$ . Moreover, since

we obtain

where as before $\delta_{1} = \frac{\delta}{1-\delta}$. The term $\mathcal{I}_3$ is very similar to $\mathcal{I}_2$ (only the roles of $u_\ell$ and $v_\ell$ are swapped), and we have

For $\mathcal{I}_4$ we first note that since $\vert v_\ell\vert >\vert h_\ell\vert$ and $\vert u_\ell\vert >\vert h_\ell\vert$ for $\ell\in A_4$, if $\vert h_\ell\vert >\frac{\eta}{2\beta_\ell}$, with $\beta_\ell=\ell^{\frac{t}{d}-\frac{1}{2}}$, we must have ${\rm sign}(u_\ell) ={\rm sign}(v_\ell)$. Now it follows that

where

Combining the above estimates, and since for any $r>s$ there exists $\delta>0$ small enough so that $h\in B^{s+\delta_1(s-t)}_1(\mathbb{T}^d) \cap B^{s+\delta_2}_1(\mathbb{T}^d)$, we get that $\tilde R_h^\lambda (v)-\tilde R_h^\lambda (u)\to0$ as $\eta\to0$. This proves the claim. □

Proof of theorem 3.9. Let $A^\epsilon:=\epsilon A$. We first note that by lemma 3.5 we can write

This implies that

Equation (31)

Now consider $\{h^{\,j}\}_{j=1}^\infty\subset B_1^{s+1}(\mathbb{T}^d)$ with $h^{\,j}\to h$ in $B^s_1(\mathbb{T}^d)$. We have

This, using lemmas 3.6 and 2.3, implies that

and letting $j\to\infty$ in the right-hand side gives

The above inequality together with (31) gives the result. □

6.3. Proofs of results in section 4

In some of the proofs below we consider sequences that converge in the weak*-topology of $B^t_1(\mathbb{T}^d)$. We note that the Banach space $B^t_1(\mathbb{T}^d)$ is isomorphic to a weighted $\ell^1$ space, and hence its pre-dual is the space of functions $\{v\in B^{-t}_\infty(\mathbb{T}^d): \lim_{\ell\to\infty}\langle v,\psi_\ell\rangle=0\}$.

Proof of lemma 4.1. Suppose $\{u_j\}_{j=1}^\infty \subset B^t_1(\mathbb{T}^d)$ is a minimizing sequence of the functional $I$. Clearly, we can assume $\{I(u_j, y)\}_{j=1}^\infty$, and therefore also $\Vert u_j\Vert _{B^s_1}$, to be bounded. By the Banach–Alaoglu theorem there exists a subsequence that converges to some $\hat u$ in the weak*-topology. Notice that the norm of $B^s_1(\mathbb{T}^d)$ is lower semicontinuous in the weak*-topology, and consequently $\hat u \in B^s_1(\mathbb{T}^d)$. We now show the strong convergence of the above subsequence in $B^{\tilde s}_1(\mathbb{T}^d)$ for any $\tilde s<s$. We have

Noting that $\Vert \hat u\Vert _{B^s_1}+\Vert u_j\Vert _{B^s_1}\leqslant C < \infty$, given any $\epsilon>0$, $N$ can be chosen large enough, independently of $j$, such that the second term in the last line of the above inequality is bounded by $\epsilon/2$. Having coefficient-wise convergence, there is $M\in\mathbb{N}$ large enough so that for $j>M$ the first term is bounded by $\epsilon/2$ as well. We therefore conclude that $u_j\to \hat u$ in $B^{\tilde s}_1(\mathbb{T}^d)$ for any $\tilde s<s$, and hence in particular for $\tilde s=t<s-d$.

By the continuity assumption on Φ, it now follows that

Therefore, $\hat u$ must be a minimizer. □

Proof of theorem 4.3. Assume that $u_{\rm min}\in B^s_1(\mathbb{T}^d)$ is a minimizer of $I(\cdot; y)$. By lemma 3.6 and the Lipschitz continuity of $\mathcal{G}$ we know that $R_h^{\mu^y} \in C(B^t_1(\mathbb{T}^d))$ for any $h\in B^r_1(\mathbb{T}^d)$ with $r>s$. Since $u_{\rm min} \in B^s_1(\mathbb{T}^d)$, we can study $R_h^{\mu^y}$ pointwise and obtain

for any $h\in B^r_1(\mathbb{T}^d)$, due to the minimizing property of $u_{\rm min}$. Therefore, $u_{\rm min}$ is a weak MAP.

Consider now the converse claim and assume that $\hat u$ is a weak MAP of the posterior $\mu^y$ in (16). Let us also assume that $\hat u \in B^t_1(\mathbb{T}^d)\setminus B^s_1(\mathbb{T}^d)$. Due to the continuity of Φ and lemma 3.6 we have

Equation (32)

for any $h\in B^r_1(\mathbb{T}^d)$, $r>s$. Let us construct a particular function $h^N = \sum_{\ell=1}^\infty h^N_\ell \psi_\ell \in B^r_1(\mathbb{T}^d)$ by defining its coefficient vector according to

for some small $\epsilon>0$. It follows by inequality (32) and the continuity of Φ that

Equation (33)

where $C>0$ is the local Lipschitz constant on the neighbourhood of $\hat u$. However, $N$ was chosen arbitrarily, and by our assumption on the smoothness of $\hat u$ the sum on the left-hand side of (33) does not stay bounded as $N$ increases. Therefore, inequality (33) leads to a contradiction and we must have $\hat u \in B^s_1(\mathbb{T}^d)$.

Assuming now that the weak MAP satisfies $\hat u \in B^s_1(\mathbb{T}^d)$, we can separate the sum in (32) and obtain

for any $h\in B^r_1(\mathbb{T}^d)$. By the continuity of $I$ and the density of $B^r_1(\mathbb{T}^d)$ in $B^s_1(\mathbb{T}^d)$, we find that $\hat u$ minimizes $I$. □

The proof of proposition 4.4 relies on the following four lemmas, which establish some properties of the Besov prior measure considered here. We state these lemmas and their proofs first.

Lemma 6.1. Let $X$ be a separable Banach space, and $B$ an open and convex set in $\mathcal{B}(X)$. For any non-degenerate measure μ with full support we have $\mu(\partial B)=0$.

Proof. For any $\epsilon>0$, there exists a cylindrical set $B_\epsilon$ with $B_\epsilon\supset B$ satisfying $\mu(B_\epsilon)-\mu(B)\leqslant \epsilon$ [5, lemma 2.1.6]. By the definition of cylindrical sets, there exist some $n\in\mathbb{N}$ and $B_\epsilon^0\in\mathcal{B}(\mathbb{R}^n)$ such that

Let $h=(l_1,\dots,l_n)$. Without loss of generality we can assume that the $l_j \in X^*$, $j=1,\dots,n$, are linearly independent and, therefore, that $h$ is surjective (see the discussion in [5, section 2.1]). Now we have $B\subset h^{-1}(hB)\subset B_{\epsilon}$. Notice that $hB$ is an open set, since $h$ is an open map by the open mapping theorem. Hence we obtain

By our assumption on the non-degeneracy of μ, the measure $\mu_n$ is absolutely continuous with respect to the Lebesgue measure on $\mathbb{R}^n$, and hence $\mu_n(\partial (hB))=0$. Noting that $\mu(h^{-1}(hB)\setminus B)\leqslant \mu(B_\epsilon)-\mu(B)\leqslant \epsilon$, the result follows. □

In the following, λ stands for a centered $B^s_{1}$-Besov prior on $B^t_1(\mathbb{T}^d)$ with $t<s-d$. Also, we frequently consider projections of λ to a subspace ${\rm span}\{\psi_1,\dots,\psi_n\} \subset B^t_1(\mathbb{T}^d)$. We write $P_n:B^t_1(\mathbb{T}^d)\to \mathbb{R}^n$ as

$P_n u = (\langle u,\psi_1\rangle,\dots,\langle u,\psi_n\rangle)$ (34)

and define $\lambda_n(A) := (\lambda \circ P_n^{-1})(P_n A)$ for any $A\in \mathcal{B}(B^t_1)$.

Lemma 6.2. Let $A\in \mathcal{B}(B^t_1(\mathbb{T}^d))$ be any convex, symmetric and bounded set with diameter $\delta = \sup_{u,v \in A} \Vert u-v\Vert _{B^t_1}>0$. For any $z\in B^t_1(\mathbb{T}^d)$, with $t<s-d$, we have

Proof. First consider the subspace ${\rm span}\{\psi_1,\dots,\psi_n\} \subset B^t_1(\mathbb{T}^d)$ and let $\tilde\alpha_\ell=\ell^{\frac{t}{d}-\frac{1}{2}}$. Now recall that $\alpha_\ell \geqslant \tilde\alpha_\ell$ since $s>t+d$. We have

where $\tilde\lambda_{n}=\prod_{\ell=1}^n\tilde\rho_\ell$ with $\tilde\rho_\ell\sim \tilde c_\ell\exp\left(-(\alpha_\ell-\frac{1}{2}\tilde\alpha_\ell)\vert u_\ell\vert \right)$. Since $\tilde\lambda_{n}$ is logarithmically concave, by theorem 6.1 of [8] (see the proof of proposition 2.7 above) we have that $\tilde\lambda_{n}(A+z)\leqslant \tilde\lambda_{n}(A)$.

To consider the limiting case, note that $P_n^{-1}(P_n A)$ is convex and symmetric. Therefore, we can use the first part of the proof above, and write

Equation (35)

with $\epsilon\to 0$ as $n\to\infty$, due to the weak convergence of $\lambda_n$ to λ and since $A$ is a continuity set by lemma 6.1. □

Lemma 6.3. Suppose that $\bar z\notin B^s_1(\mathbb{T}^d)$, $\{z^\delta\}_{\delta>0}\subset B^t_1(\mathbb{T}^d)$, $t<s-d$, and $z^\delta$ converges to $\bar z$ in the weak*-topology of $B^t_1(\mathbb{T}^d)$ as $\delta\to 0$. Then for any $\epsilon>0$ there exists δ small enough such that

for $A^\delta=\delta A$, where $A$ is any convex, symmetric and bounded set in $\mathcal{B}(B^t_1(\mathbb{T}^d))$.

Proof. Below we write $u_\ell=\langle u,\psi_\ell\rangle$ for any $u\in B^t_1(\mathbb{T}^d)$ and, without loss of generality, assume that $A$ has diameter $1$. Since $\bar z\notin B^s_1(\mathbb{T}^d)$, for any $M>0$ there is an $N$ large enough such that

Let $\delta_0<2MN^{\frac{t-s}{d}}$. Since $z^\delta$ converges to $\bar z$ in the weak*-topology as $\delta\to 0$, we have $\langle\psi_\ell,z^\delta\rangle\to \langle\psi_\ell,\bar z\rangle$ for all $\ell$, and therefore $\delta_1<\delta_0$ can be chosen small enough such that

Thus, for any $z\in A^{\delta_1}+z^{\delta_1}$ , we can write

Let $\delta\leqslant\delta_1$ be sufficiently small so that

For $\lambda_n$ we have that for any $M>0$ there exist $N>0$ and $\delta_1>0$ such that for $n\geqslant N$ and $\delta<\delta_1$ it follows that

where in the last line we have used the fact that the density in the integrals of the third line is log-concave and $A$ is absolutely convex. An inequality similar to (35) generalizes the result to λ. □

Lemma 6.4. Suppose that $\{z^\delta\}_{\delta>0}\subset B^t_1(\mathbb{T}^d)$ converges to $0$ in the weak*-topology, but not strongly, in $B^t_1(\mathbb{T}^d)$ as $\delta\to 0$. Then for any $\epsilon>0$ there exists $\delta$ small enough such that

for $A^\delta=\delta A$, where $A$ is any convex, bounded and symmetric set in $\mathcal{B}(B^t_1(\mathbb{T}^d))$.

Proof. Let $\tilde\alpha_\ell=\ell^{\frac{t}{d}-\frac{1}{2}}$ and, without loss of generality, assume that $A$ has diameter $1$. For any $\ell\in\mathbb{N}$ we have $\langle\psi_\ell,z^\delta\rangle\to 0$ as $\delta\to 0$. There exists a subsequence, which we relabel $\{z^{\delta}\}$, whose elements satisfy, for some $\kappa>0$,

Equation (36)

Let $M>0$ be arbitrary and choose $N$ large enough so that $\alpha_\ell>M\tilde\alpha_\ell$ for any $\ell>N$. By the weak* convergence there exists $\delta$ small enough such that

where $z^\delta_\ell=\langle\psi_\ell,z^\delta\rangle$. Using (36), this means that there exists $n>N$ such that

Now one can show that

and

Having these bounds we obtain

where we used the log-concavity of the integrands on the last line. Since κ is fixed, $M$ is arbitrary and δ decreases, the result follows for $\lambda_n$. As in the previous two lemmas, an inequality similar to (35) yields the result for λ. □

Proof of proposition 4.4. 

  • (i)  
    Without loss of generality we assume that the diameter of $A$ is $1$. The function $z \mapsto \mu^y(A^\delta + z)$, for fixed $\delta>0$, maps $B^t_1(\mathbb{T}^d)$ into $[0,1]$. Let $\{z_j\}_{j=1}^\infty\subset B^t_1(\mathbb{T}^d)$ denote a maximizing sequence such that
    Equation (37)
    Suppose the sequence $z_j$ is unbounded. Since $\Phi\geqslant 0$ we have for any $z\in B^t_1(\mathbb{T}^d)$ that
    by lemma 6.2. Now as $\Vert z_j\Vert _{B^t_1}\to\infty$, we have $\mu^y(A^\delta + z_j)\to 0$, which yields a contradiction. Therefore, the sequence $z_j$ must be bounded. Next, by the Banach–Alaoglu theorem there exists a limit $w \in B^t_1(\mathbb{T}^d)$ of a subsequence in the weak*-topology. In particular, we have $P_n z_j \to P_n w$ in $\mathbb{R}^n$ for any $n>0$. Let us write $\mu^y_n(A) = \mu^y \circ P_n^{-1}(P_n A)$ for any $A\in \mathcal{B}(B^t_1(\mathbb{T}^d))$. By the weak convergence of $\mu^y_n$ to $\mu^y$, for any $\epsilon>0$ there exists $N>0$ such that for $n>N$ we have
    since $A^\delta$ is a set of continuity for $\mu^y$ by lemma 6.1, i.e. $\mu^y(\partial A^\delta) = 0$. In consequence, we have
    Equation (38)
    where the convergence to $\epsilon$ in (38) holds since $\lambda_n$ is non-degenerate. Since $\epsilon>0$ was arbitrary, we obtain
    and by (37) the supremum must be attained at $z^\delta = w \in B^t_1(\mathbb{T}^d)$.
  • (ii)  
    Since $\Phi(u)\geqslant 0$ and $\mathcal{G}$ is locally Lipschitz continuous, for any $\delta\leqslant 1$ we have
    Equation (39)
    with a constant $L$ independent of $\delta$. Suppose that $\{z^\delta\}$ is not bounded in $X$. Then by lemma 6.2, for any $\epsilon>0$, there exists $\delta$ small enough such that
    This contradicts the first inequality in (39). Hence $\{z^\delta\}$ is bounded in $B^t_1(\mathbb{T}^d)$ for any $t<s-d$, and there exist $\bar z\in B^t_1(\mathbb{T}^d)$ and a subsequence (also denoted by) $\{z^\delta\}\subset B^t_1(\mathbb{T}^d)$ such that $z^\delta$ converges to $\bar z$ in the weak*-topology. Now lemmas 6.3 and 6.4 together with (39) imply that $\bar z\in B^s_1(\mathbb{T}^d)$ and $z^\delta\to\bar z$ in $B^t_1(\mathbb{T}^d)$.
  • (iii)  
    We first note that, by the local Lipschitz continuity of Φ, there exists a constant $L$ depending on $\Vert \bar z\Vert _{B^t_1}$ such that
    and therefore, since Φ is continuous on $B^t_1(\mathbb{T}^d)$ and $z^\delta\to \bar z$ in $B^t_1(\mathbb{T}^d)$, we have
    Now consider a sequence $\{w^{\,j}\}_{j=1}^\infty \subset B^{s+1}_1(\mathbb{T}^d)$ with $w^{\,j}\to \bar z$ in $B^s_1(\mathbb{T}^d)$ as $j\to\infty$. Then we have
    This by lemma 3.6 implies that
    and then letting $j\to\infty$ in the right-hand side we get
    Since by the definition of $z^\delta$ we have that $\mu^y(A^\delta+z^\delta)\geqslant \mu^y(A^\delta+{\bar z})$, we get that
    It follows that
    Equation (40)
    and therefore $\bar z$ is a strong MAP estimator. It remains to show that $\bar z$ is a minimizer of $I$. Let $z^*:=\arg{\min}_{u\in B^s_1} I(u)$. Suppose that $\bar z$ is not a minimizer, so that $I(\bar z)>I(z^*)$. We first note that, by the local Lipschitz continuity of Φ, as before there is an $L$ depending on $\Vert \bar z\Vert _{B^t_1}$ and $\Vert z^*\Vert _{B^t_1}$ such that
    Now consider a sequence $\{w^{\,j}\}_{j=1}^\infty\subset B^{s+1}_1(\mathbb{T}^d)$ with $w^{\,j}\to z^*$ in $B^s_1(\mathbb{T}^d)$. Then, similarly to what we did above, we can show that
    as $j\to\infty$ . We now note that
    which by (40) gives
    This contradicts the fact that, by the definition of $z^\delta$, $\mu^y(A^\delta+z^\delta)\geqslant \mu^y(A^\delta+z^*)$. □

Proof of corollary 4.5. This follows from the proof of part (i) of proposition 4.4. □

Proof of theorem 4.6. Suppose that $\tilde z$ is a strong MAP estimator. Any strong MAP estimate is a weak MAP estimate and hence, by theorem 4.3, $\tilde z$ is a minimizer of $I$ .

Now let $z^*$ be a minimizer of $I$. By proposition 4.4 we know that there exists a strong MAP estimate $\bar z\in B^s_1(\mathbb{T}^d)$ which also minimises $I$. Therefore, by proposition 4.2, we have

Let $z^\epsilon=\arg\max_{z\in X}\mu^y(A^\epsilon+z)$. By definition 2.1, we can write

The result follows. □

Proof of theorem 4.8. Substituting $y_j = \mathcal{G}(u^{\dagger})+\xi_j$, we have

Since $u_n$ is a minimizer of $I_n$ , we can write

using Young's inequality. Taking the expectation and using the independence of $\{\xi_j\}$ , we obtain

Equation (41)

and, by an application of Jensen's inequality,

Equation (42)

First, by (41), $\mathcal{G}(u_n)\to \mathcal{G}(u^{\dagger})$ in probability as $n\to \infty$. Therefore there exists a subsequence which satisfies (after relabelling by $n$ again)

Equation (43)

Now let $\{\psi_\ell\}_{\ell=1}^\infty$ be an $r$-regular orthonormal wavelet basis for $L^2(\mathbb{R}^d)$ with $r\geqslant s$. We then, by (42), have

For $\ell=1$, the above bound implies the existence of $\{u_{n_1(k)}\}_{k\in\mathbb{N}}\subset \{u_n\}_{n\in\mathbb{N}}$ and $\eta_1\in\mathbb{R}$ such that $\mathbb{E}\vert \langle u_{n_1(k)},\psi_1\rangle\vert \to\eta_1$ as $k\to\infty$. Considering $\ell=2,3,\dots$ successively, one can similarly show the existence of $\{u_{n_1(k)}\}_{k\in\mathbb{N}}\supset \{u_{n_2(k)}\}_{k\in\mathbb{N}}\supset\dots\supset\{u_{n_\ell(k)}\}_{k\in\mathbb{N}}\supset\dots$ and $\{\eta_\ell\}\in\mathbb{R}^\infty$ such that $\mathbb{E}\vert \langle u_{n_\ell(k)},\psi_\ell\rangle\vert \to\eta_\ell$ for any $\ell\in\mathbb{N}$ as $k\to\infty$. The diagonal subsequence $\{u_{n_k(k)}\}_{k\in\mathbb{N}}$ hence satisfies

Equation (44)

We relabel the above subsequence by $\{u_n\}_{n\in\mathbb{N}}$. Let $u^*:=\sum_{\ell=1}^\infty \eta_\ell\psi_\ell$. We have, by Jensen's inequality,

Hence $u^*\in B^s_1$. Let us now consider the strong convergence of the distribution in a larger space $B^{\tilde s}_1$ for $\tilde s<s$. Take $v\in B^{-\tilde s}_\infty(\mathbb{T}^d)$ and write

since $\mathbb{E}\Vert u_n\Vert _{B^s_1}$ is uniformly bounded, and as $\sum_{\ell=1}^\infty \ell^{\frac{\tilde s}{d}-\frac{1}{2}}\mathbb{E}\vert \langle u_n-u^*,\psi_\ell\rangle\vert < \infty$ we can pass the expectation inside [61, theorem 1.38]. Noting that $\Vert u^*\Vert _{B^s_1}+\mathbb{E}\Vert u_n\Vert _{B^s_1}\leqslant C < \infty$, given any $\epsilon>0$, $N$ can be chosen large enough, independently of $n$, such that the second term in the last line of the above inequality is bounded by $\epsilon/2$. Having coefficient-wise convergence, there is $M\in\mathbb{N}$ large enough so that for $n>M$ the first term is bounded by $\epsilon/2$ as well. We therefore conclude that $u_n$ converges strongly to $u^*$ in $B^{\tilde s}_1(\mathbb{T}^d)$ in probability. This then implies ([38, lemma 4.2]) the existence of a subsequence, which we again label $\{u_n\}_{n=1}^\infty$, such that

This is true in particular for $t<s-d$. By the continuity of $\mathcal{G}$ in $B^t_1(\mathbb{T}^d)$, we conclude that

This together with (43) gives $\mathcal{G}(u^{\dagger})=\mathcal{G}(u^*)$. □

Proof of corollary 4.9. Since $B^s_1(\mathbb{T}^d)$ is dense in $B^t_1(\mathbb{T}^d)$, for any $\epsilon>0$ there exists $v\in B^s_1(\mathbb{T}^d)$ such that $\Vert u^{\dagger}-v\Vert _X\leqslant\epsilon$. As $u_n$ is a minimizer of $I_n$, we have

Rearranging and using Young's inequality, similarly to the previous proof, then gives

Noting that $\Vert v\Vert _{B^s_1}\leqslant \Vert u^{\dagger}\Vert _{B^s_1} + \epsilon$ and $\vert \Sigma^{-\frac{1}{2}}(\mathcal{G}(u^{\dagger})-\mathcal{G}(v))\vert \leqslant C\epsilon$ by the local Lipschitz continuity of $\mathcal{G}$, we have

This implies that

Since $\epsilon$ was arbitrary, we conclude that $\lim_{n\to\infty}\mathbb{E}\vert \Sigma^{-\frac{1}{2}}(\mathcal{G}(u^{\dagger})-\mathcal{G}(u_n))\vert ^2= 0$, and therefore $\vert \Sigma^{-\frac{1}{2}}(\mathcal{G}(u^{\dagger})-\mathcal{G}(u_n))\vert \to 0$ in probability. Hence, there exists a subsequence of $\{\mathcal{G}(u_n)\}$ which converges to $\mathcal{G}(u^{\dagger})$ almost surely, giving the result. □

6.4. Proofs of results in section 5

Proof of theorem 5.8. The probability distribution $\pi_X$ on $\mathbb{R}$ has finite Fisher information since

Equation (45)
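For instance, assuming $\pi_X$ is the Laplace density $\frac12 {\rm e}^{-\vert x\vert}$ (a worked check on our part for the $B^s_1$ case), one has $\pi_X^\prime(x)/\pi_X(x) = -{\rm sign}(x)$ and hence

$\int_{\mathbb{R}} \left(\frac{\pi_X^\prime(x)}{\pi_X(x)}\right)^2 \pi_X(x)\,{\rm d}x = \int_{\mathbb{R}} \pi_X(x)\,{\rm d}x = 1 < \infty.$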

As pointed out above, we have $\lambda_\ell(A) = \lambda_1(\alpha_\ell A)$ for any $A \in \mathcal{B}(\mathbb{R})$, where $\alpha_\ell = \ell^{s/d-1/2}$. By proposition 5.7 we have

and, consequently, $D(\lambda) = B^{s^\prime }_{2}(\mathbb{T}^d)$ for $s^\prime =s - \frac d2$. The rest follows from proposition 5.6. □

Acknowledgments

MB acknowledges support from the ERC via Grant EU FP 7—ERC Consolidator Grant 615216 LifeInverse. The work by TH was supported by the Academy of Finland via project 275177. Both MB and TH were further supported by the German Science Exchange Foundation DAAD via Project 57162894, Bayesian Inverse Problems in Banach Space.
