Paper The following article is Open access

The renormalization group via statistical inference

Cédric Bény and Tobias J Osborne

Published 5 August 2015 © 2015 IOP Publishing Ltd and Deutsche Physikalische Gesellschaft
Citation Cédric Bény and Tobias J Osborne 2015 New J. Phys. 17 083005 DOI 10.1088/1367-2630/17/8/083005

Abstract

In physics, one attempts to infer the rules governing a system given only the results of imperfect measurements. Hence, microscopic theories may be effectively indistinguishable experimentally. We develop an operationally motivated procedure to identify the corresponding equivalence classes of states, and argue that the renormalization group (RG) arises from the inherent ambiguities associated with the classes: one encounters flow parameters as, e.g., a regulator, a scale, or a measure of precision, which specify representatives in a given equivalence class. This provides a unifying framework and reveals the role played by information in renormalization. We validate this idea by showing that it justifies the use of low-momenta n-point functions as statistically relevant observables around a Gaussian hypothesis. These results enable the calculation of distinguishability in quantum field theory. Our methods also provide a way to extend renormalization techniques to effective models which are not based on the usual quantum-field formalism, and elucidate the relationships between various types of RG.

Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

The renormalization group (RG), as conceived by Wilson [1, 2], relies on the idea that it is possible to describe long-distance physics while essentially ignoring short-distance phenomena; Wilson argued that, if we are content with predictions to some specified accuracy, the effects of physics at smaller lengthscales can be absorbed into the values of a few parameters of some effective theory for the long-distance degrees of freedom. The RG now underpins much of our understanding of modern theoretical physics and provides the interpretational framework for quantum field theories. It has been applied in a dazzling array of incarnations to study systems from statistical physics [3] to applied mathematics [4].

The general applicability of RG techniques strongly suggests the existence of a deep unifying principle which would make it possible to directly compare different manifestations of the RG and to unlock its full potential. It has been suggested that such a general implementation-independent formulation of the RG is to be found in an information-theoretic approach [5] because the RG works by ignoring certain aspects of the system. Although the information-theoretic flavour of the RG is manifest in the case of block-decimation [6–8], it is far less obvious in the context of particle physics from where the terminology of renormalization originates [9]. Previous attempts at tackling this problem (see, e.g., [10–14] for a selection) depend on details of the chosen model or formalism and do not yet offer the truly general unification that one might hope for.

The objective of this paper is to propose an operationally motivated, model-independent, and hence information-theoretic framework for the RG. Our main result is a demonstration that this framework encompasses, as a particular case, the RG implemented with respect to a regulator (as found in QFT).

Our approach is related to that of a recent paper of Machta et al [15], who observed that the relevant parameters selected by the RG have the property that they generate perturbations of a statistical state which are distinguishable (in information-theoretic terms) even when the system is coarse-grained. This is why these parameters can be inferred experimentally and are useful for predictions.

We first step back, and phrase the inference task as a game played between two players: a passive one, Alice, who simply possesses a quantum or classical system, and Bob, who perceives the system via a known noisy quantum channel ${\mathcal{E}}$. (That is, any map linearly taking density matrices to density matrices, even as part of a larger system. Classically, it is any stochastic map.) The channel may for instance represent a coarse-graining. We think of Alice as possessing the true state of a physical system, while Bob is an experimentalist whose practical limitations are formalised by the channel. When Bob tries to infer the state of Alice's system, he is faced with the ill-posed inverse problem of inverting a quantum channel to find the input from the output.

Let us consider first a situation where the channel has a non-trivial kernel. For instance, ${\mathcal{E}}$ could be the partial trace over all high-momentum modes of a theory. If two states ρ and $\rho ^{\prime} $ are such that ${\mathcal{E}}(\rho )={\mathcal{E}}({\rho }^{\prime })$ then they cannot be distinguished by Bob and hence are both just as good as hypotheses for Alice's state. This indistinguishability results in equivalence classes of states: all that Bob can hope to do is to determine in which class the true state is. The classes can be parameterized by a smooth manifold of unique representatives (figure 1(a)). For instance, if ${\mathcal{E}}$ traces out high-momentum modes, the equivalence classes can be labelled by states whose high momentum modes are in some fiducial product state.

Figure 1. The shaded planes represent equivalence classes of states which cannot be distinguished experimentally. They are intersected by the manifold of effective theories, parametrized in example (a) by a sole parameter α, and, in example (b), additionally by a regularization parameter Λ. The intersection lines are renormalization trajectories $\alpha (\Lambda )$.

Once the classes of experimentally indistinguishable states are identified, we propose that the various existing types of RG result from an exploration of the freedom available in choosing the representative within a class. This freedom is exercised, for example, when modifying a regularization parameter, as occurs in high-energy physics, or when simplifying the description of the state and isolating the relevant degrees of freedom, as commonly practised in condensed matter theory. Before we describe these two cases in more depth, we need to consider more general and more realistic experimental limitations. This requires taking approximate indistinguishability into account.

1. General framework

A reasonable measure of distinguishability between two states ρ and $\rho ^{\prime} $ to be used in this situation is the relative entropy

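$S(\rho ^{\prime} \parallel \rho )=\mathrm{tr}\left({\rho }^{\prime }(\mathrm{log}\,{\rho }^{\prime }-\mathrm{log}\,\rho )\right)$ (1)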

which measures the optimal exponential rate of decrease of the probability of mistaking $\rho ^{\prime} $ for ρ as a function of the number of copies available, while still letting the probability of mistaking ρ for $\rho ^{\prime} $ go to zero [16]. This asymmetric scenario is relevant to the situation where one attempts to prove the new hypothesis $\rho ^{\prime} $ against a well established one: ρ. Our framework can also be adapted to different measures, but we use this one here for concreteness. The above interpretation is for an observer able to measure any observable on Alice's system. Bob, however, has a limited access to Alice's state. Since he can only make direct measurements on the states ${\mathcal{E}}(\rho )$ and ${\mathcal{E}}({\rho }^{\prime })$, his optimal ability to distinguish between $\rho ^{\prime} $ and ρ according to the above scenario is instead given by the rate $S({\mathcal{E}}({\rho }^{\prime })\parallel {\mathcal{E}}(\rho ))$.

The effect of ${\mathcal{E}}$ can also be thought of as limiting the type of observable that Bob can measure directly on Alice's system, through the Heisenberg picture defined via the adjointness relation $\mathrm{tr}(\rho \;{{\mathcal{E}}}^{\dagger }(A))=\mathrm{tr}({\mathcal{E}}(\rho )A)$: Bob can effectively only measure POVMs on ρ with elements ${{\mathcal{E}}}^{\dagger }({A}_{i})$, where $0\leqslant {A}_{i}\leqslant {\bf{1}}$ and ${\displaystyle \sum }_{i}{A}_{i}={\bf{1}}$. His effective distinguishability rate $S({\mathcal{E}}({\rho }^{\prime })\parallel {\mathcal{E}}(\rho ))$ is hence no larger than that of an all-powerful experimentalist, namely $S(\rho ^{\prime} \parallel \rho )$, because he has access to fewer observables.
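The loss of distinguishability under a channel can be illustrated with a minimal classical sketch. The three-outcome distributions and the two-outcome stochastic matrix below are hypothetical values chosen only for illustration; they are not taken from the paper.

```python
import numpy as np

def relative_entropy(p_new, p_ref):
    """Classical relative entropy S(p_new || p_ref), in nats."""
    p_new = np.asarray(p_new, dtype=float)
    p_ref = np.asarray(p_ref, dtype=float)
    return float(np.sum(p_new * (np.log(p_new) - np.log(p_ref))))

# Hypothetical states of Alice's system (probability vectors).
rho = np.array([0.5, 0.3, 0.2])        # established hypothesis
rho_prime = np.array([0.6, 0.3, 0.1])  # new hypothesis

# Hypothetical noisy channel: column x holds p(y | x), so columns sum to 1.
channel = np.array([[0.8, 0.5, 0.2],
                    [0.2, 0.5, 0.8]])

rate_alice = relative_entropy(rho_prime, rho)
rate_bob = relative_entropy(channel @ rho_prime, channel @ rho)
rate_swapped = relative_entropy(rho, rho_prime)

print(f"S(rho'||rho)       = {rate_alice:.5f}")
print(f"S(E(rho')||E(rho)) = {rate_bob:.5f}")
print(f"S(rho||rho')       = {rate_swapped:.5f}")
```

In this example Bob's rate is smaller than the rate available to an observer with full access, and swapping the two arguments changes the value, which reflects the asymmetry of the hypothesis-testing scenario.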

Consequently, we could attempt to deem two states ρ and $\rho ^{\prime} $ experimentally equivalent if $S({\mathcal{E}}({\rho }^{\prime })\parallel {\mathcal{E}}(\rho ))\lt \delta $ for some desired maximal rate δ. However, this does not define an equivalence relation (this relation is not transitive, nor even symmetric). Nevertheless, if δ is sufficiently small, we still expect that the set of states $\rho ^{\prime} $ close to ρ form an approximately linear subspace of matrices, as occurs in the exact case $\delta =0$. This motivates us to linearize the relation around a starting hypothesis ρ.

Let us consider the state ${\rho }^{\prime }=\rho +\epsilon X$, where $\epsilon $ may be arbitrarily small. We will call the operator X, which must be hermitian and traceless, a feature. In terms of the manifold of density matrices, X represents a tangent vector at the point ρ. (It is related to the tangent vector represented as a differential operator $\hat{X}$ on scalar functions f by $f(\rho +\epsilon X)=f(\rho )+\epsilon (\hat{X}f)(\rho )+{\mathcal{O}}({\epsilon }^{2})$.) Then, to lowest order in $\epsilon $, we have

Equation (2)

where ${\Omega }_{\rho }^{-1}(Y)=\frac{{\rm{d}}}{{\rm{d}}t}\mathrm{log}(\rho +{tY}){| }_{t=0}$ is a non-commutative version of the operation 'division by ρ'. The quantity $\langle X,Y{\rangle }_{\rho }\equiv \mathrm{tr}(X\;{\Omega }_{\rho }^{-1}(Y))$ is an inner product on operators. Since it is defined at every point of the manifold of states, it is a metric in the sense of differential geometry and is one of the many quantum generalizations of the Fisher information metric [17].
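As a sanity check of the quadratic behaviour, the following sketch treats the commuting (classical) case, where $\Omega_\rho^{-1}(Y)$ reduces to $Y/\rho$ and the inner product becomes $\sum_x X(x)Y(x)/\rho(x)$. The probability vector and the feature are hypothetical. With the natural-logarithm relative entropy used here, the quadratic term carries a prefactor 1/2; this overall constant plays no role in identifying the principal directions discussed below.

```python
import numpy as np

def relative_entropy(p_new, p_ref):
    """Classical relative entropy S(p_new || p_ref), in nats."""
    return float(np.sum(p_new * (np.log(p_new) - np.log(p_ref))))

def fisher_inner(x, y, p):
    """Classical version of <X, Y>_rho: sum over outcomes of X * Y / rho."""
    return float(np.sum(x * y / p))

p = np.array([0.5, 0.3, 0.2])          # hypothetical state rho
feature = np.array([1.0, -0.4, -0.6])  # traceless direction X (entries sum to 0)
eps = 1e-3

exact = relative_entropy(p + eps * feature, p)
quadratic = 0.5 * eps**2 * fisher_inner(feature, feature, p)

print(exact, quadratic)
```

The two printed numbers agree up to a correction of order $\epsilon^3$.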

In this linear approximation, a state $\rho \;+\;X$ is approximately indistinguishable from $\rho \;+\;Y$ by Bob if

Equation (3)

The set of states $\rho +X$ satisfying this condition is an ellipsoid within Alice's state space. If ${\mathcal{E}}$ is not invertible, the ellipsoid is infinitely wide in the null directions Z with ${\mathcal{E}}(Z)=0$. Consequently, in the generic case, we use the following idealized relation: we say that the two states $\rho +X$ and $\rho +Y$ are equivalent if $X-Y$ lies in the span of the 'largest' principal directions of the ellipsoid (those that contract 'the most').

This idealization removes any trace of the desired precision δ, as we are only talking of the direction of $X-Y$ independently of its magnitude. Instead, Bob must choose the number n of features he deems sufficiently distinguishable. A pertinent way of doing this is to consider the case where the channel ${\mathcal{E}}$ depends on a parameter σ parameterizing the precision of Bob's instruments, and to worry about the asymptotic behaviour of the norm $\langle {\mathcal{E}}(Z),{\mathcal{E}}(Z){\rangle }_{{\mathcal{E}}(\rho )}$ in the limit of large imprecision σ. The choice of threshold n then amounts to choosing the type of asymptotic behaviour that we deem negligible. This is what happens in the examples presented below.

The principal directions of the ellipsoid are obtained by a singular value decomposition of ${\mathcal{E}}$ with respect to the scalar product defined by the metric. Let ${{\mathcal{R}}}_{\rho }$ be the adjoint of ${\mathcal{E}}$ defined by $\langle X,{{\mathcal{R}}}_{\rho }(Y){\rangle }_{\rho }=\langle {\mathcal{E}}(X),Y{\rangle }_{{\mathcal{E}}(\rho )}$ for all features $X,Y$. Explicitly

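${{\mathcal{R}}}_{\rho }={\Omega }_{\rho }\,{{\mathcal{E}}}^{\dagger }\,{\Omega }_{{\mathcal{E}}(\rho )}^{-1}$ (4)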

This map generalizes the transpose channel [18]. Classically, if $p(y| x)$ are the components of ${\mathcal{E}}$, then ${{\mathcal{R}}}_{\rho }$ has for components the conditional probabilities $p(x| y)$ derived from Bayes' rule with prior ρ. The principal features ${X}_{j}$ are the solutions of

${{\mathcal{R}}}_{\rho }{\mathcal{E}}({X}_{j})={\eta }_{j}{X}_{j}$ (5)

The eigenvalues ${\eta }_{j}$ are also the singular values of ${\mathcal{E}}$, and satisfy $1\geqslant {\eta }_{1}\geqslant {\eta }_{2}\geqslant ...\geqslant 0$. We call ${\eta }_{j}$ the relevance of ${X}_{j}$. The linear operator ${{\mathcal{R}}}_{\rho }{\mathcal{E}}$ is self-adjoint in the scalar product $\langle \cdot ,\cdot {\rangle }_{\rho }$. Therefore, the principal features form an orthogonal basis of the tangent space at ρ.
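For a finite classical system the construction can be carried out with ordinary linear algebra. The sketch below is an illustration under stated assumptions, not the authors' implementation: the channel is a hypothetical column-stochastic matrix of $p(y|x)$, the prior is a hypothetical probability vector, and the helper names `bayes_reverse` and `principal_features` are invented for this example. The map ${\mathcal{R}}_\rho$ is built from Bayes' rule, and ${\mathcal{R}}_\rho{\mathcal{E}}$ is diagonalized through a symmetric matrix that is similar to it.

```python
import numpy as np

def bayes_reverse(channel, prior):
    """Matrix with entries p(x | y), from p(y | x) and the prior p(x)."""
    output = channel @ prior
    return (channel * prior[None, :]).T / output[None, :]

def principal_features(channel, prior):
    """Eigenpairs of R E, sorted by decreasing relevance."""
    output = channel @ prior
    # B = diag(q)^(-1/2) E diag(p)^(1/2); B^T B is symmetric and similar to R E.
    b = channel * np.sqrt(prior)[None, :] / np.sqrt(output)[:, None]
    eta, vecs = np.linalg.eigh(b.T @ b)
    order = np.argsort(eta)[::-1]
    # Undo the similarity transform to get eigenvectors of R E itself.
    features = np.sqrt(prior)[:, None] * vecs[:, order]
    return eta[order], features

# Hypothetical prior and channel (each column of the channel sums to 1).
prior = np.array([0.5, 0.3, 0.2])
channel = np.array([[0.7, 0.2, 0.1],
                    [0.2, 0.6, 0.3],
                    [0.1, 0.2, 0.6]])

reverse = bayes_reverse(channel, prior)
eta, features = principal_features(channel, prior)
print("relevances:", eta)
```

The largest eigenvalue equals 1 and belongs to the direction of the prior itself, which is not traceless; the remaining eigenvectors sum to zero and play the role of principal features.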

This concept of relevance is a genuinely coordinate independent version of the stiffness defined in [15]. It equals stiffness computed with respect to the special parametrization in which the original metric is given by the identity matrix.

We call a feature X relevant if it is in the span of ${X}_{1},...,{X}_{n}$, and irrelevant if it is orthogonal to those. Our idealized equivalence classes make $\rho \;+\;X$ and $\rho \;+\;Y$ equivalent from the point of view of Bob if and only if $X-Y$ is irrelevant, or, equivalently, if $\langle X-Y,Z{\rangle }_{\rho }=0$ for all relevant features Z.

In order to obtain a physically more intuitive condition, let us define the principal observables to be the operators ${A}_{j}={\Omega }_{\rho }^{-1}({X}_{j})$, solutions of the dual Heisenberg-picture eigenvalue equation

${{\mathcal{E}}}^{\dagger }{{\mathcal{R}}}_{\rho }^{\dagger }({A}_{j})={\eta }_{j}{A}_{j}$ (6)

Analogously, we say that A is a relevant observable if it is in the span of ${A}_{1},...,{A}_{n}$. With this definition, our equivalence condition amounts to considering two effective states $\rho ^{\prime} $ and $\rho ^{\prime \prime} $ to be equivalent (in the neighbourhood of ρ) when they yield the same expectation values for all relevant observables:

$\mathrm{tr}({\rho }^{\prime }{A}_{j})=\mathrm{tr}({\rho }^{\prime\prime }{A}_{j}),\quad j=1,\ldots ,n$ (7)

For instance, consider the strictest possible relevance threshold where only features with exactly zero relevance are deemed to be irrelevant. These are the operators X in the kernel of ${\mathcal{E}}$. In this case we recover the exact state-independent equivalence relation which identifies $\rho ^{\prime} \sim \rho ^{\prime \prime} $ if ${\mathcal{E}}({\rho }^{\prime })={\mathcal{E}}({\rho }^{\prime\prime })$. The corresponding relevant observables are the self-adjoint operators A satisfying $\ \mathrm{tr}({AX})=0$ for all X in the kernel of ${\mathcal{E}}$, which are precisely those of the form ${{\mathcal{E}}}^{\dagger }(B)$ for some B. In addition, these are all the observables that Bob can ever hope to measure expectation values of, since for all B, $\ \mathrm{tr}(B{\mathcal{E}}(\rho ))=\ \mathrm{tr}({{\mathcal{E}}}^{\dagger }(B)\rho )$.

2. One classical mode

For a simple but nontrivial example suppose that Alice has a stochastic classical system consisting of a single real variable, e.g., the position x of a particle. The true state to be discovered by Bob is a probability distribution $\rho (x)$. Bob's experimental limitation consists of a finite precision σ at which he can resolve x. This can be modelled by a stochastic map ${\mathcal{E}}$ whose effect is a convolution of Alice's probability distribution with a Gaussian of width σ:

${\mathcal{E}}(\rho )={N}_{\sigma }\star \rho $ (8)

where ${N}_{\sigma }$ is the centred normal distribution with variance ${\sigma }^{2}$. Suppose, further, that Bob's initial hypothesis is a simple Gaussian distribution, which we think of as a thermal state $\rho (x)\propto {{\rm{e}}}^{-H(x)}$ for the 'hamiltonian' $H(x)=\frac{{x}^{2}}{2{\tau }^{2}}$. The action of the operator ${{\mathcal{R}}}_{\rho }^{\dagger }$ can be written explicitly:

Equation (9)

Noting also that ${\mathcal{E}}={{\mathcal{E}}}^{\dagger }$, one can check by explicit calculation of the Gaussian integrals that if ${G}_{t}(x)={{\rm{e}}}^{{tx}/\tau -{t}^{2}/2}$, then

${{\mathcal{E}}}^{\dagger }{{\mathcal{R}}}_{\rho }^{\dagger }({G}_{t})={G}_{\eta t}$ (10)

with $\eta =\frac{{\tau }^{2}}{{\sigma }^{2}+{\tau }^{2}}.$ Hence, the eigenvalue problem defined in equation (6) is solved by differentiating equation (10) n times with respect to t and evaluating at t = 0. Observe that ${G}_{t}(x)$ is the generating function for the Hermite polynomials; hence the principal observables are the Hermite polynomials ${\mathrm{He}}_{n}(x/\tau )$, with respective eigenvalues ${\eta }_{n}={\eta }^{n}$, or ${\eta }_{n}\approx {(\tau /\sigma )}^{2n}$ for $\sigma \gg \tau $. Since the Hermite polynomials of degree at most n span all polynomials of degree at most n, our criterion implies that, for a threshold n, two nearby states are equivalent exactly when they share the same first n moments.
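The eigenvalue statement can be checked numerically. The sketch below assumes the standard Bayesian reading of the two maps for this Gaussian model: ${\mathcal{R}}_\rho^\dagger$ averages a function of x over the posterior of x given the blurred value y, which is Gaussian with mean $\eta y$ and variance $\eta\sigma^2$, and ${\mathcal{E}}^\dagger$ averages a function of y over a Gaussian of variance $\sigma^2$ centred at x. The helper names and the parameter values are hypothetical; the Gaussian averages are evaluated with Gauss-Hermite quadrature from NumPy.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval

# Gauss-Hermite rule for the weight exp(-z^2 / 2), normalized so that
# sum(WEIGHTS * g(NODES)) approximates the mean of g(Z) for Z ~ N(0, 1).
NODES, WEIGHTS = hermegauss(40)
WEIGHTS = WEIGHTS / np.sqrt(2.0 * np.pi)

def relevance(sigma, tau):
    return tau**2 / (sigma**2 + tau**2)

def hermite(n, x, tau):
    """Probabilists' Hermite polynomial He_n evaluated at x / tau."""
    coeffs = np.zeros(n + 1)
    coeffs[n] = 1.0
    return hermeval(np.asarray(x) / tau, coeffs)

def bayes_then_channel(func, x, sigma, tau):
    """Average func over X given Y, then average the result over Y given X = x."""
    eta = relevance(sigma, tau)
    post_std = np.sqrt(eta) * sigma

    def posterior_average(y_values):
        # E[func(X) | Y = y] with X | Y = y ~ N(eta * y, eta * sigma^2)
        return np.array([np.sum(WEIGHTS * func(eta * y + post_std * NODES))
                         for y in y_values])

    y_nodes = x + sigma * NODES  # Y | X = x ~ N(x, sigma^2)
    return float(np.sum(WEIGHTS * posterior_average(y_nodes)))

sigma, tau, x0 = 2.0, 1.0, 0.7  # hypothetical resolution, width and test point
eta = relevance(sigma, tau)
for n in range(1, 5):
    lhs = bayes_then_channel(lambda s, n=n: hermite(n, s, tau), x0, sigma, tau)
    print(n, lhs, eta**n * float(hermite(n, x0, tau)))
```

Each row prints the transformed polynomial next to $\eta^n\,\mathrm{He}_n(x_0/\tau)$, and the two columns should coincide to numerical precision.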

For instance, up to distinguishability of order ${\sigma }^{-4}$ $(n=2)$, the effective hamiltonian ${H}_{0}(x)=\frac{{x}^{2}}{2{\tau }_{0}^{2}}+\lambda {x}^{4}$ is equivalent to ${H}_{1}(x)=\frac{{x}^{2}}{2{\tau }_{1}^{2}}$ provided that ${\tau }_{1}$ is 'renormalized' so as to yield the same second moment as ${H}_{0}$. This simplification from ${H}_{0}$ to ${H}_{1}$ morally corresponds to a step of the type of RG employed in condensed matter theory, where a hamiltonian is simplified in a way that only affects some 'unobservable' small scale features of the system.
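A minimal numerical sketch of this matching condition follows. It assumes $\lambda \geqslant 0$ so that the quartic distribution is normalizable, and uses the fact that the Gaussian state for ${H}_{1}$ has second moment ${\tau }_{1}^{2}$. The grid parameters and helper names are hypothetical choices made for illustration.

```python
import numpy as np

def second_moment(hamiltonian, half_width=12.0, points=200001):
    """<x^2> in the state proportional to exp(-H(x)), on a uniform grid."""
    x = np.linspace(-half_width, half_width, points)
    weights = np.exp(-hamiltonian(x))
    # The grid spacing cancels in the ratio of the two sums.
    return float(np.sum(x**2 * weights) / np.sum(weights))

def renormalized_tau(tau0, lam):
    """Width tau_1 of the Gaussian sharing the second moment of H_0."""
    def h0(x):
        return x**2 / (2.0 * tau0**2) + lam * x**4
    return float(np.sqrt(second_moment(h0)))

tau0, lam = 1.0, 1e-3  # hypothetical parameter values
tau1 = renormalized_tau(tau0, lam)
print("tau_1 =", tau1)
```

For small λ, first-order perturbation theory around the Gaussian gives $\langle {x}^{2}\rangle \approx {\tau }_{0}^{2}-12\lambda {\tau }_{0}^{6}$, so a positive quartic coupling lowers ${\tau }_{1}$ below ${\tau }_{0}$.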

The situation in particle physics is a priori quite different. Quantum field theories typically come with an unwanted parameter, a regulator Λ, which has no true physical significance, although it often mimics a lattice spacing. Its presence, however, is not a problem if the observable predictions of the theory do not depend on it. This is possible if we assume some reasonable limitation on Bob's measurement abilities, so that any change in Λ can be compensated by a change in the state's other parameters so as to stay within a given equivalence class (figure 1(b)). This dependence of the state's parameters on Λ is the type of RG flow which naturally occurs in QFT.

Using the above toy example, a similar problem could occur for the hamiltonian ${H}_{0}$ if λ were to be experimentally determined to be negative (using a first order approximation in λ for the state). Indeed, the resulting distribution $\rho (x)$ would diverge if calculated non-perturbatively. This can be fixed mathematically by adding a sixth order term ${x}^{6}/\Lambda $ to the effective hamiltonian, which, to distinguishability of order ${\sigma }^{-4}$, can be made to be equivalent to ${H}_{0}$ by adjusting the parameters τ and λ as functions of Λ so as to preserve the moments up to the fourth.

Those two concepts of renormalization can be made to match in QFT because divergences can be identified as contributions from an infinite number of irrelevant features. Hence, the simplification which consists in subtracting them from the state also regularizes the theory.

3. Classical Gaussian states

We solve equation (6) for Gaussian states over arbitrarily many modes, and for a channel ${\mathcal{E}}$ which is any Gaussian stochastic map. We consider n real random variables ${\phi }_{i}$. We write $\phi (f):= {\displaystyle \sum }_{i}{f}_{i}{\phi }_{i}$ for any vector f, which corresponds to a 'smeared' field in the continuum limit. A general Gaussian stochastic map ${\mathcal{E}}$ is defined by the effect of its transpose to the moment-generating functions:

Equation (11)

where $(f,g)={\displaystyle \sum }_{i}{f}_{i}{g}_{i}$, Y and X are real matrices and Y is positive (we give a concrete example in the quantum case). Similarly, an arbitrary (but centred) Gaussian state ρ is defined by

Equation (12)

where A is real and symmetric. Using the definition of ${{\mathcal{R}}}_{\rho }^{\dagger }$ as adjoint of ${{\mathcal{E}}}^{\dagger }$ in the dual metric, applied to the generating functions ${{\rm{e}}}^{\phi (f)}$, one can show that the random variables

Equation (13)

satisfy

Equation (14)

with

Equation (15)

(See appendix A.) Note that H is symmetric with respect to the scalar product $(\cdot ,A\;\cdot )$, hence it has a complete orthonormal family of eigenfunctions ${f}_{k}$ with eigenvalues ${\eta }_{k}$. We obtain the eigen-variables of ${{\mathcal{E}}}^{\dagger }{{\mathcal{R}}}_{\rho }^{\dagger }$ (namely the principal observables) explicitly by differentiating the generating functional G in the directions of the functions ${f}_{k}$ any number of times, and evaluating the result at f = 0.

4. Interactions

The previous result can be used to perturbatively calculate the principal observables around non-Gaussian states. In order to do this, we need to work within a representation of the real Hilbert space formed by the principal observables of the Gaussian state, together with the scalar product defined by the metric evaluated at the Gaussian state. This is always a symmetric Fock space, where the vacuum $| 0\rangle $ corresponds to the constant random variable $G(0)=1$ (with relevance 1), and the creation operator ${a}_{k}^{\dagger }$ associated with the vector ${f}_{k}$, acting on a principal observable, leads to a new principal observable with relevance multiplied by ${\eta }_{k}$. The perturbed eigenvalue problem can then be expressed to any order using standard Feynman diagrams. An example is detailed in appendix B. We show below, however, that the standard RG conditions in QFT can be recovered from the Gaussian results alone.

5. Quantum Gaussian states

A quantum Gaussian state is defined by quantization of a classical phase space. Let f, g denote classical observables which are linear functions in the canonical variables, with some scalar product (f, g) and the symplectic form Δ. Let $\Phi (f)$ denote the quantization of f, such that

Equation (16)

Any quantum state is uniquely specified by its characteristic function $f\mapsto \langle {{\rm{e}}}^{{\rm{i}}\Phi (f)}{\rangle }_{\rho }$. For a quantum Gaussian state ρ, this is ${{\rm{e}}}^{-\displaystyle \frac{1}{2}(f,{Af})}$, where A is a real symmetric matrix satisfying $A+\frac{{\rm{i}}}{2}\Delta \geqslant 0$. A general Gaussian channel is characterized by its effect on the Weyl operators:

Equation (17)

where X and Y are real matrices such that

Equation (18)

One can then verify that the principal observables are polynomials generated by

Equation (19)

This is done by first noting their orthogonality, and applying the definition of ${{\mathcal{R}}}_{\rho }$ with respect to the generating function as in the classical case (see appendix C).

As an example, we consider a Gibbs state of a Klein–Gordon field of mass m at inverse temperature β, with canonical conjugate coordinates $\phi (x)$ and $\pi (x)$. We will need the real Fourier components

Equation (20)

Equation (21)

which are decoupled under the classical dynamics. Because the phase space is infinite-dimensional, the concept of Gaussian state introduced above has to be generalized with some care. Alternatively, one may choose boundary conditions and a momentum cutoff so as to render it finite-dimensional. For our purpose, we define the Gaussian state through the bilinear form that it defines on the space of linear classical observables. In terms of the observables

Equation (22)

Equation (23)

the quadratic form is

Equation (24)

We also consider a Gaussian channel ${\mathcal{E}}$. In the continuum, the matrices X and Y become linear functions. We use

Equation (25)

where ${N}_{\sigma }\star \cdot $ denotes convolution by a Gaussian of variance σ, and

Equation (26)

The parameter σ characterizes spatial resolution, and ${y}_{\phi }$ and ${y}_{\pi }$ field value resolutions. The condition expressed in equation (18) reduces in this case to the uncertainty relation ${y}_{\phi }{y}_{\pi }\geqslant 1$.

For ${y}_{\phi }{y}_{\pi }\gg 1$, we find that the quantized field observables ${\hat{\phi }}_{k}=\Phi ({\phi }_{k})$ and ${\hat{\pi }}_{k}=\Phi ({\pi }_{k})$ are principal observables, with respective relevance

Equation (27)

and

Equation (28)

Since the channel acts independently on each mode, the products of n such operators with distinct momenta are also eigenvectors with relevance equal to the product of the corresponding values ${\eta }_{k}^{\phi }$ or ${\eta }_{k}^{\pi }$. For instance, the n-point functions ${\hat{\phi }}_{{k}_{1}}\cdots {\hat{\phi }}_{{k}_{n}}$ have relevance ${\eta }_{{k}_{1}}^{\phi }\cdots {\eta }_{{k}_{n}}^{\phi }$, provided that the momenta ${k}_{1},...,{k}_{n}$ are all distinct.

6. Renormalization

If we want to recover an RG, we have to pick a threshold on the asymptotic decay of relevance in terms of the three noise parameters σ, ${y}_{\pi }$ and ${y}_{\phi }$. Since the relevance decays exponentially in the total momentum, products of field operators can always be considered irrelevant as soon as they involve operators with mode $k\gg 1/\sigma $. At temperatures large compared to m, the relevance of the n-point functions also decays to order $2n$ in ${y}_{\pi }$ and ${y}_{\phi }$. Hence, in this approximation two states are effectively equivalent if they have the same n-point functions at momenta smaller than $1/\sigma $. Without restriction on n, these are precisely the conditions used in QFT for the RG as a function of a regulator Λ (needed as momentum cutoff on divergent integrals resulting from perturbation theory around Gaussians). Instead of viewing the cutoff as an explicit parameter of the state, a change of cutoff from Λ to $\Lambda ^{\prime} $ can also be absorbed into a rescaling of space by a factor $s={\Lambda }^{\prime }/\Lambda $. The condition that the state stays in the same equivalence class independently of s yields the Callan–Symanzik equations.

From the condensed matter point of view, a cutoff Λ is fixed and given by the lattice spacing. The description of the state can be simplified by exploiting the freedom we have in choosing a representative of an equivalence class. We may choose the one closest to ρ in relative entropy: this optimization is well known and yields the Gibbs states with only relevant observables in the hamiltonian perturbations, namely terms with field modes $| k| \lt 1/\sigma $. The requirement that visible predictions be independent of σ leads to an RG. This matches the previous RG in the sense that the procedure is technically equivalent to lowering the cutoff Λ to $1/\sigma $ in perturbative expansions.

But our calculation also tells us that we may neglect features whose distinguishability scales poorly with the field-value precisions ${y}_{\phi }$ and ${y}_{\pi }$, hence justifying the use of effective hamiltonians with low degree polynomials in the fields. For instance, choosing n = 2 selects only quadratic terms as relevant observables, and the Gaussian states as a natural family of effective states (this is a very different argument from the one based on renormalizability). We note that here ${y}_{\phi }$ and ${y}_{\pi }$ play very different roles from σ because of the differences in asymptotic relevance behaviour, but we could imagine a different type of experimental limitation where more parameters govern the RG besides σ.

7. Distinguishability in QFT

We can use the solution of equation (6) to compute the effective distinguishability D(A) of any perturbation generated by a Hamiltonian term A, defined as the lowest-order approximation of the relative entropy $S({\mathcal{E}}({\rho }_{\epsilon })\parallel {\mathcal{E}}({\rho }_{0}))={\epsilon }^{2}D(A)+{\mathcal{O}}({\epsilon }^{3})$, where ${\rho }_{\epsilon }\propto {{\rm{e}}}^{-H+\epsilon A}$ are normalized states. Indeed, we have $D(A)=\langle A,{{\mathcal{E}}}^{\dagger }{{\mathcal{R}}}_{\rho }^{\dagger }(A){\rangle }_{\rho }$, which can be computed by expressing A in the basis of principal observables around ρ. For instance, in the scalar field example, $D({\hat{\phi }}_{k})={\eta }_{k}^{\phi }\langle {\hat{\phi }}_{k},{\hat{\phi }}_{k}{\rangle }_{\rho }$, where ${\eta }_{k}^{\phi }$ is given above, and $\langle {\hat{\phi }}_{k},{\hat{\phi }}_{k}{\rangle }_{\rho }=1/\beta {\omega }_{k}^{2}$. With the standard tools of perturbative QFT, this can be generalized to higher-order expansions of the exponential (while keeping with the lowest order approximation of the relative entropy).

For non-local terms, D has to be made into a density. It may be argued that the unit of volume used to define the density ought to explicitly scale with σ, leading to the distinguishability density ${d}_{\sigma }(A)={\sigma }^{d}{\mathrm{lim}}_{\Sigma }{D}_{\sigma }({A}_{\Sigma })/| \Sigma | $, where d is the dimension of space and ${A}_{\Sigma }$ a restriction of A to a region Σ of volume $| \Sigma | $. A Hamiltonian term may then be said to be relevant in information-theoretic terms if ${d}_{\sigma }$ scales as a positive power of σ (for a fixed state). Preliminary calculations indicate that the result is compatible with the Wilsonian analysis classically, but may differ in important ways in the genuinely quantum analysis. This will be analysed in further work.

8. Concluding remarks

We introduced a framework which allows for the definition of effective theories in very general terms, taking into account any measure of distinguishability and any model of experimental limitations. We demonstrated the pertinence of this approach by showing that it naturally contains, as a particular case, the concept of effective theory as defined by the RG of quantum field theory. Further work will explore how varying the assumptions leads to effective theories which differ from the standard QFT framework. For instance, taking the field-value resolutions into account in the interacting context leads to a concept of dressed effective field which depends in principle on the detail of the coarse-graining channel and distinguishability metric.

Most interestingly, the fact that this approach is not at all tied to the standard QFT formalism means that it can in principle be applied to completely different types of theories, as well as very different models of experimental limitation (not necessarily related to scale). For instance, the case of loop quantum gravity [19], which proposes a class of background-free quantum field theories, could provide interesting applications.

This approach can also be naturally applied to spin lattice systems, so as to derive effective field theories describing their large scale properties. In standard approaches, a spin system is connected to an effective QFT through symmetry arguments (observables in the discrete and continuous descriptions are paired by identifying the group transformations they generate). Our framework provides a more bottom-up route, where the effective QFT can in principle be derived through the mechanism by which it emerges, i.e., through the identification of the degrees of freedom which are effectively ignored. This can be performed numerically by solving equation (6) using techniques such as matrix product states.

Finally, the framing of effective QFT in this fundamentally information-theoretical approach elucidates precisely what information is being destroyed when the theory is renormalized. A concrete way of quantifying this is proposed in the last section. This may provide a first step towards generalizing Zamolodchikov's c-theorem [9, 20], which could in turn provide new techniques for the general classification of effective QFTs.

Acknowledgments

Helpful discussions with numerous people are most gratefully acknowledged: a partial list includes John DeBrota, Andrew Doherty, Jens Eisert, Steve Flammia, Jutho Haegeman, Gerard Milburn, Terry Rudolph, Tom Stace, Frank Verstraete, and Reinhard Werner. This work was supported by the ERC grants QFTCMPS and SIQS and by the cluster of excellence EXC 201 Quantum Engineering and Space-Time Research. We acknowledge support by Deutsche Forschungsgemeinschaft and Open Access Publishing Fund of Leibniz Universität Hannover.

Appendix A. Classical Gaussian states

We want to find the action of ${{\mathcal{R}}}_{{\rho }^{\prime }}^{\dagger }$ on G(f). By directly using its definition as adjoint of ${{\mathcal{E}}}^{\dagger }$, we find that for any linear classical observables f and g

But note that, using

Equation (29)

we have

By comparing the two expressions, we obtain

Since this is true for all g, then

or

Equation (30)

and hence

Equation (31)

Note that

Equation (32)

Hence, defining $H={(1+{A}^{-1}{X}^{-1}{{YX}}^{-1})}^{-1}$, we conclude that

Equation (33)

Observe that ${AH}={H}^{T}A$. Hence H is symmetric in the scalar product $(\cdot ,A\cdot )$. Let ${f}_{k}$ be an orthonormal basis of eigenfunctions of H:

Equation (34)

(Note that it may be convenient to consider a complex eigenbasis. Hence we complexify this real Hilbert space in the obvious way.)
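The symmetry property ${AH}={H}^{T}A$ can be checked numerically on a small example. The following sketch is an illustration only: it uses hypothetical $2\times 2$ matrices and assumes that A, X and Y are real symmetric, with A and Y positive definite and X invertible (assumptions made for this example, not statements taken from the derivation above). Under these assumptions the eigenvalues of H are real and lie strictly between 0 and 1.

```python
import math

# All matrices are 2x2 nested lists; the numerical values are hypothetical.

def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inverse(P):
    (a, b), (c, d) = P
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def transpose(P):
    return [[P[j][i] for j in range(2)] for i in range(2)]

def add(P, Q):
    return [[P[i][j] + Q[i][j] for j in range(2)] for i in range(2)]

IDENTITY = [[1.0, 0.0], [0.0, 1.0]]

def build_H(A, X, Y):
    """H = (1 + A^{-1} X^{-1} Y X^{-1})^{-1}."""
    X_inv = inverse(X)
    M = matmul(X_inv, matmul(Y, X_inv))
    return inverse(add(IDENTITY, matmul(inverse(A), M)))

def eigenvalues_2x2(P):
    """Eigenvalues of a 2x2 matrix with real spectrum, from trace and determinant."""
    tr = P[0][0] + P[1][1]
    det = P[0][0] * P[1][1] - P[0][1] * P[1][0]
    root = math.sqrt(max(tr * tr - 4.0 * det, 0.0))
    return ((tr - root) / 2.0, (tr + root) / 2.0)

A = [[2.0, 0.5], [0.5, 1.0]]  # symmetric, positive definite (assumption)
X = [[0.9, 0.2], [0.2, 0.7]]  # symmetric, invertible (assumption)
Y = [[0.3, 0.1], [0.1, 0.4]]  # symmetric, positive definite (assumption)

H = build_H(A, X, Y)
lhs = matmul(A, H)
rhs = matmul(transpose(H), A)
# Largest entry of A H - H^T A; it should vanish up to rounding errors.
symmetry_gap = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
```

Note that H itself is not a symmetric matrix in general; it is only symmetric with respect to the scalar product $(\cdot ,A\cdot )$, which is what guarantees a real spectrum here.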

We obtain the eigenfunctions of ${{\mathcal{E}}}^{\dagger }{{\mathcal{R}}}_{\rho }^{\dagger }$ by differentiating G(f) with respect to the basis functions ${f}_{k}$. Indeed, let ${\delta }_{k}$ denote the functional derivative in the direction of ${f}_{k}$, i.e., for any functional Z(f),

Equation (35)

then we obtain

Equation (36)

We have

where the primed derivatives are relative to f'.

The functions $({\delta }_{{k}_{1}}\cdots {\delta }_{{k}_{n}}G)(0)$ form an orthogonal basis of a representation of the symmetric Fock space ${\mathcal{F}}$ built from the test functions, with scalar product $\langle \cdot ,\cdot {\rangle }_{\rho }$. One can think of $G(0)=1\equiv | 0\rangle $ as the vacuum. The other eigenfunctions are obtained by acting on it with the creation operators ${a}_{k}^{\dagger }$ for the 'mode' ${f}_{k}$. The commutation relations are

Equation (37)

Explicitly

Equation (38)
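The structure described above can be made explicit for a single classical mode. The following sketch assumes, by analogy with the form of ${G}_{A}(f)$ given in appendix C, a generating function $G(f)={{\rm{e}}}^{\frac{1}{2}a{f}^{2}+{\rm{i}}\phi f}$ for one Gaussian variable $\phi $ of variance a (this single-mode form and all function names are assumptions made for illustration). Its nth derivative at $f=0$ is ${{\rm{i}}}^{n}{P}_{n}(\phi )$, where the ${P}_{n}$ are Hermite polynomials obeying ${P}_{n+1}=\phi {P}_{n}-{na}{P}_{n-1}$. The code verifies with exact rational arithmetic that $\langle {P}_{m}{P}_{n}\rangle =n!\,{a}^{n}{\delta }_{{mn}}$, as expected for n excitations of a single bosonic mode.

```python
from fractions import Fraction
from math import factorial

def hermite_polys(a, n_max):
    """Coefficient lists (lowest degree first) of P_0 .. P_{n_max}, defined by
    sum_n P_n(phi) t^n / n! = exp(phi * t - a * t^2 / 2)."""
    polys = [[Fraction(1)], [Fraction(0), Fraction(1)]]
    for n in range(1, n_max):
        shifted = [Fraction(0)] + polys[n]             # phi * P_n
        previous = polys[n - 1] + [Fraction(0)] * 2    # P_{n-1}, padded
        polys.append([s - n * a * p for s, p in zip(shifted, previous)])
    return polys[: n_max + 1]

def gaussian_moment(a, power):
    """E[phi^power] for a centred Gaussian variable of variance a."""
    if power % 2:
        return Fraction(0)
    result = Fraction(1)
    for odd in range(1, power, 2):
        result *= odd * a          # builds (power - 1)!! * a^(power / 2)
    return result

def inner(p, q, a):
    """Gaussian expectation value of the product of two polynomials."""
    return sum(pc * qc * gaussian_moment(a, i + j)
               for i, pc in enumerate(p) for j, qc in enumerate(q))

variance = Fraction(1, 2)          # hypothetical value of a
polys = hermite_polys(variance, 4)
gram = [[inner(p, q, variance) for q in polys] for p in polys]
```

The matrix `gram` is diagonal, with entries $n!\,{a}^{n}$; polynomials of different degree are orthogonal, which is the single-mode version of the statement that the derivatives of G at zero form an orthogonal basis.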

Appendix B.: Perturbation theory

This picture allows one to find the principal observables around non-Gaussian states by using perturbation theory. The trick is to express the information metric with respect to the perturbed state through its kernel K expressed in that Fock space. This allows one to write the map ${{\mathcal{R}}}_{{\rho }^{\prime }}^{\dagger }$, for the perturbed state ${\rho }^{\prime }$, perturbatively as an operator in Fock space as well. The eigenvalue problem can then be formulated and computed to any order using standard Feynman diagram techniques.

Let us define the generating function

Equation (39)

Differentiating this functional yields the components in the Fock basis of an operator K that is the kernel of the metric defined by $\rho ^{\prime} $ with respect to the unperturbed metric ρ:

where ${\delta }_{k}^{\prime }$ denotes differentiation with respect to $f^{\prime} $ in the direction ${f}_{k}$. Similarly,

Equation (40)

is the generating function of the identity operator.

It will also be convenient to consider the Fock space defined from the metric at the point ${\mathcal{E}}(\rho )$, with the same vacuum but with creation operators ${b}_{k}^{\dagger }$ defined by

Equation (41)

where

Equation (42)

These are indeed orthogonal since

Equation (43)

where we used the fact that H is positive in terms of the scalar product $(\cdot ,A\cdot )$.

In this basis we express

Equation (44)

We can compute from our previous results that

Equation (45)

Hence we find that K and J are related by

Equation (46)

Also, the channel ${{\mathcal{E}}}^{\dagger }$ naturally maps between the two Fock spaces, represented as the operator E generated by

Finally, the unknown is the operator R representing ${{\mathcal{R}}}_{{\rho }^{\prime }}^{\dagger }$ as

Equation (47)

It is defined by the relation

Equation (48)

for all f and g. Expanding this relation in the respective Fock basis, we obtain

Equation (49)

or

Equation (50)

If we expand

Equation (51)

we have

Equation (52)

and

Equation (53)

Expanding ER, we obtain

In order to compute the first order corrections to the unperturbed eigenvalue problem, we need the generating function of the perturbation

Equation (54)

Since E is diagonal in the Fock basis, we only need to worry about ${K}_{1}$ and ${L}_{1}$ directly.

The generating function ${K}_{1}(f,f^{\prime} )$ of the operator ${K}_{1}$ is

Equation (55)

We will consider an interaction of the form

Equation (56)

where the functions ${f}_{x}$ possibly form a different basis than ${f}_{k}$. This generates the state ${\rho }^{\prime }=\rho ({\bf{1}}+\lambda {X}_{1}+...)$ where

Equation (57)

We have

Equation (58)

where

When differentiated, this free partition function yields Feynman diagrams with no propagation between the vertices associated with f or f' respectively.

As an example, we performed this calculation for the state corresponding to the Euclidean form of the Klein–Gordon scalar field theory with ${\phi }^{4}$ interaction (in an arbitrary number of spatial dimensions). The channel is defined by an operator X which performs a Gaussian convolution over the scale σ, as in the quantum example in the article. We also use $Y={y}^{2}{\bf{1}}$.

The free theory yields the Gaussian state defined by the operator A, inverse of $({A}^{-1}f)(x)=\beta [-{\displaystyle \sum }_{i}{\partial }_{i}^{2}+{m}^{2}]f(x)$. Given that it commutes with the operator X defined by

Equation (59)

with d the number of dimensions, H is codiagonal with X and A, which are all self-adjoint in the ${L}^{2}({{\mathbb{R}}}^{d})$ scalar product.

Using the eigenfunctions of A:

Equation (60)

with

Equation (61)

we obtain the eigenvalues of H:

Equation (62)

The normalized unperturbed 'one-particle' principal observables are ${a}_{k}^{\dagger }| 0\rangle \equiv \phi ({f}_{k})$. Note that we used complex eigenfunctions because it makes the calculations much simpler. The degeneracy between the k and $-k$ eigenfunctions allows one to recover the real eigenfunctions by linear combination of the complex ones.
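To illustrate how the spectrum of H depends on momentum, the following sketch evaluates $H={(1+{A}^{-1}{X}^{-1}{{YX}}^{-1})}^{-1}$ mode by mode. It is not a transcription of equation (62): it assumes that ${A}^{-1}$ acts on a plane wave of momentum k as multiplication by $\beta ({k}^{2}+{m}^{2})$, i.e., that it is the positive operator $\beta (-{\nabla }^{2}+{m}^{2})$, that X acts as multiplication by ${{\rm{e}}}^{-{\sigma }^{2}{k}^{2}/2}$ (one possible normalization of the Gaussian convolution), and that $Y={y}^{2}{\bf{1}}$. The function name and the parameter values are hypothetical.

```python
import math

def h_eigenvalue(k, beta=1.0, m=1.0, sigma=1.0, y=0.5):
    """Eigenvalue of H on a plane wave of momentum magnitude k,
    under the assumptions stated in the text (all symbols are assumed forms)."""
    a_inv = beta * (k * k + m * m)               # assumed symbol of A^{-1}
    x = math.exp(-0.5 * sigma * sigma * k * k)   # assumed symbol of X
    y_squared = y * y                            # Y = y^2 * identity
    return 1.0 / (1.0 + a_inv * y_squared / (x * x))

momenta = [0.0, 0.5, 1.0, 2.0, 4.0]              # hypothetical sample points
spectrum = [h_eigenvalue(k) for k in momenta]
```

Under these assumptions the eigenvalue depends only on $| k| $, which reflects the degeneracy between the k and $-k$ eigenfunctions, and it decreases monotonically with $| k| $. This is consistent with the role of low-momentum observables emphasized in the article.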

The interaction is defined as above using the improper basis

Equation (63)

Hence, besides ${A}_{{kl}}=\delta (k-l)$, we find

Equation (64)

and

Equation (65)

The principal observables around $\rho ^{\prime} $, obtained by perturbation from the one-particle observables for the Gaussian state ρ, are $| \psi {\rangle }_{k}={a}_{k}^{\dagger }| 0\rangle +\lambda | {\psi }_{k}^{1}\rangle +{\mathcal{O}}({\lambda }^{2})$ where the only non-zero components of the first order correction $| {\psi }_{k}^{1}\rangle $ are

where we omitted the next two terms, which are obtained by cyclically permuting ${l}_{1}$, ${l}_{2}$ and ${l}_{3}$.

Appendix C.: Quantum Gaussian states

We use the notation introduced in the paper. In order to compute the metric explicitly, we need the commutation relation

Equation (66)

where we are now working in a complexified phase-space so as to accommodate imaginary time evolutions. Indeed, we need the group of complex matrices $s\mapsto {R}_{s}^{A}$ associated with the Gaussian state $\rho $ such that

Equation (67)

The metric (in the Heisenberg picture) is $\langle A,B{\rangle }_{\rho }={\displaystyle \int }_{0}^{1}\langle A{\rho }^{s}B{\rho }^{-s}{\rangle }_{\rho }\,{\rm{d}}s$. This group is symplectic, ${R}_{s}^{T}\Delta {R}_{s}=\Delta $, and leaves the state $\rho $ invariant, ${R}_{s}^{T}A{R}_{s}=A$.

Using these properties, we find that the polynomials generated by ${G}_{A}(f)={{\rm{e}}}^{\frac{1}{2}(\bar{f},{Af})+{\rm{i}}\Phi (f)}$ are orthogonal with respect to the metric when they are of different degrees in the canonical observables. This follows from the fact that $\langle {G}_{A}(f),{G}_{A}(g){\rangle }_{\rho }={\displaystyle \int }_{0}^{1}{{\rm{e}}}^{(f,{K}_{s}g)}{\rm{d}}s$ where ${K}_{s}={R}_{\frac{s}{2}}^{\dagger }(A+\displaystyle \frac{{\rm{i}}}{2}\Delta ){R}_{\frac{s}{2}}$. Indeed, the derivatives of the integrand evaluate to zero at $f=g=0$ whenever the number of differentiations with respect to f is not equal to the number of differentiations with respect to g.

Moreover, since the channel maps $\rho $ to a Gaussian state defined by the new matrix $B={X}^{T}{AX}+Y$, we find that ${{\mathcal{E}}}^{\dagger }({G}_{B}(f))={G}_{A}({Xf})$. Finally, using the definition of ${{\mathcal{R}}}_{\rho }^{\dagger }$ as adjoint of ${{\mathcal{E}}}^{\dagger }$, we obtain

Equation (68)

This implies that a polynomial of order n generated by ${G}_{A}$ is mapped by ${{\mathcal{R}}}_{\rho }^{\dagger }$ to a polynomial of order n generated by ${G}_{B}$, which is then mapped back to a polynomial of order n generated by ${G}_{A}$.

Therefore, we conclude that the principal observables are polynomials generated by ${G}_{A}$. Finding the exact polynomials of a given degree can be done for each order independently, which is a finite-dimensional problem.

This statement is in fact true for all quantum generalizations of the Fisher information metric. Classically, the Fisher information metric is characterized as the only metric on the manifold of probability distributions which contracts under the action of any stochastic map. In the quantum case, Petz and Sudár [21] characterized all contractive metrics. They are defined by an operator monotone function $\theta \;:{{\mathbb{R}}}^{+}\to {{\mathbb{R}}}^{+}$ such that $\theta (t)=t\theta ({t}^{-1})$ for all $t\gt 0$. An operator monotone function has the property that, when applied to operators via functional calculus, $\theta (A)\leqslant \theta (B)$ whenever $A\leqslant B$ (i.e., $B-A$ is positive). The function θ defines the kernel ${\Omega }_{\rho }^{-1}$ via its inverse ${\Omega }_{\rho }$ as follows:

Equation (69)

where ${R}_{\rho }(A):= A\rho $ and ${L}_{\rho }(A):= \rho A$ for any matrix A. It is straightforward to adapt the above argument to this general form.
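As a concrete illustration of the symmetry condition $\theta (t)=t\theta ({t}^{-1})$, the following sketch checks it numerically for three functions that are standard examples in the literature on monotone metrics: the arithmetic, logarithmic and harmonic means of t and 1. The function names are hypothetical, and the sketch only tests the scalar symmetry condition; it does not verify operator monotonicity. It also checks that the logarithmic mean equals ${\displaystyle \int }_{0}^{1}{t}^{s}{\rm{d}}s$, which suggests (as an interpretation, not a statement from the text above) that it is the function associated with the integral over s in the metric used in this appendix.

```python
import math

def theta_arithmetic(t):
    return (1.0 + t) / 2.0

def theta_logarithmic(t):
    # Continuous extension at t = 1, where (t - 1) / log(t) is 0/0.
    return 1.0 if t == 1.0 else (t - 1.0) / math.log(t)

def theta_harmonic(t):
    return 2.0 * t / (1.0 + t)

def symmetry_gap(theta, t):
    """|theta(t) - t * theta(1/t)|, which vanishes when the condition holds."""
    return abs(theta(t) - t * theta(1.0 / t))

def integral_of_powers(t, steps=2000):
    """Midpoint-rule estimate of the integral of t**s over s in [0, 1]."""
    return sum(t ** ((i + 0.5) / steps) for i in range(steps)) / steps

SAMPLES = [0.1, 0.5, 1.0, 2.0, 7.3]   # hypothetical test points, all positive
CANDIDATES = [theta_arithmetic, theta_logarithmic, theta_harmonic]

worst_gap = max(symmetry_gap(theta, t) for theta in CANDIDATES for t in SAMPLES)
```

All three functions satisfy the condition up to rounding errors, and they are ordered as harmonic, logarithmic, arithmetic at every sample point, with equality only at $t=1$.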
