Abstract
In physics, one attempts to infer the rules governing a system given only the results of imperfect measurements. Hence, microscopic theories may be effectively indistinguishable experimentally. We develop an operationally motivated procedure to identify the corresponding equivalence classes of states, and argue that the renormalization group (RG) arises from the inherent ambiguities associated with the classes: one encounters flow parameters as, e.g., a regulator, a scale, or a measure of precision, which specify representatives in a given equivalence class. This provides a unifying framework and reveals the role played by information in renormalization. We validate this idea by showing that it justifies the use of low-momenta n-point functions as statistically relevant observables around a Gaussian hypothesis. These results enable the calculation of distinguishability in quantum field theory. Our methods also provide a way to extend renormalization techniques to effective models which are not based on the usual quantum-field formalism, and elucidate the relationships between various types of RG.
Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
The renormalization group (RG), as conceived by Wilson [1, 2], relies on the idea that it is possible to describe long-distance physics while essentially ignoring short-distance phenomena; Wilson argued that, if we are content with predictions to some specified accuracy, the effects of physics at smaller lengthscales can be absorbed into the values of a few parameters of some effective theory for the long-distance degrees of freedom. The RG now underpins much of our understanding of modern theoretical physics and provides the interpretational framework for quantum field theories. It has been applied in a dazzling array of incarnations to study systems from statistical physics [3] to applied mathematics [4].
The general applicability of RG techniques strongly suggests the existence of a deep unifying principle which would make it possible to directly compare different manifestations of the RG and to unlock its full potential. It has been suggested that such a general implementation-independent formulation of the RG is to be found in an information-theoretic approach [5] because the RG works by ignoring certain aspects of the system. Although the information-theoretic flavour of the RG is manifest in the case of block-decimation [6–8], it is far less obvious in the context of particle physics from where the terminology of renormalization originates [9]. Previous attempts at tackling this problem (see, e.g., [10–14] for a selection) depend on details of the chosen model or formalism and do not yet offer the truly general unification that one might hope for.
The objective of this paper is to propose an operationally motivated, model-independent, and hence information-theoretic framework for the RG. Our main result is a demonstration that this framework encompasses, as a particular case, the RG implemented with respect to a regulator (as found in QFT).
Our approach is related to that of a recent paper of Machta et al [15], who observed that the relevant parameters selected by the RG have the property that they generate perturbations of a statistical state which are distinguishable (in information-theoretic terms) even when the system is coarse-grained. This is why these parameters can be inferred experimentally and are useful for predictions.
We first step back, and phrase the inference task as a game played between two players: a passive one, Alice, who simply possesses a quantum or classical system, and Bob, who perceives the system via a known noisy quantum channel $\mathcal{E}$. (A channel is any map that linearly takes density matrices to density matrices, even when it acts on part of a larger system. Classically, it is any stochastic map.) The channel may for instance represent a coarse-graining. We think of Alice as possessing the true state of a physical system, while Bob is an experimentalist whose practical limitations are formalised by the channel. When Bob tries to infer the state of Alice's system, he is faced with the ill-posed inverse problem of inverting a quantum channel to find the input from the output.
Let us consider first a situation where the channel $\mathcal{E}$ has a non-trivial kernel. For instance, $\mathcal{E}$ could be the partial trace over all high-momentum modes of a theory. If two states ρ and ρ′ are such that $\mathcal{E}(\rho) = \mathcal{E}(\rho')$, then they cannot be distinguished by Bob and hence are equally good hypotheses for Alice's state. This indistinguishability results in equivalence classes of states: all that Bob can hope to do is to determine which class the true state belongs to. The classes can be parameterized by a smooth manifold of unique representatives (figure 1(a)). For instance, if $\mathcal{E}$ traces out high-momentum modes, the equivalence classes can be labelled by states whose high-momentum modes are in some fiducial product state.
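As a minimal classical sketch of this construction (an illustration under stated assumptions, not a procedure taken from the paper), one can take the channel to be marginalization over a 'high' variable. The two joint distributions `state_a` and `state_b`, the fiducial distribution `fiducial_high` and the helper names below are all hypothetical.

```python
from itertools import product


def marginal_low(joint):
    """Trace out the 'high' variable: joint[(low, high)] -> distribution on low."""
    out = {}
    for (low, _high), prob in joint.items():
        out[low] = out.get(low, 0.0) + prob
    return out


def representative(joint, fiducial):
    """Class representative: Bob's marginal times a fixed fiducial 'high' state."""
    low = marginal_low(joint)
    return {(l, h): low[l] * fiducial[h] for l, h in product(low, fiducial)}


# Two hypothetical joint distributions over (low, high) in {0, 1} x {0, 1}.
# They differ as joint states but share the same marginal on 'low'.
state_a = {(0, 0): 0.30, (0, 1): 0.10, (1, 0): 0.20, (1, 1): 0.40}
state_b = {(0, 0): 0.05, (0, 1): 0.35, (1, 0): 0.45, (1, 1): 0.15}

# Assumed fiducial state of the unobserved variable.
fiducial_high = {0: 1.0, 1: 0.0}

rep_a = representative(state_a, fiducial_high)
rep_b = representative(state_b, fiducial_high)
```

Both states are sent to the same representative, and the representative has the same marginal as the state it stands for, so Bob loses nothing by working with it.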
Once the classes of experimentally indistinguishable states are identified, we propose that the various existing types of RG result from an exploration of the freedom available in choosing the representative within a class. This is what happens, for example, when a regularization parameter is modified, as in high-energy physics, or when the description of the state is simplified and the relevant degrees of freedom are isolated, as commonly practised in condensed matter theory. Before we describe these two cases in more depth, we need to consider more general and more realistic experimental limitations. This requires taking approximate indistinguishability into account.
1. General framework
A reasonable measure of distinguishability between two states ρ and ρ′ to be used in this situation is the relative entropy
$$S(\rho'\,\|\,\rho) = \mathrm{tr}\left[\rho'\left(\log\rho' - \log\rho\right)\right], \tag{1}$$
which measures the optimal exponential rate of decrease of the probability of mistaking ρ′ for ρ as a function of the number of copies available, while still letting the probability of mistaking ρ for ρ′ go to zero [16]. This asymmetric scenario is relevant to the situation where one attempts to prove the new hypothesis ρ′ against a well established one, ρ. Our framework can also be adapted to different measures, but we use this one here for concreteness. The above interpretation is for an observer able to measure any observable on Alice's system. Bob, however, has only limited access to Alice's state. Since he can only make direct measurements on the channel outputs $\mathcal{E}(\rho)$ and $\mathcal{E}(\rho')$, his optimal ability to distinguish between ρ′ and ρ according to the above scenario is instead given by the rate $S(\mathcal{E}(\rho')\,\|\,\mathcal{E}(\rho))$.
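The effect of the channel $\mathcal{E}$ can also be thought of as limiting the type of observable that Bob can measure directly on Alice's system, through the Heisenberg picture defined via the adjointness relation $\mathrm{tr}(\mathcal{E}^\dagger(A)\,\rho) = \mathrm{tr}(A\,\mathcal{E}(\rho))$: Bob can effectively only measure POVMs on ρ with elements $\mathcal{E}^\dagger(A_i)$, where $A_i \ge 0$ and $\sum_i A_i = \mathbb{1}$. His effective distinguishability rate is hence smaller than that of an all-powerful experimentalist, namely $S(\mathcal{E}(\rho')\,\|\,\mathcal{E}(\rho)) \le S(\rho'\,\|\,\rho)$, because he has access to fewer observables.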
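The loss of distinguishability under a channel can be checked numerically in the classical case, where states are probability vectors and the channel is a stochastic matrix. The following is only a sketch: the three-outcome distributions `rho` and `rho_new` and the noise matrix `noisy` are hypothetical values chosen for illustration.

```python
import math


def relative_entropy(p_new, p_ref):
    """Classical relative entropy D(p_new || p_ref), in nats."""
    return sum(a * math.log(a / b) for a, b in zip(p_new, p_ref) if a > 0)


def apply_channel(channel, p):
    """Apply a column-stochastic matrix: channel[i][j] = P(Bob sees i | Alice has j)."""
    return [sum(row[j] * p[j] for j in range(len(p))) for row in channel]


# Hypothetical established hypothesis and new hypothesis.
rho = [0.5, 0.3, 0.2]
rho_new = [0.4, 0.4, 0.2]

# Hypothetical noise: each outcome is misreported 20% of the time.
noisy = [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
]

# Rate for an observer with full access to Alice's system.
full_rate = relative_entropy(rho_new, rho)

# Rate for Bob, who only sees the channel outputs.
bob_rate = relative_entropy(apply_channel(noisy, rho_new), apply_channel(noisy, rho))
```

For any stochastic matrix `bob_rate` cannot exceed `full_rate`; this is the monotonicity of relative entropy, which is the property the argument above relies on.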
Consequently, we could attempt to deem two states ρ and ρ′ experimentally equivalent if the rate accessible to Bob satisfies $S(\mathcal{E}(\rho')\,\|\,\mathcal{E}(\rho)) < \delta$ for some desired maximal rate δ. However, this does not define an equivalence relation (the relation is neither symmetric nor transitive). Nevertheless, if δ is sufficiently small, we still expect that the set of states close to ρ forms an approximately linear subspace of matrices, as occurs in the exact case δ = 0. This motivates us to linearize the relation around a starting hypothesis ρ.
Let us consider the state ρ + εX, where ε may be arbitrarily small. We will call the operator X, which must be hermitian and traceless, a feature. In terms of the manifold of density matrices, X represents a tangent vector at the point ρ. (It is related to the tangent vector represented as a differential operator on scalar functions f.) Then, to lowest order in ε, we have
where is a non-commutative version of the operation 'division by ρ'. The quantity is an inner product on operators. Since it is defined at every point of the manifold of states, it is a metric in the sense of differential geometry and is one of the many quantum generalizations of the Fisher information metric [17].
In this linear approximation, a state is approximately indistinguishable from by Bob if
The set of states satisfying this condition is an ellipsoid within Alice's state space. If is not invertible, the ellipsoid is infinitely wide in the null directions Z with . Consequently, in the generic case, we use the following idealized relation: we say that the two states and are equivalent if lies in the span of the 'largest' principal directions of the ellipsoid (those that contract 'the most').
This idealization removes any trace of the desired precision δ, as we are only talking of the direction of independently of its magnitude. Instead, Bob must choose the number n of features he deems sufficiently distinguishable. A pertinent way of doing this is to consider the case where the channel depends on a parameter σ parameterizing the precision of Bob's instruments, and to worry about the asymptotic behaviour of the norm in the limit of large imprecision σ. The choice of threshold n then amounts to choosing the type of asymptotic behaviour that we deem negligible. This is what happens in the examples presented below.
The principal directions of the ellipsoid are obtained by a singular value decomposition of with respect to the scalar product defined by the metric. Let be the adjoint of defined by for all features . Explicitly
This map generalizes the transpose channel [18]. Classically, if are the components of , then has for components the conditional probabilities derived from Bayes' rule with prior ρ. The principal features Xj are the solution of
The eigenvalues are also the singular values of , and satisfy . We call the relevance of Xj. The linear operator is self adjoint in the scalar product . Therefore, the principal features form an orthogonal basis of the tangent space at ρ.
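To make the notion of relevance concrete, here is a small classical sketch. It assumes the classical Fisher metric, in which the squared norm of a feature X at a state ρ is the sum of X_i²/ρ_i, and it takes the adjoint map to be given by Bayes' rule as described above. The binary channel, the prior and the function names are hypothetical.

```python
def apply(matrix, vec):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def bayes_reverse(channel, prior):
    """Reverse map from Bayes' rule: result[j][i] = P(Alice has j | Bob sees i)."""
    out = apply(channel, prior)
    return [
        [channel[i][j] * prior[j] / out[i] for i in range(len(out))]
        for j in range(len(prior))
    ]


def fisher_norm_sq(feature, state):
    """Squared norm of a feature in the classical Fisher metric at `state`."""
    return sum(x * x / p for x, p in zip(feature, state))


def relevance(channel, prior, feature):
    """Ratio by which the channel contracts the squared Fisher norm of a feature."""
    return (
        fisher_norm_sq(apply(channel, feature), apply(channel, prior))
        / fisher_norm_sq(feature, prior)
    )


# Hypothetical binary symmetric channel with flip probability 0.25.
flip = 0.25
channel = [[1 - flip, flip], [flip, 1 - flip]]
prior = [0.5, 0.5]

# The only traceless direction for a two-outcome system.
feature = [1.0, -1.0]

eta = relevance(channel, prior, feature)

# Sending the feature through the channel and back with the Bayes map
# rescales it by eta, so it is a principal feature with relevance eta.
round_trip = apply(bayes_reverse(channel, prior), apply(channel, feature))
```

In this example the relevance is (1 − 2·flip)², so it reaches 1 for a noiseless channel and 0 when the output is independent of the input.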
This concept of relevance is a genuinely coordinate independent version of the stiffness defined in [15]. It equals stiffness computed with respect to the special parametrization in which the original metric is given by the identity matrix.
We call a feature X relevant if it is in the span of , and irrelevant if it is orthogonal to those. Our idealized equivalence classes make and equivalent from the point of view of Bob if and only if is irrelevant, or, equivalently, if for all relevant features Z.
In order to obtain a physically more intuitive condition, let us define the principal observables to be the operators , solutions of the dual Heisenberg-picture eigenvalue equation
Analogously, we say that A is a relevant observable if it is in the span of . With this definition, our equivalence condition amounts to considering two effective states and to be equivalent (in the neighbourhood of ρ) when they yield the same expectation values for all relevant observables:
For instance, consider the strictest possible relevance threshold where only features with exactly zero relevance are deemed to be irrelevant. These are the operators X in the kernel of . In this case we recover the exact state-independent equivalence relation which identifies if . The corresponding relevant observables are the self-adjoint operators A satisfying for all X in the kernel of , which are precisely those of the form for some B. In addition, these are all the observables that Bob can ever hope to measure expectation values of, since for all B, .
2. One classical mode
For a simple but nontrivial example suppose that Alice has a stochastic classical system consisting of a single real variable, e.g., the position x of a particle. The true state to be discovered by Bob is a probability distribution . Bob's experimental limitation consists of a finite precision σ at which he can resolve x. This can be modelled by a stochastic map whose effect is a convolution of Alice's probability distribution with a Gaussian of width σ:
here is the normal distribution with variance σ. Suppose, further, that Bob's initial hypothesis is a simple Gaussian distribution, which we think of as a thermal state for the 'hamiltonian' . The action of the operator can be written explicitly:
Noting also that , one can check by explicit calculation of the Gaussian integrals that if , then
with Hence, the eigenvalue problem defined in equation (6) is solved by differentiating equation (10) n times with respect to t and evaluating at t = 0. Observe that Gt(x) is the generating function for the Hermite polynomials; hence the principal observables are the Hermite polynomials , with respective eigenvalues , or for . Since the Hermite polynomials of degree at most n span all polynomials of degree at most n, our criterion implies that, for a threshold n, two nearby states are equivalent exactly when they share the same first n moments.
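The way Gaussian noise acts on Hermite polynomials can be checked numerically. The sketch below makes assumptions that are not taken from the text: the hypothesis is a unit-variance Gaussian, `sigma` is the standard deviation of the noise, and the polynomials are the probabilists' Hermite polynomials. Under these assumptions, the expected value of He_n of Bob's rescaled reading, given Alice's value x, is a^n He_n(x) with a = 1/sqrt(1 + sigma²).

```python
import math


def hermite(n, u):
    """Probabilists' Hermite polynomial He_n(u), via He_{k+1} = u He_k - k He_{k-1}."""
    prev, cur = 1.0, u
    if n == 0:
        return prev
    for k in range(1, n):
        prev, cur = cur, u * cur - k * prev
    return cur


def smoothed_hermite(n, x, sigma, steps=4000, cutoff=10.0):
    """E_z[He_n((x + sigma z) / sqrt(1 + sigma^2))] for z ~ N(0, 1), trapezoid rule."""
    scale = math.sqrt(1.0 + sigma * sigma)
    h = 2.0 * cutoff / steps
    total = 0.0
    for i in range(steps + 1):
        z = -cutoff + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        density = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        total += weight * hermite(n, (x + sigma * z) / scale) * density
    return total * h


# Hypothetical noise level and evaluation point.
sigma, x = 2.0, 0.7
a = 1.0 / math.sqrt(1.0 + sigma * sigma)

# Pairs (numerical average, a^n He_n(x)) for the first few degrees.
checks = [(smoothed_hermite(n, x, sigma), a ** n * hermite(n, x)) for n in range(5)]
```

Because the joint distribution of Alice's value and Bob's rescaled reading is symmetric in this parametrization, the reverse step contributes a second factor a^n. This suggests a relevance of (1 + sigma²)^(-n) for the degree-n polynomial, which decays faster in sigma for higher degree, in line with the moment-matching criterion above.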
For instance, up to distinguishability of order , the effective hamiltonian is equivalent to provided that is 'renormalized' so as to yield the same second moment as H0. This simplification from H0 to H1 morally corresponds to a step of the type of RG employed in condensed matter theory, where a hamiltonian is simplified in a way that only affects some 'unobservable' small scale features of the systems.
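The 'renormalization' by moment matching can be sketched numerically. The explicit forms used here are assumptions made for illustration: a quartic hamiltonian τx² + λx⁴ standing in for H0, and a quadratic one τ′x² standing in for H1, with hypothetical values of `tau` and `lam`.

```python
import math


def moment(hamiltonian, power, steps=20000, cutoff=10.0):
    """<x^power> in the state proportional to exp(-hamiltonian(x)), trapezoid rule."""
    h = 2.0 * cutoff / steps
    num = den = 0.0
    for i in range(steps + 1):
        x = -cutoff + i * h
        w = (0.5 if i in (0, steps) else 1.0) * math.exp(-hamiltonian(x))
        num += w * x ** power
        den += w
    return num / den


# Hypothetical couplings of the assumed quartic hamiltonian.
tau, lam = 0.5, 0.1


def h0(x):
    """Assumed form of the original hamiltonian."""
    return tau * x ** 2 + lam * x ** 4


# Second moment that the simplified description has to reproduce.
second = moment(h0, 2)

# A Gaussian exp(-tau_eff x^2) has variance 1 / (2 tau_eff), so matching
# the second moment fixes the renormalized quadratic coupling.
tau_eff = 1.0 / (2.0 * second)


def h1(x):
    """Assumed quadratic hamiltonian with the renormalized coupling."""
    return tau_eff * x ** 2
```

The two states agree on the second moment by construction and differ in the fourth, which is the kind of feature the criterion treats as irrelevant at this threshold.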
The situation in particle physics is a priori quite different. Quantum field theories typically come with an unwanted parameter, a regulator Λ, which has no true physical significance, although it often mimics a lattice spacing. Its presence, however, is not a problem if the observable predictions of the theory do not depend on it. This is possible if we assume some reasonable limitation on Bob's measurement abilities, so that any change in Λ can be compensated by a change in the state's other parameters so as to stay within a given equivalence class (figure 1(b)). This dependence of the state's parameters on Λ is the type of RG flow which naturally occurs in QFT.
Using the above toy example, a similar problem could occur for the hamiltonian H0 if λ were to be experimentally determined to be negative (using a first order approximation in λ for the state). Indeed, the resulting distribution would diverge if calculated non-perturbatively. This can be fixed mathematically by adding a sixth order term to the effective hamiltonian, which, to distinguishability of order , can be made to be equivalent to H0 by adjusting the parameters τ and λ as functions of Λ so as to preserve the moments up to the fourth.
Those two concepts of renormalization can be made to match in QFT because divergences can be identified as contributions from an infinite number of irrelevant features. Hence, the simplification which consists in subtracting them from the state also regularizes the theory.
3. Classical Gaussian states
We solve equation (6) for Gaussian states over arbitrarily many modes, and for a channel which is any Gaussian stochastic map. We consider n real random variables . We write for any vector f, which corresponds to a 'smeared' field in the continuum limit. A general Gaussian stochastic map is defined by the effect of its transpose on the moment-generating functions:
where , Y and X are real matrices and Y is positive (we give a concrete example in the quantum case). Similarly, an arbitrary (but centred) Gaussian state ρ is defined by
where A is real and symmetric. Using the definition of as adjoint of in the dual metric, applied to the generating functions , one can show that the random variables
satisfy
with
(See appendix A.)
4. Interactions
The previous result can be used to perturbatively calculate the principal observables around non-Gaussian states. In order to do this, we need to work within a representation of the real Hilbert space formed by the principal observables of the Gaussian state, together with the scalar product defined by the metric evaluated at the Gaussian state. This is always a symmetric Fock space, where the vacuum corresponds to the constant random variable (with relevance 1), and the creation operator associated with vector fk, acting on a principal observable, leads to a new principal observable with relevance multiplied by . The perturbed eigenvalue problem can then be expressed to any order using standard Feynman diagrams. An example is detailed in appendix B.
5. Quantum Gaussian states
A quantum Gaussian state is defined by quantization of a classical phase space. Let f, g denote classical observables which are linear functions in the canonical variables, with some scalar product (f, g) and the symplectic form Δ. Let denote the quantization of f, such that
Any quantum state is uniquely specified by its characteristic function . For a quantum Gaussian state ρ, this is , where A is a real symmetric matrix satisfying . A general Gaussian channel is characterized by its effect on the Weyl operators:
where X and Y are real matrices such that
One can then verify that the principal observables are polynomials generated by
This is done by first noting their orthogonality, and applying the definition of with respect to the generating function as in the classical case (see appendix C).
As an example, we consider a Gibbs state of a Klein–Gordon field of mass m at inverse temperature β, with canonical conjugate coordinates and . We will need the real Fourier components
which are decoupled under the classical dynamics. Because the phase space is infinite-dimensional, the concept of Gaussian state introduced above has to be generalized with some care. Alternatively, one may choose boundary conditions and a momentum cutoff so as to render it finite-dimensional. For our purpose, we define the Gaussian state through the bilinear form that it defines on the space of linear classical observables. In terms of the observables
the quadratic form is
We also consider a Gaussian channel . In the continuum, the matrices X and Y become linear functions. We use
where denotes convolution by a Gaussian of variance σ, and
The parameter σ characterizes spatial resolution, and and field value resolutions. The condition expressed in equation (18) reduces in this case to the uncertainty relation .
For , we find that the quantized field observables and are principal observables, with respective relevance
and
Since the channel acts independently on each mode, the products of n copies of such operators with distinct momenta are also eigenvectors with relevance equal to the product of the corresponding values or . For instance, the n-point functions have relevance , provided that the momenta are all distinct.
6. Renormalization
If we want to recover an RG, we have to pick a threshold on the asymptotic decay of relevance in terms of the three noise parameters σ, and . Since the relevance decays exponentially in the total momentum, products of field operators can always be considered irrelevant as soon as they involve operators with mode . At temperatures large compared to m, the relevance of the n-point functions also decays to order in and . Hence, in this approximation two states are effectively equivalent if they have the same n-point functions at momenta smaller than . Without restriction on n, this is precisely the condition used in QFT for the RG as a function of a regulator Λ (needed as a momentum cutoff on divergent integrals resulting from perturbation theory around Gaussians). Instead of viewing the cutoff as an explicit parameter of the state, a change of cutoff from Λ to can also be absorbed into a rescaling of space by a factor . The condition that the state stays in the same equivalence class independently of s yields the Callan–Symanzik equations.
From the condensed matter point of view, a cutoff Λ is fixed and given by the lattice spacing. The description of the state can be simplified by exploiting the freedom we have in choosing a representative of an equivalence class. We may choose the one closest to ρ in relative entropy: this optimization is well known and yields the Gibbs states with only relevant observables in the hamiltonian perturbations, namely terms with field modes . The requirement that visible predictions be invariant under changes of σ leads to an RG. This matches the previous RG in the sense that the procedure is technically equivalent to lowering the cutoff Λ to in perturbative expansions.
But our calculation also tells us that we may neglect features whose distinguishability scales poorly with the field-value precisions and , hence justifying the use of effective hamiltonians with low-degree polynomials in the fields. For instance, choosing n = 2 selects only quadratic terms as relevant observables, and the Gaussian states as a natural family of effective states (this is a very different argument from the one based on renormalizability). We note that here , play very different roles from σ because of the differences in asymptotic relevance behaviour, but we could imagine a different type of experimental limitation where more parameters govern the RG besides σ.
7. Distinguishability in QFT
We can use the solution of equation (6) to compute the effective distinguishability D(A) of any perturbation generated by a Hamiltonian term A, defined as the lowest-order approximation of the relative entropy , where are normalized states. Indeed, we have , which can be computed by expressing A in the basis of principal observables around ρ. For instance, in the scalar field example, , where is given above, and . With the standard tools of perturbative QFT, this can be generalized to higher-order expansions of the exponential (while keeping with the lowest order approximation of the relative entropy).
For non-local terms, D has to be made into a density. It may be argued that the unit of volume used to define the density ought to explicitly scale with σ, leading to the distinguishability density , where d is the dimension of space and a restriction of A to a region Σ of volume . A Hamiltonian term may then be said to be relevant in information-theoretic terms if scales as a positive power of σ (for a fixed state). Preliminary calculations indicate that the result is compatible with the Wilsonian analysis classically, but may differ in important ways in the genuinely quantum analysis. This will be analysed in further work.
8. Concluding remarks
We introduced a framework which allows for the definition of effective theories in very general terms, taking into account any measure of distinguishability and any model of experimental limitations. We demonstrated the pertinence of this approach by showing that it naturally contains, as a particular case, the concept of effective theory as defined by the RG of quantum field theory. Further work will explore how varying the assumptions leads to effective theories which differ from the standard QFT framework. For instance, taking the field-value resolutions into account in the interacting context leads to a concept of dressed effective field which depends in principle on the detail of the coarse-graining channel and distinguishability metric.
Most interestingly, the fact that this approach is not at all tied to the standard QFT formalism means that it can in principle be applied to completely different types of theories, as well as very different models of experimental limitation (not necessarily related to scale). For instance, the case of loop quantum gravity [19], which proposes a class of background-free quantum field theories, could provide interesting applications.
This approach can also be naturally applied to spin lattice systems, so as to derive effective field theories describing their large scale properties. In standard approaches, a spin system is connected to an effective QFT through symmetry arguments (observables in the discrete and continuous descriptions are paired by identifying the group transformations they generate). Our framework provides a more bottom-up route, where the effective QFT can in principle be derived through the mechanism by which it emerges, i.e., through the identification of the degrees of freedom which are effectively ignored. This can be performed numerically by solving equation (6) using techniques such as matrix product states.
Finally, the framing of effective QFT in this fundamentally information-theoretical approach elucidates precisely what information is being destroyed when the theory is renormalized. A concrete way of quantifying this is proposed in the last section. This may provide a first step towards generalizing Zamolodchikov's c-theorem [9, 20], which could in turn provide new techniques for the general classification of effective QFTs.
Acknowledgments
Helpful discussions with numerous people are most gratefully acknowledged: a partial list includes John DeBrota, Andrew Doherty, Jens Eisert, Steve Flammia, Jutho Haegeman, Gerard Milburn, Terry Rudolph, Tom Stace, Frank Verstraete, and Reinhard Werner. This work was supported by the ERC grants QFTCMPS and SIQS and by the cluster of excellence EXC 201 Quantum Engineering and Space-Time Research. We acknowledge support by Deutsche Forschungsgemeinschaft and Open Access Publishing Fund of Leibniz Universität Hannover.
Appendix A. Classical Gaussian states
We want to find the action of on G(f). By directly using its definition as adjoint of , we find that for any linear classical observables f and g
But note that, using
we have
By comparing the two expressions, we obtain
Since this is true for all g, then
or
and hence
Note that
Hence, defining , we conclude that
Observe that . Hence H is symmetric in the scalar product . Let fk be an orthonormal basis of eigenfunctions of H:
(Note that it may be convenient to consider a complex eigenbasis. Hence we complexify this real Hilbert space in the obvious way.)
We obtain the eigenfunctions of by differentiating G(f) with respect to the basis functions fk. Indeed, let denote the functional derivative in the direction of fk, i.e., for any functional Z(f),
then we obtain
We have
where the primed derivatives are relative to f'.
The functions form an orthogonal basis of a representation of the symmetric Fock space built from the test functions, with scalar product . One can think of as the vacuum. The other eigenfunctions are obtained by acting on it with the creation operators for the 'mode' fk. The commutation relations are
Explicitly
Appendix B. Perturbation theory
This picture allows one to find the principal observables around non-Gaussian states by using perturbation theory. The trick is to express the information metric with respect to the perturbed state through its kernel K expressed in that Fock space. This also allows one to write the map , for the perturbed state , perturbatively as an operator in Fock space. The eigenvalue problem can then be formulated and computed to any degree using standard Feynman diagram techniques.
Let us define the generating functions
Differentiating this functional yields the components in the Fock basis of an operator K that is the kernel of the metric defined by with respect to the unperturbed metric ρ:
where denotes derivation with respect to f' in the direction fk. Similarly,
is the generating function of the identity operator.
It will be convenient also to consider the Fock space defined from the metric at point , with the same vacuum; but with creation operators defined by
where
These are indeed orthogonal since
where we used the fact that H is positive in terms of the scalar product .
In this basis we express
We can compute from our previous results that
Hence we find that K and J are related by
Also, the channel naturally maps between the two Fock spaces, represented as the operator E generated by
Finally, the unknown is the operator R representing as
It is defined by the relation
for all f and g. Expanding this relation in the respective Fock basis, we obtain
or
If we expand
we have
and
Expanding ER, we obtain
In order to compute the first order corrections to the unperturbed eigenvalue problem, we need the generating function of the perturbation
Since E is diagonal in the Fock basis, we only need to worry about K1 and L1 directly.
The generating function of the operator K1 is
We will consider an interaction of the form
where the functions fx possibly form a different basis than fk. This generates the state where
We have
where
When differentiated, this free partition function yields Feynman diagrams with no propagation between the vertices associated with f or f' respectively.
As an example, we performed this calculation for the state corresponding to the euclidean form of the Klein–Gordon scalar field theory with interaction (in an arbitrary number of spatial dimensions). The channel is defined by an operator X which performs a Gaussian convolution over scale σ as in the quantum example in the article. We also use .
The free theory yields the Gaussian state defined by the operator A, inverse of . Given that it commutes with the operator X defined by
with d the number of dimensions, H is codiagonal with X and A, which are all self-adjoint in the scalar product.
Using the eigenfunctions of A:
with
we obtain the eigenvalues of H:
The normalized unperturbed 'one-particle' principal observables are . Note that we used complex eigenfunctions because it makes the calculations much simpler. The degeneracy between the k and eigenfunctions allows one to recover the real eigenfunctions by linear combination of the complex ones.
The interaction is defined as above using the improper basis
Hence, besides , we find
and
The principal observables around , obtained by perturbation from the one-particle observables for the Gaussian state ρ, are where the only non-zero components of the first order correction are
where we omitted the next two terms which are obtained by rotating l1, l2 and l3.
Appendix C. Quantum Gaussian states
We use the notation introduced in the paper. In order to compute the metric explicitly, we need the commutation relation
where we are now working in a complexified phase-space so as to accommodate imaginary time evolutions. Indeed, we need the group of complex matrices associated with the Gaussian state such that
The metric (in the Heisenberg picture) is . This group is symplectic: and leaves the state invariant: .
Using these properties, we find that the polynomials generated by are orthogonal with respect to the metric when they are of different degrees in the canonical observables. This follows from the fact that where . Indeed, the derivatives of the integrand evaluate to zero at whenever the number of differentiations with respect to f is not equal to the number of differentiations with respect to g.
Moreover, since the channel maps to a Gaussian state defined by the new matrix , we find that . Finally, using the definition of as adjoint of , we obtain
This implies that a polynomial of order n generated by GA is mapped by to a polynomial of order n generated by GB, which is then mapped back to a polynomial of order n generated by GA.
Therefore, we conclude that the principal observables are polynomials generated by GA. Finding the exact polynomials of a given degree can be done for each order independently, which is a finite-dimensional problem.
This statement is in fact true for all quantum generalizations of the Fisher information metric. Classically, the Fisher information metric is characterized as the only metric on the manifold of probability distributions which contracts under the action of any stochastic map. In the quantum case, Petz and Sudár [21] characterized all contractive metrics. They are defined by an operator monotone function such that for all . An operator monotone function has the property that, when applied to operators via functional calculus, whenever (i.e., is positive). The function θ defines the kernel via its inverse as follows:
where and for any matrix A. It is straightforward to adapt the above argument to this general form.