Paper • Open access

A measure of majorization emerging from single-shot statistical mechanics

D Egloff, O C O Dahlsten, R Renner and V Vedral

Published 2 July 2015 © 2015 IOP Publishing Ltd and Deutsche Physikalische Gesellschaft
Citation: D Egloff et al 2015 New J. Phys. 17 073001. DOI: 10.1088/1367-2630/17/7/073001


Abstract

The use of the von Neumann entropy in formulating the laws of thermodynamics has recently been challenged. It is associated with the average work, whereas the work guaranteed to be extracted in any single run of an experiment is in general the more interesting quantity. We show that an expression that quantifies majorization determines the optimal guaranteed work. We argue it should therefore be the central quantity of statistical mechanics, rather than the von Neumann entropy. In the limit of many identical and independent subsystems (asymptotic i.i.d.) the von Neumann entropy expressions are recovered, but in the non-equilibrium regime the optimal guaranteed work can be radically different to the optimal average. Moreover, our measure of majorization governs which evolutions can be realized via thermal interactions, whereas the non-decrease of the von Neumann entropy is not sufficiently restrictive. Our results are inspired by single-shot information theory.


Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

Statistical mechanics is a corner-stone of modern physics. Many of its basic paradigms and mathematical methods were set in an era when experimental abilities were much more limited and modern information theory had not yet been developed. Accordingly there is currently significant momentum in investigating the theory's foundations in the quantum and nano regimes, see e.g. Jarzynski (1997), Lloyd (1997), Gemmer and Mahler (2004), Allahverdyan et al (2004), Linden et al (2010), Toyabe et al (2010), Brandão et al (2011), Jevtic et al (2012) to mention but a few recent contributions. Here we derive an alternative type of statistical mechanics from scratch. Our approach is inspired by recent results in information theory (Renner and Wolf 2004, Renner 2005) and builds on (Dahlsten et al 2011, Rio et al 2011, Aberg 2012, Horodecki and Oppenheim 2013). We argue this approach is both significantly more general than the standard theory and addresses questions more relevant to modern experiments.

It is more general in that we will not assume that the states of systems of interest are thermal, but rather just that there is a heat bath which, when interacting with a system, gradually takes that system towards a thermal state. Thus the system of interest is not necessarily in equilibrium. In fact we will allow for any probability distribution over energy levels. In particular, we do not assume that the system under consideration is large or that internal correlations are negligible. This makes the approach significantly more relevant to modern experiments, where small subsystems can be addressed individually and on time-scales faster than the thermalization time.

A key difference regarding which questions are addressed is that we focus not on averages of distributions as in standard statistical mechanics. Instead we ask, for any given single run of an experiment, which threshold values are guaranteed to be exceeded, or more generally guaranteed to be exceeded up to some probability epsilon, not necessarily small. This is referred to as the single-shot paradigm, as opposed to the average paradigm. This distinction is important when distributions of quantities have a significant spread around the average, as is often the case for small systems.

To see why we choose the single-shot paradigm, consider work extraction from a system. Work is a particularly important quantity, appearing in the first and second laws of thermodynamics and of crucial importance in the context of engines. As is usually the case, suppose there is more than one way to extract work, e.g. different ways of changing the Hamiltonian of the system from which work is to be extracted. Say for concreteness that there are two different strategies: strategy 1 (S1) and strategy 2 (S2). Let S1 (S2) be associated with a probability distribution over the extracted work W, denoted ${p}_{1}(W)$ (${p}_{2}(W)$). Suppose that the averages are equal, i.e. $\langle W{\rangle }_{S1}=\langle W{\rangle }_{S2}$, but ${p}_{1}(W)$ has no spread around the average, whereas ${p}_{2}(W)$ has a significant spread. Are these protocols now equally 'good', as one might think by looking at the averages? This is certainly not the case in general. Suppose that there is a threshold value ${W}^{*}$ that W needs to exceed. Such thresholds often exist, e.g. an activation energy for some process, or a band gap to jump. Suppose moreover, to make this example interesting, that $\langle W{\rangle }_{S1}=\langle W{\rangle }_{S2}\gt {W}^{*}$. Now with S1 we will indeed achieve the threshold with probability 1, but with S2 the probability of exceeding the threshold can be arbitrarily small, as there may be a small probability of significantly exceeding the threshold but a large probability of just about failing to achieve it.

If, instead of the average, we consider the work guaranteed up to failure probability epsilon, writing this as ${W}_{S}^{\varepsilon }$ where S is the strategy, we see that ${W}_{S1}^{\varepsilon }=\langle W{\rangle }_{S1}\gt {W}^{*}\;\forall \varepsilon \in [0,1]$, whereas ${W}_{S2}^{\varepsilon }\lt {W}^{*}$ for all epsilon smaller than the probability of falling below the threshold. This example demonstrates that the single-shot quantity ${W}_{S}^{\varepsilon }$ does, in contrast to the average $\langle W{\rangle }_{S}$, make it clear that the two protocols perform very differently. We find this example most interesting if one considers different epsilon and not only $\varepsilon =0$.
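To make this concrete, ${W}_{S}^{\varepsilon }$ is simply a lower quantile of the strategy's work distribution. The following minimal sketch (in Python, with toy numbers and function names of our own choosing) evaluates it for two strategies of equal average, like S1 and S2 above:

```python
import numpy as np

def guaranteed_work(work_values, probs, eps):
    """Largest W such that P(work < W) <= eps,
    i.e. at least W is extracted with probability >= 1 - eps."""
    order = np.argsort(work_values)                     # sort outcomes ascending
    w = np.asarray(work_values, dtype=float)[order]
    p = np.asarray(probs, dtype=float)[order]
    below = np.concatenate([[0.0], np.cumsum(p)[:-1]])  # below[i] = P(work < w[i])
    return w[below <= eps].max()

# S1: no spread; S2: the same average <W> = 1.0, but a large spread.
w1, p1 = [1.0], [1.0]
w2, p2 = [0.5, 3.0], [0.8, 0.2]
for eps in (0.0, 0.1, 0.5, 0.9):
    print(eps, guaranteed_work(w1, p1, eps), guaranteed_work(w2, p2, eps))
```

For the spread-free strategy the guaranteed work equals the average for every epsilon, while for the spread-out strategy it stays at the low outcome until epsilon reaches the weight of that outcome, exactly the behaviour described above.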

In this article we derive an expression concerning the optimal work ${W}_{S}^{\varepsilon }$ for various initial and final conditions. More specifically we consider a system with an initial Hamiltonian Hi and density matrix ρ, and a given final Hamiltonian Hf and density matrix σ. We only consider states ρ and σ diagonal in the energy basis. The experimenter may choose from a set of possible strategies S, which are arbitrary combinations of infinitesimal changes in the Hamiltonian, and interactions with a thermalizing heat bath associated with temperature T. The work guaranteed to be exceeded with a failure probability up to epsilon is then written as ${W}_{S}^{\varepsilon }(\rho ,{H}_{i}\to \sigma ,{H}_{f})$. As the main technical result of this paper we derive an expression for the optimal guaranteed work: ${W}^{\varepsilon }(\rho ,{H}_{i}\to \sigma ,{H}_{f})={\mathrm{max}}_{S}{W}_{S}^{\varepsilon }(\rho ,{H}_{i}\to \sigma ,{H}_{f})$. We show it is given—if we suppress certain details to be specified later—by

${W}^{\varepsilon }(\rho ,{H}_{i}\to \sigma ,{H}_{f})={kT}\,\mathrm{ln}\,{\mathsf{M}}({\mathsf{G}}(\rho ,{{\mathsf{H}}}_{{\mathsf{i}}})\parallel {\mathsf{G}}(\sigma ,{{\mathsf{H}}}_{{\mathsf{f}}})),$

where ${\mathsf{M}}({\mathsf{G}}(\rho ,{{\mathsf{H}}}_{{\mathsf{i}}})\parallel {\mathsf{G}}(\sigma ,{{\mathsf{H}}}_{{\mathsf{f}}}))$ is a measure of how much ρ majorizes σ. This measure of majorization emerges from our considerations. A way of calculating the deterministic work for the zero-risk case in terms of diagrams has been given in Horodecki and Oppenheim (2013); in that case the results coincide. In Aberg (2012) deterministic work is defined as work that will be extracted, no more and no less, with probability 1. $(\epsilon ,\delta )$-deterministic work W means the work will be in the interval $[W-\delta ,W+\delta ]$ up to an error probability of epsilon. Here, in contrast, we have considered guaranteed work. The difference between guaranteed and deterministic work is most easily seen for epsilon and δ both being 0: non-zero deterministic work then necessitates no spread in the distribution, whereas guaranteed work allows a spread as long as it lies above the wanted threshold. One can upper bound the deterministic work by the guaranteed work, but in general they are different objects.

In standard thermodynamics it is the free energy difference $\Delta F=\Delta (U-{{TS}}_{\mathrm{vN}})$ which determines the optimally extractable work, and moreover gives a criterion for which state transformations are realizable by interactions with a heat bath, via $\Delta F\leqslant 0$, as can be shown to be true for many reasonable models of thermalization. We argue however that ${\mathsf{M}}$ should be the central quantity of statistical mechanics, by virtue of: (i) characterizing optimal guaranteed work and (ii) providing a tight condition for which evolutions are consistent with our thermalization model, as opposed to $\Delta F\leqslant 0$, which we show is necessary but not sufficient. These statements will be made precise later in this article. We call ${\mathsf{M}}$ the relative mixedness. In certain limits ${\mathsf{M}}$ reduces to differences of so-called single-shot entropies, which in turn in the asymptotic i.i.d. limit (${\rho }^{\otimes n}$, $n\to \infty $) reduce to the von Neumann entropy ${S}_{\mathrm{vN}}$. But in general the relative mixedness of two states can be very different to the standard free energy difference $\Delta F$.

We go on to make use of the results relating to the relative mixedness to formulate the laws of thermodynamics in the single-shot paradigm. The first law is modified to be about guaranteed work rather than average work. Several versions of the second law are all modified in important ways. Apart from the already mentioned replacement of free energy decrease, the optimal extractable work turns out not to be a function of state but a relative notion between two states. The relative mixedness acts as a unifying feature which means that the new laws nevertheless have a simple structure.

As there are strong connections between the structure of entanglement theory and that of thermodynamics, we moreover consider the impact on entanglement theory, showing how to quantify entanglement as a relative notion between two states using relative mixedness rather than as a state function given by the von Neumann entropy.

Results

Existing results

We begin by briefly reviewing key results that we shall later recover as special cases of our expression. (This is thus not an exhaustive list of all previous results.) The results concern extracting work in the presence of a heat bath at temperature T. The details of the models of work extraction in the different papers are not a priori identical, but we shall recover the same expressions within the model used here.

In Dahlsten et al (2011) an n-cylinder Szilard engine was considered and the following expression derived:

${W}^{\varepsilon }={kT}\,\mathrm{ln}(2)\left[n-{H}_{\mathrm{max}}^{\varepsilon }(\rho )\right].\qquad (1)$

Here ${W}^{\varepsilon }$ is the work that can be extracted in a process with maximum probability of failure epsilon. ${H}_{\mathrm{max}}^{\varepsilon }$ is the smooth max entropy of the density matrix representing a work-extracting agent's initial knowledge about the state of the working medium. This is defined as ${H}_{\mathrm{max}}^{\varepsilon }(\rho )=\mathrm{log}({\mathrm{rank}}^{\varepsilon }(\rho ))$, with ${\mathrm{rank}}^{\varepsilon }(\rho )$ the number of non-zero eigenvalues minimized over all states within epsilon trace distance of ρ. (Actually there is an alternative definition as well, but the two are known to coincide up to an additive $\mathrm{log}\frac{1}{\varepsilon }$ term, so for simplicity we focus on one definition here.) T is, as mentioned above, the temperature of the heat bath, and k Boltzmann's constant. ${H}_{\mathrm{max}}^{\varepsilon }(\rho )$ reduces to the von Neumann entropy in the i.i.d. limit, i.e. when $\rho ={\tau }^{\otimes n}$, $n\to \infty $ and $\varepsilon \to 0$. Physically this corresponds to systems composed of very large numbers of identical and uncorrelated subsystems.
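For states diagonal in a fixed basis this quantity is straightforward to compute: the optimal epsilon-smoothing simply cuts the least likely eigenvalues, of total weight up to epsilon. A minimal sketch (our own illustration of the rank-based definition above, in Python):

```python
import numpy as np

def smooth_max_entropy(spectrum, eps):
    """log2 of the smallest number of eigenvalues capturing weight >= 1 - eps;
    cutting the low-probability tail is the optimal smoothing for a diagonal
    (classical) spectrum."""
    lam = np.sort(np.asarray(spectrum, dtype=float))[::-1]   # descending
    k = int(np.searchsorted(np.cumsum(lam), 1.0 - eps)) + 1
    return float(np.log2(min(k, len(lam))))

# Smoothing removes a tiny tail: rank 3 without smoothing, rank 1 with it.
print(smooth_max_entropy([0.98, 0.01, 0.01], 0.0))    # log2(3) ~ 1.585
print(smooth_max_entropy([0.98, 0.01, 0.01], 0.05))   # 0.0

# i.i.d. limit: for the maximally mixed qubit tau, H_max^eps(tau^{(x)n}) is
# essentially n bits, i.e. n times the von Neumann entropy of tau.
for n in (4, 8, 16):
    print(n, smooth_max_entropy(np.full(2**n, 2.0**-n), 0.01))
```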

A key result obtained independently in the more recent papers (Aberg 2012, Horodecki and Oppenheim 2013) is that given an initial state ρ and a final thermal state ${\rho }_{T}$ over the same energy levels, the work that can be extracted given access to a heat bath of temperature T, and with up to epsilon failure probability is:

${W}^{\varepsilon }={kT}\,\mathrm{ln}(2)\,{D}_{0}^{\varepsilon }(\rho \parallel {\rho }_{T}),\qquad (2)$

where ${D}_{0}^{\varepsilon }(\rho \parallel {\rho }_{T})$ is the epsilon-smooth relative entropy of order 0 (see Datta 2009). In Aberg (2012) ρ is taken to be diagonal in the energy eigenbasis; in the a priori distinct set-up of Horodecki and Oppenheim (2013), a state that is not already diagonal in the energy eigenbasis may be replaced by the corresponding diagonal (decohered) state without changing the expression for the extractable work (Horodecki and Oppenheim 2013 also give the probabilistic work for the opposite process and the deterministic work for arbitrary, initially energy-diagonal, state conversions). The RHS of equation (2) reduces to $W={kT}\,\mathrm{ln}(2)\,D(\rho \parallel {\rho }_{T})$ for the standard relative entropy in the asymptotic i.i.d. (von Neumann entropy) regime. That latter expression is well established, see e.g. Donald (1987). Equation (2) reduces to equation (1) in the case of degenerate energy levels, as shown in Aberg (2012). In the present article we impose no restrictions on the energy spectra or occupation probabilities; they may take arbitrary forms independently of one another.

The model for work extraction

Our work extraction model can be thought of as a game with simple but minimal rules. (It will nevertheless not be trivial to analyse as there is a multitude of different strategies one may choose for the task of work extraction given the initial and final conditions.) The model is inspired by Alicki et al (2004) and very similar to that used in Aberg (2012). There are three systems and an implicit work-extraction agent representing the external experimenter who can control certain parameters. As depicted in figure 1(b) one system is the working medium, another is a heat bath of temperature T, and the last is the work reservoir.

Figure 1.

Figure 1. (a) Abstract depiction of the set of states, including the initial state $\rho $ and final state $\sigma $. Each state is associated with a set of energy levels and occupation probabilities. We derive an expression for how much work one can optimally extract with a maximum probability of failure of $\varepsilon $ for any such $\rho $ and $\sigma $. This quantity is called ${W}^{\varepsilon }(\rho ,{H}_{i}\to \sigma ,{H}_{f})$. Only in certain limits does it reduce to the standard free energy difference. (b) The generic setup we are considering involves three systems: a heat bath at temperature T, a working medium system associated with some initial state $\rho $, and a work reservoir system. One may for instance couple the system to the heat bath and the work reservoir alternately and thereby transfer energy from the heat bath to the work reservoir, at the cost of randomizing the working medium system.


The initial and final energy spectra $\{E\}$ and $\{F\}$ of the working medium are arbitrary. The initial and final density matrices of the working medium, ρ and σ, are not assumed to be thermal; they can take any form as long as they are diagonal in the energy basis. This is because we assume, as is non-trivial but standard, that the decoherence time is much shorter than the thermalization time (Alicki et al 2004). These initial and final conditions are depicted in figure 1(a).

One of the two elementary processes the agent can compose to build the full strategy is thermalization of the working medium. By thermalization we mean gradual thermalization, i.e. we do not mean that the state after the thermalization process is thermal, but merely that it is nearer to the thermal state than before the process. This is modelled by the probabilities of the energy levels being transformed by a matrix from the set of stochastic matrices which have the thermal state corresponding to temperature T as a fixed point. This process does not change the Hamiltonian of the working medium. There is by definition no work gain or cost from this process.

The second elementary process is changing the Hamiltonian of the system through shifting an energy level by some chosen amount $\delta E$. One may for example think of moving a magnet or a charge closer to the system as a way of shifting the levels. This may involve a work gain/cost, because if the system occupies the particular energy eigenstate(s) that gets shifted by $\delta E$ this counts as work done on the system. If the system does not occupy the eigenstate that gets shifted there is no work cost. Importantly, we enforce energy conservation by changing the energy of the work reservoir by the opposite amount ($-\delta E$ if the shifted level is occupied, 0 otherwise). As the system's state is in general not fully known, each Hamiltonian-changing step induces a probability distribution over the energy transferred to the work reservoir. For example, if only level i is raised, by $\delta {E}_{i}$, while the others are stationary, the probability of the work reservoir losing $\delta {E}_{i}$ of energy is pi, the probability of occupation of level i, and the probability of the work reservoir not changing its energy is $1-{p}_{i}$. Finally, it is assumed that the experimenter implements Hamiltonian changes without affecting which energy level is occupied. This is justified by the adiabatic theorem, which says that it is possible to avoid hopping between levels by shifting them sufficiently slowly. In general this will not be the case, but we are interested in fundamental limits and allow the experimenter this level of control.

The agent's choice of how to combine the elementary processes is called its strategy ${\mathcal{S}}$. Any given strategy will in general generate an associated probability distribution over work costs/gains, i.e. over total energy transfers from/to the work reservoir. When strategy ${\mathcal{S}}$ is guaranteed, up to failure probability epsilon, to transfer at least a certain amount of energy, we call this amount the (epsilon-)guaranteed work and denote it by ${W}_{{\mathcal{S}}}^{\varepsilon }$. In a given realization the strategy ${\mathcal{S}}$ may then (with a probability bounded by epsilon) fail to achieve ${W}_{{\mathcal{S}}}^{\varepsilon }$; otherwise we say the work extraction was successful (in achieving ${W}_{{\mathcal{S}}}^{\varepsilon }$).

Relative mixedness gives the optimal guaranteed work

In this section we focus on deriving the optimal amount of work that can be guaranteed to be extracted (up to failure probability epsilon), writing this as ${W}^{\varepsilon }(\rho \to \sigma ):= {\mathrm{max}}_{{\mathcal{S}}}{W}_{{\mathcal{S}}}^{\varepsilon }(\rho \to \sigma )$. The bound we get from these considerations is one of the main results of this paper.

We will show that this is determined by a measure of how much more mixed one state ρ is than another, σ. We call this the relative mixedness and write it as ${\mathsf{M}}(\rho \parallel \sigma )$. As we consider states diagonal in the energy basis, the only relevant information about a state will be its spectrum. For our purposes it will therefore be enough to define the relative mixedness for probability distributions.

Definition 1. Consider two probability distributions $\lambda (x)$ and $\mu (x)$ defined over $x\in {{\mathbb{R}}}^{(\geqslant 0)}$. Let $\lambda (x)\downarrow $ and $\mu (x)\downarrow $ denote these distributions after a (measure-preserving) rearrangement so that they are in descending order. Let the cumulative distribution function associated with a function γ be denoted as

${I}_{\gamma }(y):= {\displaystyle \int }_{0}^{y}\gamma (x)\,{\rm{d}}x.$

Then the relative mixedness of $\lambda (x)$ and $\mu (x)$ is defined as

${\mathsf{M}}(\lambda \parallel \mu ):= \mathrm{max}\left\{m\;:\;{I}_{\lambda \downarrow }\left(\frac{y}{m}\right)\geqslant {I}_{\mu \downarrow }(y)\quad \forall \;y\geqslant 0\right\},$

where $m\in {\mathbb{R}}$. In words: the relative mixedness of λ and μ is the maximal amount by which one can stretch $\lambda \downarrow $ under the condition that its integral upper bounds the integral of $\mu \downarrow $ at all points.

By the definition of majorization, if and only if ${\mathsf{M}}\geqslant 1$ does (the spectrum of) ρ majorize σ, $\rho \;\succ \;\sigma $. The actual number ${\mathsf{M}}$ can thus be viewed as putting a number to how much ρ majorizes σ.
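For finite spectra, Definition 1 can be evaluated numerically by representing each distribution as blocks of given widths and areas and binary-searching for the largest admissible stretch factor. A minimal sketch (our own construction and naming; the widths are all 1 here, anticipating the Gibbs rescaling below where they become Boltzmann factors):

```python
import numpy as np

def integral(widths, areas, y):
    """Integral from 0 to y of the descending rearrangement of the step
    function whose block i has width widths[i] and area areas[i]."""
    heights = areas / widths
    order = np.argsort(-heights)                    # descending rearrangement
    w, a = widths[order], areas[order]
    right = np.cumsum(w)
    covered = np.clip((y - (right - w)) / w, 0.0, 1.0)
    return float(np.sum(covered * a))

def relative_mixedness(w_lam, a_lam, w_mu, a_mu, tol=1e-9):
    """Largest m such that the m-stretched lambda integral upper bounds the
    mu integral at all points (Definition 1)."""
    lo, hi = 0.0, 1e9
    while hi - lo > tol * max(hi, 1.0):
        m = 0.5 * (lo + hi)
        pts = np.concatenate([m * np.cumsum(w_lam), np.cumsum(w_mu)])  # kinks
        ok = all(integral(m * w_lam, a_lam, y) + 1e-12 >= integral(w_mu, a_mu, y)
                 for y in pts)
        lo, hi = (m, hi) if ok else (lo, m)
    return lo

print(relative_mixedness(np.ones(1), np.ones(1),
                         np.ones(4), np.full(4, 0.25)))   # pure vs mixed: ~4
print(relative_mixedness(np.ones(2), np.full(2, 0.5),
                         np.ones(4), np.full(4, 0.25)))   # ~2
```

Since both integrals are piecewise linear, it suffices to verify the condition at their kink points. Note that ${\mathrm{log}}_{2}\,{\mathsf{M}}$ counts bits of purity: a pure state against the maximally mixed state on four levels gives M = 4, and a uniform pair of levels against the same gives M = 2.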

We shall make use of a powerful insight from Ruch (1975), Ruch and Mead (1976), Mead (1977), who were—to our knowledge—the first to note that the non-decrease of the von Neumann entropy might not be a sufficient criterion for characterizing thermodynamical processes, and who proposed a criterion based on majorization. This insight is also used in Horodecki and Oppenheim (2013), where the criterion was shown to be necessary and sufficient for a class of quantum operations introduced in Janzing et al (2000). A relation between majorization and thermodynamics has also been noted in Janzing et al (2000), Horodecki et al (2003), Allahverdyan et al (2004), Janzing (2006). The insight bridges a particular gap between information theory and statistical mechanics: the fact that the former does not care about energy. In information theory, the Shannon/von Neumann entropy of a state, $-{\displaystyle \sum }_{i}{\lambda }_{i}\mathrm{log}{\lambda }_{i}$, is independent of the energies of the states involved. As the extractable work should depend on the energy levels involved, it follows that it cannot be expected to be uniquely determined by an entropy.

A key way in which energy enters into statistical mechanics is that in a Gibbs state the probability of any given energy eigenstate with energy E is given by ${p}_{T}(E)=\mathrm{exp}(-\frac{E}{{kT}})/Z$, where Z is the partition function. The insight we adapt from Ruch (1975), Ruch and Mead (1976), Mead (1977) is that we can take this bias into account by what essentially amounts to rescaling the density matrix's eigenvalue distribution by pT(E). After the rescaling the occupation probabilities will turn out to uniquely determine our expression for the extractable work. More specifically, we shall be employing an operation we term Gibbs-rescaling to the eigenvalue spectrum. Consider states with discrete spectra $\{{\lambda }_{i}\}$. We firstly transform the spectrum into the associated step-function. Then we take each block, rescale its height as ${\lambda }_{i}\mapsto {\lambda }_{i}/\mathrm{exp}\left(-\frac{{E}_{i}}{{kT}}\right)$, and its width $l=1\mapsto \mathrm{exp}\left(-\frac{{E}_{i}}{{kT}}\right)$ such that the area of the new block is ${\lambda }_{i}$ as before. We write this operation applied to a density matrix ρ as ${G}^{T}(\rho )$, or ${G}^{(T,H)}(\rho )$ to make the dependence on the Hamiltonian H explicit.

A way of understanding the Gibbs-rescaling is to think of it as splitting events into finer events in such a way that a Gibbs state becomes a uniform distribution, i.e. higher probability events get split into more fine events than those with lower probability. This fine-graining may even be thought of as physically associated with the number of joint states on the system and the heat-bath, with high probability states associated with more joint states on the system plus environment than low probability states.
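In the block picture, Gibbs rescaling simply reassigns widths and heights while preserving areas. A sketch compatible with the relative_mixedness routine above (our own illustration; units with k = 1):

```python
import numpy as np

def gibbs_rescale(probs, energies, kT=1.0):
    """Gibbs rescaling G^{(T,H)}: block i gets width exp(-E_i/kT) and height
    lambda_i/exp(-E_i/kT), so that its area remains lambda_i."""
    widths = np.exp(-np.asarray(energies, dtype=float) / kT)
    areas = np.asarray(probs, dtype=float)
    return widths, areas

# A Gibbs state is rescaled to the uniform distribution on (0, Z]:
E = np.array([0.0, 1.0, 2.0])
Z = np.exp(-E).sum()
widths, areas = gibbs_rescale(np.exp(-E) / Z, E)
print(areas / widths, 1.0 / Z)     # constant height 1/Z everywhere
```

Combining the two sketches, kT * np.log(relative_mixedness(*gibbs_rescale(rho, Ei), *gibbs_rescale(sigma, Ef))) evaluates the main work expression for $\varepsilon =0$.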

Having defined the relative mixedness ${\mathsf{M}}(.\parallel .)$ and Gibbs-rescaling ${G}^{T}(.)$ we can now give the main result. This result states that given that the chosen strategy must take an initial state ρ to a final state σ and the initial Hamiltonian Hi to Hf, the optimal work that can be guaranteed up to probability epsilon to be extracted, ${W}^{\varepsilon }(\rho ,{H}_{i}\to \sigma ,{H}_{f})$, is given by the relative mixedness of the Gibbs-rescaled states.

Theorem 1. In the work extraction game defined above, consider an initial density matrix $\rho ={\displaystyle \sum }_{i}{\lambda }_{i}| {e}_{i}\rangle \langle {e}_{i}| $ and final density matrix $\sigma ={\displaystyle \sum }_{j}{\nu }_{j}| {f}_{j}\rangle \langle {f}_{j}| $ with $\{| {e}_{i}\rangle \}$, $\{| {f}_{j}\rangle \}$ the respective energy eigenstates of Hi and Hf. Then for any strategy ${\mathcal{S}}$, ${W}_{{\mathcal{S}}}^{\varepsilon }(\rho ,{H}_{i}\to \sigma ,{H}_{f})\leqslant {W}^{\varepsilon }(\rho ,{H}_{i}\to \sigma ,{H}_{f})$, where

${W}^{\varepsilon }(\rho ,{H}_{i}\to \sigma ,{H}_{f})={kT}\,\mathrm{ln}\,{\mathsf{M}}\left({G}^{(T,{H}_{i})}{(\rho )}^{\varepsilon }\parallel {G}^{(T,{H}_{f})}(\sigma )\right),$

with ${G}^{(T,{H}_{i})}{(\rho )}^{\varepsilon }$ denoting the Gibbs-rescaled distribution of ρ with its least likely tail of total weight epsilon removed (the epsilon-smoothing realized by the explicit strategy of the appendix, see figure C1).

Furthermore an explicit strategy we propose always saturates this bound, provided that the agent can access a single extra two-level system (the catalyst system) which is fixed to be in one of its energy eigenstates, $| \xi \rangle \langle \xi | $, both initially and finally, i.e. $\rho =...\otimes | \xi \rangle \langle \xi | $ and $\sigma =...\otimes | \xi \rangle \langle \xi | $ with the same initial and final Hamiltonian on the catalyst.

Here we give the main arguments for the theorem, a full proof is given in the appendix.

The first claim, that the relative mixedness expression on the RHS is an upper bound, is arrived at from the following line of reasoning. There are two elementary processes, and each has the effect of making the state more (or at least not less) mixed according to the relative mixedness measure. Work extraction, by definition, only occurs during a change of the Hamiltonian. In this case the optimum is to move only occupied levels, for which the energy gain is given precisely by ${kT}\,\mathrm{ln}\left({\mathsf{M}}\left({G}^{(T,{H}_{i})}(\rho )\parallel {G}^{(T,{H}_{f})}(\sigma )\right)\right)$ (see the appendix).

The second claim concerns a universal strategy that we formulate. To illustrate it we now describe a very simple instance: the case of Landauer's bit reset with certainty ($\varepsilon =0$). Here there is a qubit associated with two energy levels E1 and E2 with $H={E}_{1}| 1\rangle \langle 1| +{E}_{2}| 2\rangle \langle 2| $. We demand ${E}_{1}={E}_{2}=0$ at the beginning and at the end, ${\rho }_{i}=1/2| 1\rangle \langle 1| +1/2| 2\rangle \langle 2| $, ${\rho }_{f}=| 1\rangle \langle 1| $. The change in the state is why this is called 'bit reset' (it is often called, ambiguously, bit erasure). Our universal strategy reduces in this simple case to the following: (i) lift both energy levels up by $\Delta E={kT}\mathrm{ln}2$. This costs ${kT}\mathrm{ln}2$ of work with probability 1. (ii) Split the levels quasistatically and isothermally such that ${E}_{1}\to 0$ and ${E}_{2}\to \infty $. In this step the Gibbs rescaled distributions are not changed; they are all 'Gibbs-equivalent'. This level splitting actually costs 0 work with probability 1. This can be seen by making use of the powerful McDiarmid inequality (McDiarmid 1989). The key step is to argue that lifting an individual level quasistatically and isothermally gives a probability distribution over work that has arbitrarily small spread around the average. This can be shown by considering a series of discrete lifts of the same size $\Delta E$, with the work cost a random variable for each one. The work cost of one step is independent of that of any other step, because the state is by assumption thermal before each lift (as follows from the process being isothermal and quasistatic). McDiarmid's inequality states: let ${X}_{1},{X}_{2},\ldots ,{X}_{n}$ be independent random variables all taking values in the same set, and denote the realized value of Xi by xi. Further, let $f({x}_{1},{x}_{2},\ldots )$ be a real-valued function with the property that changing a single xi can change f by at most ci. Then for all $\epsilon \gt 0$,

$P\left(\left| f({X}_{1},\ldots ,{X}_{n})-{\mathbb{E}}\left[f({X}_{1},\ldots ,{X}_{n})\right]\right| \geqslant \epsilon \right)\leqslant 2\,\mathrm{exp}\left(-\frac{2{\epsilon }^{2}}{{\displaystyle \sum }_{i}{c}_{i}^{2}}\right).$

Letting the random variables be the energy transferred to the work reservoir in each step, and f be the total energy transferred, one can with a little effort show that the deviation from the mean vanishes in the quasistatic limit. We note that Aberg (2012) contains alternative techniques for showing concentration around the mean and that, moreover, in the a priori different setting used in Horodecki and Oppenheim (2013) what amounts to Gibbs-equivalent transforms at zero work cost are also possible. (iii) Finally the system is decoupled from the heat bath and the empty level 2 is moved down to ${E}_{2}=0$ (without any work cost/gain), completing the process.
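The concentration step can also be checked numerically. The following toy simulation (our own, with k = T = 1) raises one level of a two-level system in n equal lifts, re-thermalizing before each lift; the spread of the total work cost shrinks like $1/\sqrt{n}$ while the mean stays at ${kT}\,\mathrm{ln}({Z}_{i}/{Z}_{f})$:

```python
import numpy as np

rng = np.random.default_rng(0)

def quasistatic_lift(E_target, n_steps, n_runs=20000, kT=1.0):
    """Raise level 2 from E = 0 to E_target in n_steps equal lifts (level 1
    fixed at E = 0), thermalizing before each lift; return sampled work costs."""
    dE = E_target / n_steps
    work = np.zeros(n_runs)
    for j in range(n_steps):
        E2 = j * dE
        p2 = np.exp(-E2 / kT) / (1.0 + np.exp(-E2 / kT))    # thermal occupation
        work += np.where(rng.random(n_runs) < p2, dE, 0.0)  # pay dE if occupied
    return work

for n in (10, 100, 1000):
    w = quasistatic_lift(E_target=5.0, n_steps=n)
    print(n, round(w.mean(), 3), round(w.std(), 3))  # mean ~0.69, spread -> 0
```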

It is an interesting question how one could generalize our theorem. In the more general case of off-diagonal terms in the energy eigenbasis, one expects entanglement to arise between the work reservoir and the working medium system during the work extraction steps, and it is subtle how to define work, as the energy of the work reservoir is then not well-defined. One analytically clean approach is to allow decoherence in the system's energy basis as a free operation for the experimenter, as in Horodecki and Oppenheim (2013). Then the corresponding decohered state can be inserted into the above expression, implying that the relative mixedness of the decohered state relative to the final state gives a lower bound on the extractable work in the case of off-diagonal terms.

Several existing results are recovered as special cases of theorem 1. Equation (2) above (from Aberg 2012, Horodecki and Oppenheim 2013) and accordingly equation (1) (from Dahlsten et al 2011) are special cases of our main result—see the supplementary information (we reiterate that Horodecki and Oppenheim 2013 uses an a priori distinct set-up, and note that the work referred to there is 'deterministic' work associated with deterministic energy transfers to a constantly pure work reservoir and is a priori distinct from the 'guaranteed' work considered here). Equation (2) corresponds to the case where the final state ${\rho }_{T}$ is demanded to have the same energy levels and be a Gibbs state, ${\rho }_{T}={\displaystyle \sum }_{i}{p}_{T}({E}_{i})| {e}_{i}\rangle \langle {e}_{i}| $. If the initial and final states are both thermal, with associated partition functions Zi and Zf, the expression reduces to ${kT}\mathrm{ln}\frac{{Z}_{f}}{{Z}_{i}}$ (as is consistent with Aberg 2012, Horodecki and Oppenheim 2013). To our knowledge our paper is the first to give an expression for the optimal work guaranteed to be extractable in going from one general energy-diagonal state to another, with changing Hamiltonians and possibly non-zero risk. Horodecki and Oppenheim (2013) also consider how one can calculate the work that can be extracted with arbitrary initial and final Hamiltonian, with either the initial or the final state being thermal, and show how the thermo-majorization condition describes the zero-risk, deterministic work for arbitrary energy-diagonal initial and final states.
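As a consistency check of the thermal-to-thermal special case, the sketches above reproduce ${kT}\mathrm{ln}\frac{{Z}_{f}}{{Z}_{i}}$ directly (reusing gibbs_rescale and relative_mixedness from the earlier sketches; k = T = 1):

```python
import numpy as np

# Thermal -> thermal: Gibbs rescaling maps both states to uniform
# distributions on (0, Z_i] and (0, Z_f], so M = Z_f/Z_i exactly.
Ei = np.array([0.0, 0.5])
Ef = np.array([0.0, 2.0])
Zi, Zf = np.exp(-Ei).sum(), np.exp(-Ef).sum()
wi, ai = gibbs_rescale(np.exp(-Ei) / Zi, Ei)
wf, af = gibbs_rescale(np.exp(-Ef) / Zf, Ef)
print(np.log(relative_mixedness(wi, ai, wf, af)), np.log(Zf / Zi))  # equal
```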

Generalized laws of thermodynamics in terms of relative mixedness

As the laws of thermodynamics are centered around the notions of energy, work and entropy, these laws should, according to our argument, also be formulated in terms of relative mixedness for them to be more suitable beyond the asymptotic i.i.d. regime.

$0{th}$ law: the $0{th}$ law can be stated as: there exists for every thermodynamic system in equilibrium a property called temperature. Equality of temperature is a necessary and sufficient condition for thermal equilibrium. This also holds after our generalization. In particular we are still assuming heat baths that take the working medium closer to a Gibbs thermal state upon interaction.

First law: the first law can be viewed as both asserting the conservation of energy as well as stating that it can be divided into two parts, work and heat, which are normally defined in the description accompanying the first law equation: ${\rm{d}}U={\rm{d}}Q-{\rm{d}}W$. $U=\mathrm{tr}(\rho H)$ is the expected internal energy of the working medium with Hamiltonian H, Q is 'heat' and W 'work'. The associated physical setting is that there is a working medium system which can either exchange energy with another system in a thermal state dubbed a heat bath, or with a work reservoir system normally implicitly assumed to be in some energy eigenstate of its own Hamiltonian. Exchanges of energy with the heat bath are dubbed heat and those with the work reservoir work. This essentially carries over into our approach but with some important subtleties. We assume energy conservation (in every single extraction), as well as allowing for interactions with a heat bath and a work reservoir. Thus the following is respected when the actual energy of the system ${E}_{\mathrm{sys}}$ changes: ${\rm{d}}{E}_{\mathrm{sys}}=-{\rm{d}}{E}_{\mathrm{bath}}-{\rm{d}}{E}_{\mathrm{reservoir}}$. We, more subtly, break ${\rm{d}}{E}_{\mathrm{reservoir}}$ into two parts: ${\rm{d}}{E}_{\mathrm{reservoir}}={\rm{d}}{W}_{{\mathcal{S}}}^{\varepsilon }+{\rm{d}}{E}_{\mathrm{extra}}$. There is the energy transfer which is predictable (up to epsilon probability of failure) in that it corresponds to ${\rm{d}}{W}_{{\mathcal{S}}}^{\varepsilon }(\rho \to \sigma )$ for the infinitesimal state change $\rho \to \sigma $ using strategy ${\mathcal{S}}$. We view anything beyond that, given by ${\rm{d}}{E}_{\mathrm{extra}}$, as heat (even though this energy flows into the work reservoir at first). The idea behind this is that only predicted energy transfer should count as work. One may for example imagine buckets lifting water out of a mine up to a certain height (or, as a quantum example, an electron excited into the conduction band). The height at which the buckets are tipped into a reservoir is specified in advance. If they go higher than this, the extra potential energy will be transferred to other degrees of freedom associated with the reservoir system, e.g. into movement of the water (or heating of the semi-conductor). We may express the following first law for this approach:

In any given extraction, with probability ${\rm{p}}\geqslant 1-\varepsilon $

${\rm{d}}{E}_{\mathrm{sys}}={\rm{d}}Q-{\rm{d}}{W}_{{\mathcal{S}}}^{\varepsilon },\qquad {\rm{d}}Q:= -{\rm{d}}{E}_{\mathrm{bath}}-{\rm{d}}{E}_{\mathrm{extra}}.\qquad (3)$

Second law: consider next the so-called Kelvin statement of the second law: no process is possible in which the sole result is the absorption of heat from a reservoir and its complete conversion into work. This does not say anything about processes with a non-zero probability of failure. We show in the appendix that for given states of the working medium A and B respectively, ${W}^{\varepsilon }(A\to B)+{W}^{\varepsilon }(B\to A)\leqslant {W}^{2\varepsilon }(A\to A).$ We call this the triangle inequality. It implies together with the main theorem that all strategies in our game respect the following generalization of Kelvin's second law:

${\displaystyle \sum }_{i}{W}_{{{\mathcal{S}}}_{i}}^{{\varepsilon }_{i}}({A}_{i}\to {A}_{i+1})\;\leqslant \;{W}^{{\sum }_{i}{\varepsilon }_{i}}(A\to A),\qquad (4)$

where ${{\mathcal{S}}}_{i}$ is the choice of strategy in the ith step of the cycle, which starts and ends in the same state A. Note that ${W}^{0}(A\to A)=0$ (see main theorem), implying that deterministically no work can be extracted in such a cycle. One may still gain work in a single cycle at the cost of having $\varepsilon \gt 0$ for one or more of the steps.

The second law is also closely related to entropy increasing with time and one may wonder what the corresponding generalization of the statement is. A particular standard expression is that

$\Delta S-\beta \Delta \langle E\rangle \geqslant 0,\qquad (5)$

where S and $\langle E\rangle $ are the von Neumann entropy and expected energy of a system interacting with a heat-bath with inverse temperature β. (Δ indicates the change in these values during the interaction.) This actually still holds in our more general model; we show this in the supplementary information. However, crucially, equation (5) is not sufficient to guarantee that an evolution $\rho \to \rho \prime $ is realizable through an interaction with a heat bath. Instead it should be replaced by the statement that a state change $\rho \to \rho \prime $ due to a thermalization with a heat-bath at temperature T is possible if and only if

${W}^{0}(\rho \to \rho \prime )={kT}\,\mathrm{ln}\,{\mathsf{M}}\left({G}^{(T,H)}(\rho )\parallel {G}^{(T,H)}(\rho \prime )\right)\geqslant 0.\qquad (6)$

This is significant as there are processes that respect equation (5) but violate equation (6). A simple example is to consider degenerate energy levels, so that $\Delta \langle E\rangle =0$, and three levels with probabilities ${(1/2\;\;1/2\;\;0)}^{T}\to {(2/3\;\;1/6\;\;1/6)}^{T}$. Then $\Delta S\approx 0.25$ but W0 is negative. Strikingly, such evolutions enable the deterministic violation of Kelvin's second law (if the evolution is stochastic—see supplementary information).
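This example is easy to verify directly; a small numerical check (our own) of both the entropy increase and the failure of majorization:

```python
import numpy as np

def shannon_bits(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

rho   = np.array([0.5, 0.5, 0.0])
sigma = np.array([2/3, 1/6, 1/6])
print(shannon_bits(sigma) - shannon_bits(rho))   # ~0.25: entropy increases
# rho majorizes sigma iff every partial sum of the sorted spectrum dominates:
print(np.cumsum(np.sort(rho)[::-1]) >= np.cumsum(np.sort(sigma)[::-1]))
# the first partial sum fails (1/2 < 2/3), so M < 1 and hence W^0 < 0
```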

The inequivalence of entropy and majorization has been noted previously in the context of the second law (Ruch 1975, Ruch and Mead 1976). Presumably this has not received more attention to date because in the von Neumann regime this inequivalence disappears. More precisely, if we consider a tensor product of n identical states each with von Neumann entropy S and let $n\to \infty $, then with asymptotically small error we may approximate the spectrum as flat, consisting of approximately ${2}^{{nS}}$ eigenvalues of size ${2}^{-{nS}}$ each. For such flat distributions the partial orders induced by S and majorization respectively coincide.

We finally make a remark on the mathematical structure that emerges here. We note that the extractable work is no longer a function of state, whereas in standard statistical mechanics the optimal extractable work between two states is given by $\delta {F}_{12}={F}_{1}-{F}_{2}$ with $F=U-{TS}$. Here one must consider the extractable work between two states; assigning a free energy as a state function is not possible. It is not even optimal to go via thermal states in general, i.e. there exist cases where ${W}^{\varepsilon }(\rho \to {\sigma }_{T})+{W}^{\varepsilon }({\sigma }_{T}\to \sigma )\lt {W}^{\varepsilon }(\rho \to \sigma )$.

Very recently it has been argued that our generalized formulation of the second law should be replaced with a slightly weaker condition (Brandão et al 2015). As this appeared after our paper on the arXiv, we defer discussion of the relation between these papers to later work. Between this paper appearing on the arXiv and being published, several other related, interesting and relevant contributions have appeared, including Faist et al (2012), Gour et al (2015), Lostaglio et al (2015).

Relative mixedness as entanglement measure

The structures of entanglement theory and thermodynamics are closely linked and often considered in connection with one another, see e.g. Plenio and Vedral (1998). We now consider the implications of our results for entanglement theory. This section demonstrates that relative mixedness is natural to use in quantum information theory also outside of thermodynamical contexts. It is customary to quantify entanglement via entropy, in particular the standard measure of entanglement of a bipartite pure state ${\rho }_{{AB}}$ is the von Neumann entropy of the reduced state, $S({\rho }_{A})=S({\rho }_{B})$. This is called the entanglement entropy. However there is good reason to think that, as we have argued in the case of statistical mechanics, entropy should be replaced with relative mixedness also in the context of entanglement theory. We propose a notion of relative entanglement between two states ${\rho }_{{AB}}$ and ${\sigma }_{{AB}}$ which is quantified as the (logarithmic) relative mixedness of the reduced states: ${\mathrm{log}}_{2}\;{\mathsf{M}}({\sigma }_{A}\parallel {\rho }_{A})$.

This has the following appealing operational meaning. Consider the Bell state ${| {\phi }^{+}\rangle }_{{AB}}:= \frac{1}{\sqrt{2}}({| 0\rangle }_{A}{| 0\rangle }_{B}+{| 1\rangle }_{A}{| 1\rangle }_{B})$. Consider two arbitrary finite-dimensional bipartite pure states ${\rho }_{{AB}}$ and ${\sigma }_{{AB}}$. How many such Bell pairs are needed to transform ${\rho }_{{AB}}$ to ${\sigma }_{{AB}}$? More specifically, for what condition on ni and nf is the LOCC (local operations and classical communication) conversion ${\rho }_{{AB}}\otimes {(| {\phi }^{+}\rangle {\langle {\phi }^{+}| }_{{AB}})}^{\otimes {n}_{i}}\to {\sigma }_{{AB}}\otimes {(| {\phi }^{+}\rangle {\langle {\phi }^{+}| }_{{AB}})}^{\otimes {n}_{f}}$ possible? The answer is that this is possible if

${n}_{f}-{n}_{i}\;\leqslant \;{\mathrm{log}}_{2}\;{\mathsf{M}}({\sigma }_{A}\parallel {\rho }_{A}).$

(We prove this in the appendix, making heavy use of the results of Nielsen 1999 and the setting of Buscemi and Datta 2011).

As a very simple example, for $| \psi \rangle =\alpha | 00\rangle +\beta | 11\rangle $ (and $\alpha \geqslant \beta $) and $| \phi \rangle ={| {\phi }^{+}\rangle }_{{AB}}$ one finds ${\mathrm{log}}_{2}\,{\mathsf{M}}({\mathrm{Tr}}_{B}| \psi \rangle \langle \psi | \parallel {\mathrm{Tr}}_{B}| \phi \rangle \langle \phi | )={\mathrm{log}}_{2}\left(2| \alpha {| }^{2}\right)$. This takes values between 1 ($\alpha =1$) and 0 ($\alpha =\frac{1}{\sqrt{2}}$).
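This value can be checked against Definition 1 with the relative_mixedness sketch from earlier, feeding in the Schmidt spectra as equal-width blocks:

```python
import numpy as np

alpha2 = 0.8                                # |alpha|^2 for the state above
lam_psi = np.array([alpha2, 1.0 - alpha2])  # spectrum of Tr_B |psi><psi|
lam_bell = np.array([0.5, 0.5])             # spectrum of Tr_B |phi+><phi+|
M = relative_mixedness(np.ones(2), lam_psi, np.ones(2), lam_bell)
print(np.log2(M), np.log2(2 * alpha2))      # both ~0.678
```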

Acknowledgments

We gratefully acknowledge discussions with J Aaberg, J Baez, B Fong, P Perinotti, J Vicary, M Horodecki and J Oppenheim, as well as support from the National Research Foundation (Singapore), the Ministry of Education (Singapore), the Swiss National Science Foundation (grant no. 200020-135048), the Swiss National Centre of Competence in Research QSIT, the European COST Action on Quantum Thermodynamics, the EU Integrating Project SIQS, the European Research Council (grant no. 258932) and the EU collaborative project TherMiQ (Grant agreement No. 618074). DE is grateful for the hospitality of the Clarendon Laboratory, University of Oxford, whilst undertaking part of this work. This research was partly carried out in connection with DE's Master's Thesis at ETH Zurich.

Appendix

The appendix is structured in the following manner. A: the work extraction game, B: upper bounding the extractable work, C: the universal strategy that achieves the bound, D: implications for the second law, and E–G: properties of the relative mixedness.

Appendix A.: The work extraction game

In this section we define the setting more carefully, and derive certain lemmas which shall be needed for the later sections.

A.1. Combining energy and occupation probabilities into one distribution: Gibbs rescaling

There are two central pieces of information about the system, the energy eigenvalues, and their occupation probabilities. We shall find it very powerful to follow Ruch (1975), Ruch and Mead (1976), Mead (1977) and combine them into one object, the Gibbs-rescaled distribution.

Consider states with discrete spectra $\{{\lambda }_{i}\}$. We firstly transform the spectrum into the associated step-function. Then we take each block, rescale its height as ${\lambda }_{i}\mapsto {\lambda }_{i}/\mathrm{exp}\left(-\frac{{E}_{i}}{{kT}}\right)$, and its width $l=1\mapsto \mathrm{exp}\left(-\frac{{E}_{i}}{{kT}}\right)$ such that the area of the new block is ${\lambda }_{i}$ as before. We write this operation applied to a density matrix ρ as ${G}^{T}(\rho )$. It is depicted in figure A1. Gibbs rescaling can, as will prove useful in later proofs, be written out in the language of continuous functions in the following manner:

Figure A1.

Figure A1. Gibbs rescaling: the width of each block k corresponding to the level k after rescaling is given by $A(k)=\mathrm{exp}(-E(k)/{kT})$, while its height is $\lambda (k)/A(k)$ so that its area is $\lambda (k)$, where $\lambda (k)$ is the occupation probability of the level and $E(k)$ its energy eigenvalue.

Standard image High-resolution image

Definition 2 (Gibbs rescaling). Consider a density matrix $\rho ={\displaystyle \sum }_{i=1}^{n}{\lambda }_{i}| {e}_{i}\rangle \langle {e}_{i}| $ with eigenvalues ${\{{\lambda }_{i}\}}_{i=1}^{n}$ and take the energy eigenstates of the system to be $\{| {e}_{i}\rangle \}{}_{i=1}^{n}$ with energies ${\{{E}_{i}\}}_{i=1}^{n}$ respectively. There is an associated step function for the spectrum, $\lambda ({xn})={\lambda }_{\lceil {xn}\rceil }$ where $x\in (0,1]$. Similarly there is an energy step function $E({xn})={E}_{\lceil {xn}\rceil }$ where $x\in (0,1]$. The Gibbs rescaling associated with temperature T combines $\lambda ({xn})$ and $E({xn})$ to a new function ${G}^{T}(y)$ implicitly defined by

${G}^{T}(y):= \lambda ({xn})\,\mathrm{exp}\left(\frac{E({xn})}{{kT}}\right)\qquad \mathrm{for}\qquad y=n{\displaystyle \int }_{0}^{x}\mathrm{exp}\left(-\frac{E(x\prime n)}{{kT}}\right){\rm{d}}x\prime .$

It follows that ${G}^{T}(y)$ is defined on $(0,Z]$, with $Z={\displaystyle \sum }_{j=1}^{n}\mathrm{exp}\left(-\displaystyle \frac{{E}_{j}}{{kT}}\right)$ the partition function. Moreover ${G}^{T}(y)$ is a probability distribution satisfying ${\displaystyle \int }_{0}^{Z}{G}^{T}(y){\rm{d}}y=1$.

A.2. Thermalizations

We now turn to how interactions with the heat bath, thermalizations, act on the state of the system. Roughly speaking these take the density matrix closer to the associated Gibbs state; similar statements can be found in Ruch (1975), Ruch and Mead (1976), Mead (1977) (see especially section 4 of Ruch and Mead 1976, where a different argument is also given for the result below concerning thermalizations). As already mentioned, the thermalization is taken to only change occupation probabilities and not energy eigenvalues. We take the thermalization to act as a stochastic process on the energy eigenstates, in that the probability of occupying a given energy state, $P(i)$, becomes $P\prime (i)={\displaystyle \sum }_{j}P(j\to i)P(j)$ where the summation is over all eigenstates, $P(j\to i)$ is a transition probability, and $P(j)$ an occupation probability (before the interaction with the heat bath). This can equivalently be written as $\vec{P\prime }=B\vec{P}$ where B is a stochastic matrix (entries are probabilities and columns sum to 1).

Not every stochastic matrix B is allowed however. The Gibbs state (associated with temperature T) is taken to be invariant under a thermalization. Consider the implications firstly for the fully degenerate case of all energies being the same. In this case the Gibbs state is the uniform distribution. The only stochastic matrices that leave the uniform distribution invariant are bistochastic ones (rows also sum to 1). Thus in the fully degenerate case B must be bistochastic. We see no reason to impose further restrictions, so any such B is allowed.

Consider secondly the non-degenerate case. Here it is again convenient to use the Gibbs rescaled distribution. Note that the Gibbs state becomes uniform after the Gibbs rescaling. Thus one may hope that a thermalization, i.e. a Gibbs state preserving stochastic matrix on the occupation probabilities, acts as a bistochastic matrix on the Gibbs-rescaled distribution, and we now show that this is indeed the case.
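A convenient family of Gibbs-preserving matrices, used below purely as an illustration (the definition allows any stochastic matrix with the Gibbs state as a fixed point), replaces the state by the Gibbs state with probability t:

```python
import numpy as np

def partial_thermalization(gibbs, t):
    """B = (1 - t) * I + t * gibbs 1^T: with probability t the state is
    replaced by the Gibbs state. Columns sum to 1 and B @ gibbs = gibbs."""
    d = len(gibbs)
    return (1.0 - t) * np.eye(d) + t * np.outer(gibbs, np.ones(d))

E = np.array([0.0, 1.0])
gibbs = np.exp(-E) / np.exp(-E).sum()
B = partial_thermalization(gibbs, 0.3)
print(B.sum(axis=0))        # [1, 1]: stochastic
print(B @ gibbs - gibbs)    # [0, 0]: the Gibbs state is a fixed point
```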

Before considering the general case, we look at a simple example of a two-level system.

Let B be the stochastic matrix defined by the transition probabilities, i.e.:

$B=\left(\begin{array}{cc}P(1\to 1) & P(2\to 1)\\ P(1\to 2) & P(2\to 2)\end{array}\right).$

The stochastic matrix should leave the thermal state invariant:

$B\left(\begin{array}{c}e(1)/Z\\ e(2)/Z\end{array}\right)=\left(\begin{array}{c}e\prime (1)/Z\\ e\prime (2)/Z\end{array}\right),$

where ${e}^{(\prime )}(i)/Z=\mathrm{exp}(-{E}^{(\prime )}(i)/({kT}))/Z$ is the Gibbs state (which should be invariant as the energy does not change) and $Z=e(1)+e(2)$.

Look at what happens with $e(1)=2$ and $e(2)=1$. For the Gibbs rescaling this means that $P(1)\to P(1)/2$ on the length 2 and $P(2)\to P(2)$ on the length 1. We can split the first level into two parts (in our mind) and consider new levels $(P({1}_{1}),P({1}_{2}),P({2}_{1}))=(P(1)/2,P(1)/2,P(2)/1)$ all having the same length after Gibbs rescaling. For the thermal state this means:

$\left(\frac{e(1)}{Z},\frac{e(2)}{Z}\right)=\left(\frac{2}{3},\frac{1}{3}\right)\;\mapsto \;\left(\frac{1}{3},\frac{1}{3},\frac{1}{3}\right).$

The transition matrix becomes:

$F=\left(\begin{array}{ccc}{B}_{11}/2 & {B}_{11}/2 & {B}_{12}/2\\ {B}_{11}/2 & {B}_{11}/2 & {B}_{12}/2\\ {B}_{21} & {B}_{21} & {B}_{22}\end{array}\right),$

which is still stochastic, because the initial matrix was. Since the thermal state has to be invariant under the action of this matrix and the thermal state in this case is proportional to the identity, it is straightforward to check that the matrix has to be bistochastic (rows and columns sum to 1).
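The same fine-graining is easy to verify numerically (reusing partial_thermalization from the sketch above):

```python
import numpy as np

# e(1) = 2, e(2) = 1, so Z = 3, the Gibbs state is (2/3, 1/3), and level 1
# carries N_1 = 2 fine blocks while level 2 carries N_2 = 1.
gibbs = np.array([2/3, 1/3])
B = partial_thermalization(gibbs, 0.5)   # any Gibbs-preserving B will do
k = [0, 0, 1]                            # fine block -> coarse level
N = [2, 1]                               # fine blocks per level
F = np.array([[B[k[l], k[m]] / N[k[l]] for m in range(3)] for l in range(3)])
print(F.sum(axis=0))   # columns sum to 1 (stochastic, inherited from B)
print(F.sum(axis=1))   # rows sum to 1 too: F is bistochastic
```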

For the general case consider dividing the Gibbs-rescaled distribution into fine blocks such that all fine blocks have the same width w. Let N be the number of fine blocks. (As the maximum support is given by the partition function Z we have $w=Z/N$). Let Nk be the number of fine grained blocks associated with level k, such that ${\displaystyle \sum }_{k=1}^{n}{N}_{k}=N$. Each energy level is associated with one block only labelled by k. Each lth fine block is associated with a level kl.

Fine blocks associated with the same energy level k must all have the same height, given by $P({k}_{l})/e({k}_{l})$ (where $e({k}_{l})=\mathrm{exp}(-E({k}_{l})/{kT})={{wN}}_{{k}_{l}}={{ZN}}_{{k}_{l}}/N$ is the total width of the level kl after Gibbs rescaling. See the comment after the definition of Gibbs rescaling 2). Let $\vec{f}$ contain the N heights of the fine blocks, with $P({k}_{l})/e({k}_{l})=P({k}_{l})N/({{ZN}}_{{k}_{l}})$ as its lth entry. Now when the occupation probabilities transform under B, $\vec{f}$ undergoes an associated transform. We will argue it is given by a matrix F whose entry in the l-th row and m-th column is given by

${F}_{{lm}}=\frac{{B}_{{k}_{l}{k}_{m}}}{{N}_{{k}_{l}}}.\qquad ({\rm{A}}.1)$

To see this note firstly that ${P}_{i}^{\prime }={\displaystyle \sum }_{j}{B}_{{ij}}{P}_{j}={\displaystyle \sum }_{j}\frac{{N}_{j}}{{N}_{j}}{B}_{{ij}}{P}_{j}$, and recall that ${f}_{l}^{\prime }={P}_{{k}_{l}}^{\prime }N/({{ZN}}_{{k}_{l}})$. Thus

${f}_{l}^{\prime }=\frac{N}{{{ZN}}_{{k}_{l}}}{\displaystyle \sum }_{j}{B}_{{k}_{l}j}{P}_{j}\qquad ({\rm{A}}.2)$

$=\frac{N}{{{ZN}}_{{k}_{l}}}{\displaystyle \sum }_{j}{N}_{j}{B}_{{k}_{l}j}\frac{{P}_{j}}{{N}_{j}}\qquad ({\rm{A}}.3)$

$={\displaystyle \sum }_{j}\frac{{B}_{{k}_{l}j}}{{N}_{{k}_{l}}}\,{N}_{j}\,\frac{{{NP}}_{j}}{{{ZN}}_{j}}\qquad ({\rm{A}}.4)$

$={\displaystyle \sum }_{m}{F}_{{lm}}\,{f}_{m}.\qquad ({\rm{A}}.5)$

As Bij and N are non-negative real numbers F has non-negative real entries only. To see that the columns sum to 1 so that F is a stochastic matrix, note that the column sums are the same as for B which is stochastic. Moreover as B must leave the Gibbs state invariant, and this is a uniform distribution after the Gibbs rescaling, F must leave the uniform distribution (or anything proportional to it) invariant. Then for any row i: ${\displaystyle \sum }_{j}{F}_{{ij}}(1/N)=1/N$ so each row of F must sum to 1. Therefore F is a bistochastic matrix. Note that F is additionally restricted, through being defined via B, to keep the heights of fine blocks the same whenever these are associated with the same level.

Accordingly we define interactions with the heat-baths, thermalizations, to act in the following way on the system.

Definition 3 (Thermalization). A thermalization leaves the energy eigenvalues invariant. It acts on the occupation probabilities, i.e. the eigenvalues of the density matrix, as a stochastic matrix. This stochastic matrix leaves the Gibbs state $\mathrm{exp}(-\beta H)/Z$ invariant. It follows from this definition and the definition of the Gibbs-rescaled distribution that a thermalization acts on the Gibbs-rescaled distribution as a bistochastic matrix.

A.3. Work extractions

The second elementary process is changing the Hamiltonian of the system through shifting a set of energy levels by some predetermined amount $\Delta E(j)$, where j labels the jth work extraction. This may involve a work gain/cost: if the system occupies one of the energy eigenstates that get shifted by $\Delta E(j)$, this counts as work done on the system, and energy conservation entails an energy transfer of ${W}_{j}=-\Delta E(j)$ to the work reservoir system. If the system does not occupy the eigenstates that get shifted there is no work cost, ${W}_{j}=0$. To reduce the notation later on we will also find it convenient to define the 'logarithmic' work ${w}^{j}$ s.t. ${W}_{j}:= {kT}\,\mathrm{ln}\,{w}^{j}$ (or equivalently ${w}^{j}:= \mathrm{exp}({W}_{j}/{kT})$).

There is thus for each elementary work extraction a probability distribution over the work transfer, with two elements, [$p({W}_{j}=0)$, $p({W}_{j}=-\Delta E(j))$]. A sequence of work extractions generates a randomly picked sequence of energy transfers to the work reservoir, e.g. $\{0,0,\Delta E(3),0,\Delta E(5),\ldots \}$. There is an associated vector of 0's and 1's where a 1 as the jth entry indicates that there was indeed a work transfer in the jth step. We call this latter vector $\vec{s}$, and the jth entry thereof sj. ${s}_{j}=0$ means that the levels shifted in work extraction step j were not occupied, and ${s}_{j}=1$ means that they were.

From the perspective of someone who learns sj, the occupation probabilities $\{{\lambda }_{i}\}$ change. If ${s}_{j}=1$ one projects the state ρ with the projector ${\Pi }_{\mathrm{shifted}}$ onto the set of levels shifted, so that the new state is

$\rho \;\mapsto \;\frac{{\Pi }_{\mathrm{shifted}}\,\rho \,{\Pi }_{\mathrm{shifted}}}{\mathrm{tr}\left({\Pi }_{\mathrm{shifted}}\,\rho \,{\Pi }_{\mathrm{shifted}}\right)}.$

If instead ${s}_{j}=0$ one replaces the projector with one onto the levels that were not shifted.

We accordingly represent a work extraction in the following manner:

Definition 4 (Work extraction). We define a work extraction on the first l levels, which are all to get shifted in energy by $\Delta E(j)=-{kT}\mathrm{ln}({w}_{\vec{s}| {s}_{j}=1}^{j})$, while the remaining levels are untouched as follows. Letting ${\Theta }_{U}(y)$ denote the function that is 1 if $y\in U$ and else 0, the new occupation probabilities and energies are given by:

  • In the case when ${s}_{j}=1$ (state of the system is found to be in the levels $(1,\ldots ,l)$):
    ${\lambda }_{\vec{s}}^{j}(i)={\Theta }_{\{1,\ldots ,l\}}(i)\,\frac{{\lambda }_{\vec{s}}^{j-1}(i)}{{\eta }_{\vec{s}| {s}_{j}=1}^{j}},\qquad {E}_{i}^{j}={E}_{i}^{j-1}+\Delta E(j)\,{\Theta }_{\{1,\ldots ,l\}}(i),$
    where ${\eta }_{\vec{s}| {s}_{j}=1}^{j}={\displaystyle \sum }_{i=1}^{l}{\lambda }_{\vec{s}}^{j-1}(i)$. In this case there is an energy transfer to the reservoir given, in terms of the logarithmic work, by ${w}_{\vec{s}| {s}_{j}=1}^{j}=\mathrm{exp}(-\Delta E(j)/{kT})$.
  • In the case when ${s}_{j}=0$ (state of the system is not found to be in the levels $(1,\ldots ,l)$):
    ${\lambda }_{\vec{s}}^{j}(i)={\Theta }_{\{l+1,\ldots ,n\}}(i)\,\frac{{\lambda }_{\vec{s}}^{j-1}(i)}{{\eta }_{\vec{s}| {s}_{j}=0}^{j}},\qquad {E}_{i}^{j}={E}_{i}^{j-1}+\Delta E(j)\,{\Theta }_{\{1,\ldots ,l\}}(i),$
    where ${\eta }_{\vec{s}| {s}_{j}=0}^{j}={\displaystyle \sum }_{i=l+1}^{n}{\lambda }_{\vec{s}}^{j-1}(i)$. In this case there is no energy transfer to the work reservoir, i.e. ${w}_{\vec{s}| {s}_{j}=0}^{j}=1$.
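For concreteness, a single such step can be implemented as follows (our own sketch, with k = T = 1):

```python
import numpy as np

rng = np.random.default_rng(1)

def work_extraction_step(lam, E, shifted, dE):
    """Shift the levels listed in `shifted` by dE, sample s_j, and project
    accordingly (Definition 4). Returns the updated occupation probabilities,
    the updated energies, and the work W_j = -dE transferred to the reservoir."""
    mask = np.zeros(len(lam), dtype=bool)
    mask[shifted] = True
    s = rng.random() < lam[mask].sum()   # s_j = 1 with probability eta
    E_new = np.where(mask, E + dE, E)    # the Hamiltonian changes either way
    keep = mask if s else ~mask          # project onto (un)shifted levels
    lam_new = np.where(keep, lam, 0.0)
    lam_new = lam_new / lam_new.sum()    # renormalize by eta
    return lam_new, E_new, (-dE if s else 0.0)

lam, E = np.array([0.5, 0.5]), np.array([0.0, 0.0])
print(work_extraction_step(lam, E, shifted=[0], dE=-0.3))  # lower level 1
```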

This next lemma considers how the work extraction in the preceding definition acts on the Gibbs rescaled distribution. This is also depicted in figure A2.

Figure A2.

Figure A2. Work extraction: the action of the work extraction on the Gibbs rescaled probability distribution can be seen as a stretching by w of the part from which one tries to extract the work ${kT}\mathrm{ln}(w)$, followed by a projection onto either the levels from which one tried to extract work (case ${s}_{j}=1$) or the rest (case ${s}_{j}=0$) followed by a renormalization.

Standard image High-resolution image
Figure C1.

Figure C1. Work extraction algorithm: we choose the last levels such that the sum of their occupation probabilities equals epsilon, then we lift them to infinity, which succeeds with probability $1-\varepsilon $ (step 1). Afterwards we extract the work ${W}^{\varepsilon }$ and get a state which still majorizes the wanted final one (step 2). Thus we can get to the wanted state by doing a thermalization (step 3, see lemma 8).

Standard image High-resolution image
Figure C2.

Figure C2. Isothermal shift: the isothermal shift of the boundary between the levels 2 and 3 in direction 3 leaves p, ${\lambda }_{2}+{\lambda }_{3}$ and ${A}_{2}+{A}_{3}$ invariant, while it increases ${\lambda }_{2}$ and A2. The work cost is 0.

Standard image High-resolution image
Figure C3.

Figure C3. Gibbs-expanding transforms: one can get a state σ out of a state ρ if ${p}_{i}\;\succ \;{p}_{f}$ (with pi the Gibbs rescaled probability distribution of ρ and pf that of σ), by doing the following steps for each final energy level (j): take as many levels (or part of levels) as needed, such that the sum of their occupation probabilities equals the occupation probability of the level j (first and second pictures). Then thermalize and do a work extraction to stretch the distribution to the wanted size (third–to–fourth picture). The final ${A}_{f}(j)=\mathrm{exp}(-E(j)/{kT})$ is bigger than the initial sum, because of ${p}_{i}\;\succ \;{p}_{f}$—therefore it is really a stretching and not a squeezing: the extracted work is at least 0.

Standard image High-resolution image

Lemma 2. Let the levels $\{1,\ldots ,l\}$ be used for work extraction as in the above definition. Let $a\in {\mathbb{R}}$ be the combined width of the blocks of the Gibbs-rescaled distribution corresponding to the levels $\{1,\ldots ,l\}$, i.e. $a={\displaystyle \sum }_{i=1}^{l}{e}^{\displaystyle \frac{-{E}_{i}^{j-1}}{{kT}}}$. Let $x\in (0,{Z}_{j}]$ (with Zj the partition function after step j).

Then following a work extraction in step j, the resulting Gibbs rescaled probability distribution, conditioned on the previous steps on path $\vec{s}$, is given by the following. In the case where ${s}_{j}=1$:

${p}_{\vec{s}}^{j}(x)={\Theta }_{(0,{aw}]}(x)\,\frac{{p}_{\vec{s}}^{j-1}(x/w)}{w\,{\eta }_{\vec{s}}^{j}}.$

In the case where ${s}_{j}=0$:

${p}_{\vec{s}}^{j}(x)={\Theta }_{({aw},{Z}_{j}]}(x)\,\frac{{p}_{\vec{s}}^{j-1}(x-{aw}+a)}{{\eta }_{\vec{s}}^{j}}.$

Proof. Case ${s}_{j}=1$:

Let the logarithmic work in step j be denoted by $w={w}_{\vec{s}}^{j}$,

let the Gibbs rescaled probability distribution after step j be ${p}^{j}={p}_{\vec{s}}^{j}$ and the one before the step j: ${p}^{j-1}={p}_{\vec{s}}^{j-1}$,

let the occupation probabilities be ${\lambda }^{j}={\lambda }_{\vec{s}}^{j}$ and the sum of the relevant occupation probabilities (as in definition 4): ${\eta }^{j}={\eta }_{\vec{s}}^{j}$

Let $x\in (0,{aw}]\bigcap (0,{Z}_{j}]$ and $b\in (0,\infty )$ such that ${\displaystyle \int }_{0}^{b}\mathrm{exp}(-\displaystyle \frac{{E}_{\lceil \displaystyle \frac{{yn}}{w}\rceil }^{j-1}}{{kT}}){\rm{d}}y=x$. Then

${p}^{j}(x)\;\mathop{=}\limits^{(**)}\;{\lambda }^{j}\left(\left\lceil \frac{{bn}}{w}\right\rceil \right)\mathrm{exp}\left(\frac{{E}_{\lceil {bn}/w\rceil }^{j}}{{kT}}\right)\;\mathop{=}\limits^{(*)}\;\frac{{\lambda }^{j-1}(\lceil {bn}/w\rceil )}{{\eta }^{j}}\cdot \frac{1}{w}\,\mathrm{exp}\left(\frac{{E}_{\lceil {bn}/w\rceil }^{j-1}}{{kT}}\right)\;\mathop{=}\limits^{(**)}\;\frac{{p}^{j-1}(x/w)}{w\,{\eta }^{j}},$

where the equation $(*)$ follows by definition 4 and the equation $(**)$ follows by definition 2.

One easily sees that ${p}^{j}(x)=0$ for $x\geqslant {aw}$, since there the indicator ${\Theta }_{\{1,\ldots ,l\}}$ of definition 4 vanishes.

The proof for the case ${s}_{j}=0$ is analogous. □

This next lemma shows how the partition function changes during a work extraction, as a function of how much the chosen levels are stretched (encoded in w) and how many levels are shifted (encoded in a as described above).

Lemma 3. The partition function Zj immediately after step j is given by:

${Z}_{j}={Z}_{j-1}+a({w}_{1}-1),$

where $(0,a]$ is the interval on which the Gibbs-rescaled distribution is associated with the stretched levels, and w1 is the logarithmic work extracted if the extraction is successful.

Proof. Let the $(0,a]$ interval be associated with blocks corresponding to the levels $\{1,\ldots ,l\}$ and split the interval $(a,{Z}_{j}]$ into $n-l$ blocks for some n.

from which the lemma follows. □

A.4. The work extraction game

We consider scenarios where there is an external agent who wants to use thermalizations and work extractions to transform a system with an initial Hamiltonian Hi and density matrix ρ, to a given final Hamiltonian Hf and density matrix σ. In the process the agent will want to keep the energy of the work reservoir as high as possible, in a way that will be made more precise below.

Definition 5 (The work extraction game). There are three systems and a work-extraction agent. One system is the working medium, another is a heat bath of temperature T, and the last is the work reservoir.

The initial energy spectrum $\{E\}$ of the working medium is arbitrary but given. The initial density matrix ρ of the same is diagonal in the energy basis. The final energy spectrum $\{F\}$ and diagonal density matrix σ are also arbitrary but given.

The agent can combine thermalization (defined above) and work extraction (also defined above) in any sequence. This sequence, together with the specifications for each step is called the agent's strategy.

In a single-shot implementation of the strategy there will be a transfer of some energy ν to the work extraction reservoir. Before the extraction the agent must specify W. If $\nu \geqslant W$ and the final state conditioned on $\nu \geqslant W$ is σ, the work extraction is termed successful (or else a failure). The probability of success is called $1-\varepsilon $.

A crucial quantity we will be interested in calculating is the optimal work that the agent can be guaranteed to extract or need to insert. Before defining this quantity mathematically we recall a motivation for being interested in it: consider a scenario where some process is activated only if the work reservoir energy goes above a certain threshold. One is then interested in whether this threshold is guaranteed to be exceeded. This is as opposed to the standard paradigm of focusing on the average energy increase in the reservoir. This is a key difference between the single-shot and average paradigms.

Definition 6 (Guaranteed work). For a given strategy S, and a given initial state there is a probability distribution of work transferred to the reservoir, ${p}_{S}({\mathcal{W}})$. We denote the work guaranteed up to a probability of failure epsilon associated with that strategy as ${W}_{S}^{\varepsilon }$, and define it through the equation

For an initial Hamiltonian Hi, density matrix ρ and tolerated probability of failure epsilon, there is a set ${\mathbb{S}}$ of allowed strategies which succeed with probability greater than or equal to $1-\varepsilon $. We denote the optimal work guaranteed (up to failure probability epsilon) for the given initial and final conditions by ${W}^{\varepsilon }(\rho ,{H}_{i}\to \sigma ,{H}_{f})$ and define it as the optimal work over all the allowed strategies in the set:

(Note that this quantity may be negative in the case where work is required to effect the given change in state and Hamiltonian).
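For intuition, ${W}_{S}^{\varepsilon }$ is simply the epsilon-quantile of the work distribution ${p}_{S}({\mathcal{W}})$: the largest threshold that is met or exceeded with probability at least $1-\varepsilon $. A minimal numerical sketch in Python (the Gaussian work distribution is purely illustrative and not from the paper):

```python
import numpy as np

def guaranteed_work(work_samples, epsilon):
    """Largest threshold W with P(work >= W) >= 1 - epsilon,
    estimated from samples of the work transferred to the reservoir."""
    w = np.sort(np.asarray(work_samples))
    n = len(w)
    # P(work >= w[k]) = (n - k) / n, so we need k <= epsilon * n.
    k = int(np.floor(epsilon * n))
    return w[k]

rng = np.random.default_rng(0)
samples = rng.normal(loc=1.0, scale=0.5, size=100_000)  # toy work distribution
print(guaranteed_work(samples, 0.0))   # worst case: the minimum of the samples
print(guaranteed_work(samples, 0.25))  # work exceeded in 75% of the runs
print(np.mean(samples))                # compare: the average work
```

For a distribution with significant spread the guaranteed work can lie far below the average, which is exactly the gap between the single-shot and average paradigms.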

A.5. Notation reminder

To assist the reading of the proofs below we collect key notation in the following:

Definition 7 (Notation). We shall use the following notation:

  • $\vec{s}\in \{0,1\}{}^{m}$: a vector with one entry for each of the m work extractions (subsequently called 'steps'). ${s}_{j}\;=\;1$: the system is in one of the energy levels chosen for work extraction in step j.
  • ${s}_{j}\;=\;0$: the system is not in one of the levels chosen for work extraction. $\vec{s}$ is called a path. ${\hat{s}}_{j}$ is the complement of sj: ${s}_{j}=1\;\iff \;{\hat{s}}_{j}=0$ and ${s}_{j}=0\;\iff \;{\hat{s}}_{j}=1$.
  • ${w}_{\vec{s}}^{j}$: the logarithmic work (${kT}\mathrm{ln}({w}_{\vec{s}}^{j})={W}_{\vec{s}}^{j}$) extracted in step j on path $\vec{s}$.
  • ${w}_{1}^{j}$: the logarithmic work one extracts in step j if the specified level is occupied.
  • W: the work demanded in order to call the total extraction successful (see definition 5).
  • w: the total logarithmic work demanded in order to call the total extraction successful, $W={kT}\mathrm{ln}(w)$.

G is the set of successful paths, i.e. those yielding as much work as demanded: $G=\{\vec{s}:{\displaystyle \prod }_{j}{w}_{\vec{s}}^{j}\geqslant w\}$.

  • ${\eta }_{\vec{s}}^{j}$: the probability of picking step j on the path $\vec{s}$, i.e. as in definition 4: ${\eta }_{\vec{s}| {s}_{j}=1}^{j}={\displaystyle \sum }_{i=1}^{l}{\lambda }_{\vec{s}}^{j-1}(i)$, if the chosen energy levels for work-extraction in step j are $\{1,\ldots ,l\}$ and ${\lambda }_{\vec{s}}^{j-1}$ is as defined below.
  • ${P}_{S}$: the total probability of success: ${P}_{S}={\displaystyle \sum }_{\vec{s}\in G}{\prod }_{j}{\eta }_{\vec{s}}^{j}$.
  • ${\lambda }_{\vec{s}}^{j}$: the occupation probabilities after step j if the previous evolution of the system is given by the path $\vec{s}$.
  • ${p}_{\vec{s}}^{j}$: the Gibbs rescaled probability distribution after step j (before thermalizing) conditioned on the previous steps on path $\vec{s}$.
  • ${p}_{\vec{s},t}^{j}$: the Gibbs rescaled probability distribution after step j (after thermalizing) conditioned on the previous steps on path $\vec{s}$.
  • block: for $a\lt b$ the interval $(a,b]$ is said to be a block corresponding to a level k, if ${p}_{\vec{s}}^{j}$ is constant on this interval $\forall \vec{s}$.
  • the final Gibbs rescaled probability distribution, conditioned on successful work extraction.
  • ${B}^{j}$: the bistochastic matrix one chooses after step j by thermalizing the system (this has to be the same for all paths).
  • ${E}_{x}^{j}$: the energy of the level labelled by x after step j.
  • ${\Theta }_{U}$: the step function associated with an interval U: ${\Theta }_{U}(x)=1$ if $x\in U$ and 0 otherwise.

Appendix B.: Upper bounding ${W}_{{\mathcal{S}}}^{\varepsilon }$

We shall be interested in bounding ${W}_{{\mathcal{S}}}^{\varepsilon }$ given epsilon and the initial and final conditions. We break the calculation into several lemmas which will later be combined to prove the main theorem. But firstly we give the argument for a special case of a more restricted set of strategies, in order to give the reader a sense of why relative mixedness enters as the bounding quantity.

B.1. Instructive special case

Consider zero-risk work extraction such that all levels with non-zero occupation probability are shifted. Note firstly that after a work extraction by $W={kT}\mathrm{ln}(w)$ the height of the Gibbs-rescaled probability distribution is given by ${\lambda }_{i}/\mathrm{exp}\left(-\left(\frac{({E}_{i}-W)}{{kT}}\right)\right)={\lambda }_{i}/\left(\mathrm{exp}\left(-\frac{{E}_{i}}{{kT}}\right)w\right)$, while the width gets stretched by a factor w. So the new Gibbs-rescaled probability distribution is given in terms of the old one as follows: ${p}_{\mathrm{new}}(x)=\frac{{p}_{\mathrm{old}}(x/w)}{w}$ (see lemma 2 for more details).

Thermalization acts as a bistochastic matrix on the Gibbs-rescaled probability distribution and therefore (see Hardy et al 1952) ${\displaystyle \int }_{0}^{l}p(x){\rm{d}}{\text{}}x\geqslant {\displaystyle \int }_{0}^{l}{p}_{\mathrm{thermalized}}(x){\rm{d}}{\text{}}x$, if both distributions are monotonically falling, which we will now assume w.l.o.g. Thus after a thermalization and a work extraction the following holds:

${\displaystyle \int }_{0}^{l}\frac{{p}_{\mathrm{old}}(x/w)}{w}{\rm{d}}{\text{}}x\geqslant {\displaystyle \int }_{0}^{l}{p}_{\mathrm{new},\mathrm{thermalized}}(x){\rm{d}}{\text{}}x.$
Inductively, after any number of work extractions and thermalizations and total work ${kT}\mathrm{ln}(w)$:

It follows that the maximal logarithmic work given the initial and final Gibbs-rescaled distributions is given by

or equivalently in terms of the cumulative distribution functions ${\mathcal{F}}$,

This is precisely the relative mixedness defined in the main section. Horodecki and Oppenheim (2013) also arrive at the same result for the zero-risk case (starting from an a priori different model and using different arguments).
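Since only the two cumulative distribution functions enter, the relative mixedness can be evaluated numerically. The sketch below finds the largest admissible stretch factor by bisection; the grid size, tolerances and flat example distributions are illustrative choices, not part of the theory:

```python
import numpy as np

def relative_mixedness(F_i, F_f, L, grid=10_000, tol=1e-6):
    """Largest stretch w with F_i(l / w) >= F_f(l) for all l in (0, L].
    F_i, F_f: vectorized CDFs of the Gibbs-rescaled distributions;
    L: length of the support of the final distribution."""
    ls = np.linspace(L / grid, L, grid)
    ok = lambda w: bool(np.all(F_i(ls / w) >= F_f(ls) - 1e-12))
    lo, hi = 1e-9, 1.0
    while ok(2 * hi):              # grow hi until the condition first fails
        hi *= 2
    hi *= 2
    while hi - lo > tol:           # bisect to the boundary
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if ok(mid) else (lo, mid)
    return lo

# Example: a flat initial distribution on (0, 1] against a flat one on (0, 2].
F_i = lambda x: np.clip(x, 0.0, 1.0)
F_f = lambda x: np.clip(x / 2.0, 0.0, 1.0)
print(relative_mixedness(F_i, F_f, L=2.0))   # approximately 2
```

The logarithm of the returned factor, multiplied by kT, is then the zero-risk work in the sense above.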

B.2. General case

We now turn to the general case. We combine the two previous lemmas to obtain another relation between the Gibbs rescaled distribution at steps j and $j-1$. We shall use this later in an iterative manner to relate the very first and final Gibbs rescaled distributions.

Lemma 4. The Gibbs rescaled probability distributions at steps j and $j-1$ respectively satisfy the relation

with constants ${c}_{\vec{s}| {s}_{j}=1}^{j}=0$ and ${c}_{\vec{s}| {s}_{j}=0}^{j}={{aw}}^{j}-a$.

Proof. Let ${w}_{k}={w}_{\vec{s}| {s}_{j}=k}^{j}$, ${p}_{k}^{j}={p}_{\vec{s}| {s}_{j}=k}^{j}$, ${p}^{j-1}={p}_{\vec{s},t}^{j-1}$, ${\eta }_{k}={\eta }_{\vec{s}| {s}_{j}=k}^{j}$. Let ${c}_{0}={{aw}}_{1}-a$ and ${c}_{1}=0$. Then:

We now use the above to make a statement about the relation between the integrals of the Gibbs rescaled distribution at steps j and $j-1$. We show that the distribution before step j majorizes the distribution after the step, even after the latter has been stretched by the logarithmic work done (w in the case ${s}_{j}=1$, 1 else). This can be seen as a generalization of the inequality ${\displaystyle \int }_{0}^{l}\frac{{p}_{\mathrm{old}}(x/w)}{w}{\rm{d}}{\text{}}x\geqslant {\displaystyle \int }_{0}^{l}{p}_{\mathrm{new},\mathrm{thermalized}}({\text{}}x){\rm{d}}{\text{}}x$ from the instructive special case above to the case where ${s}_{j}=0$ is also possible.

Lemma 5. Let $j\in \{1,\ldots ,m\}$. Let $l\in (0,{Z}_{j}]$. Let $\vec{s}\prime \in \{0,1\}{}^{m-j-1}$. Define ${\vec{s}}_{1}=({s}_{1},\ldots ,{s}_{j},1,{s}_{1}^{\prime },\ldots ,{s}_{m-j-1}^{\prime })$ and ${\vec{s}}_{0}=({s}_{1},\ldots ,{s}_{j},0,{s}_{1}^{\prime },\ldots ,{s}_{m-j-1}^{\prime })$. Then:

where ${\tau }_{t}^{j}$ is the permutation of any blocks, which maximizes the left hand side, while ${\tau }_{t}^{j+1}$ is the one which maximizes the right hand side.

Proof. Let ${p}_{1}={p}_{{\vec{s}}_{1},t}^{j+1}$, ${p}_{0}={p}_{{\vec{s}}_{0},t}^{j+1}$, ${\eta }_{1}={\eta }_{{\vec{s}}_{1}}^{j+1}$, ${\eta }_{0}={\eta }_{{\vec{s}}_{0}}^{j+1}$, $w={w}^{j+1}$.

where the first equality is exactly lemma 4. In the second equality ${l}_{1}\in (0,\mathrm{min}(a,l)]$ is a value which maximizes the right hand side of the last line and $\tilde{\tau }$ reorders ${\displaystyle \sum }_{\vec{s}\in \{\mathrm{0,1}\}{}^{j}}{p}_{1}$ in descending order in $(0,{aw}]$ and ${\displaystyle \sum }_{\vec{s}\in \{\mathrm{0,1}\}{}^{j}}{p}_{0}$ in $({aw},{Z}_{j}]$. This is possible since p1 and p0 have disjoint support, also for different $\vec{s}$, since a in definition 4 has to be chosen independently of the path (see lemma 2). This reordering maximizes the last line, thus it is equal to the line above.

After changing variables in the second integral we can translate its bounds by $-{aw}+{l}_{1}$, if we translate the integrand in the opposite direction applying a second permutation. Thus:

Applying any bistochastic matrix $\tilde{B}$ on the probabilities p0 and p1 and reordering in descending order with ${\tau }_{t}^{j+1}$ afterwards, we get (we write $\tilde{B}=B\;\circ \;{({\tau }^{j+1})}^{-1}$ for convenience, then B is again bistochastic):

where the inequality follows from the inequality $p\;\succ \;{Bp}$ for any bistochastic matrix B and vector p, which is proved in Hardy et al (1952). □
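The majorization step is easy to check numerically. The following sketch builds a bistochastic matrix as a convex combination of permutation matrices (Birkhoff's characterization) and verifies the partial-sum condition; the dimension and seed are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 6

# A convex combination of permutation matrices is bistochastic (Birkhoff).
weights = rng.dirichlet(np.ones(4))
B = sum(w * np.eye(d)[rng.permutation(d)] for w in weights)
assert np.allclose(B.sum(axis=0), 1) and np.allclose(B.sum(axis=1), 1)

p = rng.dirichlet(np.ones(d))   # an arbitrary probability vector
Bp = B @ p

# p majorizes Bp: partial sums of the sorted vector dominate.
ps = np.sort(p)[::-1].cumsum()
Bps = np.sort(Bp)[::-1].cumsum()
print(bool(np.all(ps >= Bps - 1e-12)))   # True
```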

The above lemma is the main ingredient for the first part of the main theorem and the rest of the proof is straightforward.

Theorem (First part of theorem 1 in main body, giving the bound). In the work extraction game defined above, if one is given an initial density matrix $\rho ={\displaystyle \sum }_{i}{\lambda }_{i}| {e}_{i}\rangle \langle {e}_{i}| $ and final density matrix $\sigma ={\displaystyle \sum }_{j}{\nu }_{j}| {f}_{j}\rangle \langle {f}_{j}| $ with $\{| {e}_{i}\rangle \}$, $\{| {f}_{j}\rangle \}$ the respective energy eigenstates and both ρ and σ having finite rank, then the work ${W}^{\varepsilon }$ one can extract with certainty except with epsilon probability respects

Proof. Define ${p}_{{\vec{s}}^{\prime }}^{0}=p$. W.l.o.g. ${\vec{s}}^{\prime }=\{0,\ldots ,0\}$ (the first probability distribution is independent of the path afterwards). Inductively using lemma 5 one gets:

where ${\tau }_{t}^{m}$ is the permutation which maximizes the expression of the right hand side of the first inequality (t stands for 'after thermalizing', while m stands for the mth time one applies lemma 5). Therefore (with ${P}_{S}=1-\varepsilon $):

This proves the first part of the main theorem. □

Appendix C.: Upper bound ${W}^{\varepsilon }$ given by relative mixedness is achievable

This section concerns the second statement of the main theorem (theorem 1). We specify a protocol that achieves the bound given in theorem 1, i.e. it extracts ${W}^{\varepsilon }$ of work with a failure probability no greater than epsilon. The protocol is within the rules of the game (defined in section A). The protocol works for the initial (ρ) and final (σ) states taking the form $\rho =...\otimes | \xi \rangle \langle \xi | $ and $\sigma =...\otimes | \xi \rangle \langle \xi | $, where $| \xi \rangle $ is one of the energy eigenstates of a system with two energy eigenstates in total. This is a small restriction. It amounts to allowing the agent an extra two-level system in a known state, working as a catalyst in the sense that it aids the process but is ultimately unchanged by it.

C.1. Guiding example

Before giving the general protocol it is instructive to consider an example. We begin with a density matrix ϕ with energy eigenvalues ${E}_{i}(j)$, occupation probabilities ${\lambda }_{i}(j)$ and Ai defined by ${A}_{i}(j)=\mathrm{exp}\left(\frac{-{E}_{i}(j)}{{kT}}\right)$. These are given by:

and therefore:

Equation (C.1)

The final state we want to reach is defined through:

and therefore:

Equation (C.2)

With a risk $\varepsilon =\frac{1}{2}$ the work for this game is limited by $W={kT}\mathrm{ln}\left(M\left(\frac{{p}_{i}}{1-\varepsilon }| | {p}_{f}\right)\right)={kT}\mathrm{ln}\left(\frac{4}{3}\right)$. In this example we show how this amount of work can be extracted.

We first want to raise as many energy levels as we can to infinite energy, such that if we succeed (i.e. if these levels are empty and the action therefore costs 0 work) we start with a more known state. Unfortunately the occupation probabilities of the last levels will in general not sum to exactly epsilon, so we need to change this first.

We start by raising the empty energy level to infinite energy, such that even if one mixes it completely with any other energy level it will stay empty. Then we lower the energy of the empty level, while constantly mixing this level with the first one. At the same time we raise the energy of the first level, such that in total the energy of the work reservoir is unchanged with probability 1 (the details of this action can be found below in definition 9 and the following lemma). We then have:

The lowest two occupation probabilities now sum up to epsilon. We raise the energy of these two levels by doing a work extraction that changes the energy of their states by $\infty $. With probability $1-\varepsilon =\frac{1}{2}$ we get the work 0 and the state:

which in this case is a pure state (the state would not have been pure if we had chosen epsilon to be smaller than $\frac{1}{3}$). With probability $\frac{1}{2}$ we get the work $-\infty $, in which case the work extraction cannot be successful in total. So in the case where the work extraction is successful the above state is the only one we need to consider.

Now we extract the work $W={kT}\mathrm{ln}\left(\frac{4}{3}\right)$ on all the levels. This succeeds with probability 1. The state afterwards is given by:

Again we need two levels where we only have one. Acting again as defined in definition 9 on the first two levels we can get:

The energy of the second level is now too high and we need to lower it by ${kT}\mathrm{ln}(2)$:

The work extracted in this step is in both cases at least 0. So by measuring whether the energy in the work reservoir has been increased by at least $W={kT}\mathrm{ln}\left(\frac{4}{3}\right)$, we get a 'yes' and the wanted final state with probability $\frac{1}{2}$.

C.2. General case

To make the idea clearer we start by giving the general algorithm and will then give the proof of the second part of the main theorem, which builds on lemmas proved later on. We assume here that we have at least $n/2$ energy levels with 0 occupation probability, but make sure that in the end these levels again have 0 occupation probability (note that this does not change the upper bound for the work). We assume that the levels are ordered in descending order of their Gibbs rescaled probability.

Definition 8 (Work extraction algorithm (see figure C1)). Let p and pf be Gibbs rescaled probability distributions of two states ρ and σ, with the same number of levels n.

Let ρ, σ have at least $n/2$ levels with occupation probabilities ${\lambda }_{e}=0$.

Define $W={kT}\mathrm{ln}({M}^{\varepsilon }(p,{p}_{f}))$.

  • (i)  
    If there is no k for which $1-\varepsilon ={\displaystyle \sum }_{i=1}^{k}\lambda (i)$, split the level k for which ${\displaystyle \sum }_{i=1}^{k-1}\lambda (i)\lt 1-\varepsilon \lt {\displaystyle \sum }_{i=1}^{k}\lambda (i)$ (see the corollary to lemma 7, below). Then do a work extraction on the levels $k+1,\ldots ,n$ by $-\infty $ (such that their width becomes 0).
  • (ii)  
    Make a work extraction on all levels by W (i.e. stretch the Gibbs rescaled probability distribution such that it just majorizes the final one).
  • (iii)  
    Thermalize the obtained state to get the final state (up to permutation).
  • (iv)  
    Permute the levels of the obtained state such that one gets the final state (a schematic sketch in code follows this definition).
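The bookkeeping of steps (i) and (ii) can be rendered schematically, with each level stored as an occupation/width pair (the width meaning $\mathrm{exp}(-E/{kT})$) and the split handled as in the corollary to lemma 7. This is only an illustrative sketch; in particular the computation of W from the relative mixedness is omitted:

```python
import numpy as np

def step_one(lam, width, epsilon):
    """Step (i) of definition 8: split one level if needed so the kept
    occupations sum to exactly 1 - epsilon, then drop (raise to infinite
    energy, width -> 0) the remaining levels. Inputs must be sorted in
    descending order of the Gibbs-rescaled probability lam/width."""
    lam, width = np.asarray(lam, float), np.asarray(width, float)
    csum = np.cumsum(lam)
    j = int(np.searchsorted(csum, 1 - epsilon))   # first index reaching 1-eps
    excess = csum[j] - (1 - epsilon)
    if excess > 1e-15:
        # splitting proportionally keeps the Gibbs-rescaled height lam/width
        frac = 1 - excess / lam[j]
        lam = np.concatenate([lam[:j], [lam[j] * frac, lam[j] * (1 - frac)], lam[j + 1:]])
        width = np.concatenate([width[:j], [width[j] * frac, width[j] * (1 - frac)], width[j + 1:]])
    keep = j + 1
    # succeeds with probability 1 - epsilon; occupations are renormalized
    return lam[:keep] / (1 - epsilon), width[:keep]

lam, width = step_one([0.5, 0.3, 0.2], [1.0, 1.0, 1.0], epsilon=0.25)
print(lam, width)   # [2/3, 1/3] on the kept levels; step (ii) would then
                    # stretch every width by w = exp(W / kT)
```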

Theorem 6 (Bound can be achieved (second part of main theorem)). Let p and pf be Gibbs rescaled probability distributions of two states ρ and σ, with the same number of levels n.

  • Let ρ, σ have at least $n/2$ levels with occupation probabilities ${\lambda }_{e}=0$.
  • Define $W={kT}\mathrm{ln}({M}^{\varepsilon }(p,{p}_{f}))$.

The work extraction algorithm on ρ yields the work W with probability $1-\varepsilon $. If the work extraction is successful, the final state is given by σ with probability 1.

Proof. The work extraction in step (i) succeeds with probability $1-\varepsilon $ and, if it succeeds, it yields 0 work (else $-\infty $).

  • After step (i): the occupation probabilities are given by ${\lambda }_{1}(i)=\frac{\lambda (i)}{1-\varepsilon }$ for $i=1,\ldots ,k$ (post-selecting on the case in which the state was not in one of the less likely levels) and ${\lambda }_{1}(i)=0$ else. See the corollary to lemma 7, below.
  • After step (ii): by the definition of W we have ${p}_{2}(i)\;\succ \;{p}_{f}(i)$ and the extracted work is W. Therefore one can thermalize the obtained state to get the final state σ (up to permutation) with probability 1 (see lemma 8, below). After the permutation (if the levels have some special physical meaning) we get the final state σ with probability 1.
  • In total: if the work extraction succeeds we get the final state σ with probability 1, and the extracted work is W with probability $1-\varepsilon $. □

To start, we need an algorithm which allows us to shift some probability from one level to another, if they are in thermal equilibrium. We only want to change these two levels (say j, k), so the sum of their occupation probabilities remains constant (${\lambda }_{j}+{\lambda }_{k}=\mathrm{const}$). Also we hope to be able to do this without needing to do any work, so we keep our total knowledge of these levels constant. To achieve this it seems a good idea to have ${p}_{j}+{p}_{k}=\mathrm{const}$ and to keep the two levels constantly in thermal equilibrium. This is the guiding idea for the following algorithm. Instead of doing this (rather complicated) proof one could also have assumed that one can split levels in a physical fashion (see the corollary to the next lemma for details). Then one would have got the 'isothermal shift' for free, by simply splitting the level k in two parts and afterwards removing the level j. But this would have been a further assumption. So the following definition and subsequent lemma can also be seen to show that it is possible (in principle) to achieve a splitting of a level by just having one further empty level, a heat bath and a work reservoir (which remains untouched with probability 1).

Definition 9 (Isothermal shift of boundary (see figure C2)). Let $A(j)=\mathrm{exp}\left(\frac{-{E}_{j}}{{kT}}\right)$, where Ej is the energy eigenvalue of the jth level.

Let the levels j, $k=j+1$ have the same Gibbs rescaled probability. We call the limit $n\to \infty $ of the following process an isothermal shift of the boundary between j and k by $w\in \left(-\frac{A(j)}{A(j)+A(k)},\frac{A(k)}{A(j)+A(k)}\right)$ in direction k:

  • (i)  
    Do a permutation, which brings the level j in front and level k as second.
  • (ii)  
    Do a work extraction on level j by:
  • (iii)  
    Do a permutation, which brings the level k in front and level j second.
  • (iv)  
    Do a work extraction on level k by:
  • (v)  
    Do a thermalization totally mixing the two levels j and k and leaving all others untouched (i.e. the matrix with entries $1/2$ in $(1,1)$, $(1,2)$, $(2,1)$ and $(2,2)$ and ${\delta }_{m,l}$ everywhere else, such that the first entry of the vector it is applied to is the probability of the level j after work extraction and the second is the probability of the level k).
  • (vi)  
    Restart with (i), n times in total, redefining $A(j)$ and $A(k)$ as above for the probabilities after this process.
  • (vii)  
    Do a permutation which brings the levels j and $k=j+1$ back to their position at the beginning (we show below that this is possible).

Instead of the first four actions, we could simply have said that we extract the work w1 on the level j and the work w2 on the level k. Then we would have had to do the total mixing also between these levels (instead of at the first and second position of the matrix) and so on. What we mean here by doing a work extraction on the level j is the action: do a permutation bringing the level j to the front, extract work, permute the level back.

In later definitions we will make use of this. Here we do not, since the algebra would get slightly more complicated.

The following lemma shows that the above process costs no work with probability 1 and that it can indeed be seen as a shift of the separation between the levels.

Lemma 7 (Action of the isothermal shift of boundary). Let $A(j)=\mathrm{exp}\left(\frac{-E(j)}{{kT}}\right)$, where $E(j)$ is the energy eigenvalue of the jth level.

Let the levels j, $k=j+1$ have the same Gibbs rescaled probability.

After an isothermal shift of the boundary between j and k by $w\in \left(-\frac{A(j)}{A(j)+A(k)},\frac{A(k)}{A(j)+A(k)}\right)$ in direction k:

  • (i)  
    • (a)  
      The energy eigenvalues of all levels but j and k remain constant.
    • (b)  
      At the end ${A}_{f}(j)=\mathrm{exp}\left(\frac{-{E}_{f}(j)}{{kT}}\right)$ is given by ${A}_{f}(j)=A(j)+w(A(j)+A(k))$ and for the level k: ${A}_{f}(k)=A(k)-w(A(j)+A(k))$ (${E}_{f}(j)$ is the energy eigenvalue of the level j after the shift).
  • (ii)  
    With probability $1-(\lambda (j)+\lambda (k))$, the occupation probabilities of the final state are given by $\frac{\lambda (l)}{1-(\lambda (j)+\lambda (k))}$ for $l\ne j,k$ and 0 for $l=j,k$.
  • (iii)  
    With probability $\lambda (j)+\lambda (k)$, the occupation probabilities of the final state are given by $\frac{{A}_{f}(l)}{A(j)+A(k)}$ for $l=j,k$ and 0 else.
  • (iv)  
    With probability 1 the energy in the work reservoir is changed by W = 0.

Proof. 1(a) follows directly from the algorithm, since we did not do any work extraction on any of the other levels and this is the only way we can change energies in our game. For 1(b) we need to look at how the energy eigenvalues of the jth and kth level change in each of the n passes through the algorithm in definition 9. Directly from the algorithm we get that in the first pass $A(j)$ changes to ${A}_{1}(j)=\mathrm{exp}\left(\frac{-E(j)+{kT}\mathrm{ln}({w}_{1})}{{kT}}\right)$, so ${A}_{1}(j)={w}_{1}A(j)=A(j)+\frac{w}{n}(A(j)+A(k))$, and by the same argument ${A}_{1}(k)={w}_{2}A(k)=A(k)-\frac{w}{n}(A(j)+A(k))$. Since ${A}_{1}(j)+{A}_{1}(k)=A(j)+A(k)$ we see that after l passes through the algorithm one ends up with ${A}_{l}(j)=A(j)+(l-1)\frac{w}{n}(A(j)+A(k))+\frac{w}{n}(A(j)+A(k))=A(j)+l\frac{w}{n}(A(j)+A(k))$ and ${A}_{l}(k)=A(k)-l\frac{w}{n}(A(j)+A(k))$. With l = n we get what is stated in 1(b).

In order to derive 2 and 3 we need to have a closer look at how the occupation probabilities change in each of the n passes through the algorithm. The occupation probabilities are given by the Gibbs rescaled probabilities multiplied by the corresponding $A(l)$.

Let q be the Gibbs rescaled probability distribution after step (i) of the ith pass through the algorithm in definition 9. After step (ii) we have:

where $\eta ({q}_{j})={\displaystyle \int }_{0}^{{A}_{j}}q(x){\rm{d}}{\text{}}x$ and $Z(q)$ is the partition function of q.

After step (iv) we thus have:

Noting that $q(x)=q(x/{w}_{2})$ for $x\in (0,{A}_{k}]$ and similarly for $x\in ({A}_{j},{A}_{k}+{A}_{j}]$ and $x-{A}_{j}{w}_{1}-{A}_{k}{w}_{2}+{A}_{k}+{A}_{j}=x$, we can rewrite this as:

This means that after step (v) we get:

For 2 note that with probability $1-(\lambda (j)+\lambda (k))$ we get after the first pass through the algorithm: ${q}_{j}={q}_{k}=0$ (which just means that the state is measured to be orthogonal to j and k). Therefore in the subsequent passes we have $\eta ({q}_{j})=\eta ({q}_{k})=0$. So with probability $1-(\lambda (j)+\lambda (k))$ we get the final probability distribution:

Since the energy eigenvalues of these levels are unchanged, we get $\frac{\lambda (l)}{1-(\lambda (j)+\lambda (k))}$ for $l\ne j,k$ and 0 for $l=j,k$ for the occupation probabilities, which proves 2.

The final Gibbs rescaled probabilities of the levels j and k have the same value (since we completely mix them in step (v)). Their integral (${\displaystyle \int }_{0}^{{A}_{j}+{A}_{k}}q(x){\rm{d}}{\text{}}x$) equals 1 after the first pass through the algorithm and remains 1 thereafter (with probability $\lambda (j)+\lambda (k)$). As noted before, ${A}_{f}(j)+{A}_{f}(k)=A(j)+A(k)$. Thus we get that with probability $\lambda (j)+\lambda (k)$ the occupation probabilities of the levels are given by $\frac{{A}_{f}(l)}{A(j)+A(k)}$ for $l=j,k$ and 0 else, which proves 3.

Suppose that in the first pass through the algorithm the state is orthogonal to the levels $j,k$: then the energy in the work reservoir is unchanged throughout all n passes, and for this case 4 follows trivially.

We now look at the other case (the case where the state is projected onto the levels $j,k$ the first time one goes through the algorithm).

Let $\vec{s}\in \{1,2\}{}^{n}$. Define $\sigma (2)=1$ and $\sigma (1)=-1$. Define ${\alpha }_{1}=\displaystyle \frac{A(j)}{A(j)+A(k)}$ and ${\alpha }_{2}=1-{\alpha }_{1}$. In the lth pass through the algorithm one either gets the logarithmic work

or the similarly derivable value for ${w}_{l}(2)$ (Al is defined in the proof of 1(b)). Thus we can write:

In total we get the logarithmic work:

with probability (given that we are in the case where the state is projected onto the levels $j,k$ in the first pass):

The expectation value of ${w}_{\mathrm{tot}}$ can be computed as follows (for $n\lt \infty $):

We now look at how much the work $W={kT}\mathrm{ln}({w}_{\mathrm{tot}})$ changes if in step l one replaces sl by its complement ${\hat{s}}_{l}$ (the other of the two outcomes):

with $c=w\sigma ({s}_{l})$ (and therefore $w\sigma ({\hat{s}}_{l})=-c$), $a={\alpha }_{{s}_{l}}$ (and ${\alpha }_{{\hat{s}}_{l}}=1-a$), $x=a+c\frac{l}{n}$ and $y=1-a-c\frac{l}{n}$ we get:

Using the McDiarmid inequality (McDiarmid 1989) we get that the probability that W differs from its expectation value by more than δ is bounded by:

which tends to 0 for any $\delta \gt 0$. Therefore the work in this process is 0 with probability 1 in the limit $n\to \infty $, which proves 4. □
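A Monte Carlo illustration of this concentration is straightforward, under the assumption (reconstructed from the proof of 1(b), since the per-pass work values are only given there through the ${A}_{l}$) that the log-work gained in a pass is the log-ratio of the occupied level's A value after and before the pass:

```python
import numpy as np

def isothermal_shift_work(Aj, Ak, w, n, rng):
    """One run of the n-pass shift of definition 9, conditioned on the state
    having been projected onto the two levels; kT = 1. Assumes the per-pass
    log-work is log(A_new / A_old) of the currently occupied level."""
    total, step = 0.0, (w / n) * (Aj + Ak)
    for _ in range(n):
        occupied_j = rng.random() < Aj / (Aj + Ak)   # thermal between j and k
        total += np.log((Aj + step) / Aj) if occupied_j else np.log((Ak - step) / Ak)
        Aj, Ak = Aj + step, Ak - step
    return total

rng = np.random.default_rng(2)
for n in (10, 100, 1000, 10_000):
    runs = [isothermal_shift_work(1.0, 1.0, 0.3, n, rng) for _ in range(200)]
    print(n, np.mean(runs), np.std(runs))   # both tend to 0 as n grows
```

The shrinking mean and standard deviation mirror the McDiarmid bound: in the limit $n\to \infty $ the extracted work is 0 with probability 1.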

Corollary. Using the above lemma one can split up any level k into two parts by using an empty level e:

  • (i)  
    Permute the levels such that the empty level e comes before the level k.
  • (ii)  
    Do a work extraction by $\infty $ on the level e (such that its energy is $\infty $ while its width is 0; this costs no work, since the level is empty).
  • (iii)  
    Do an isothermal shift of the boundary between e and k in direction k by $w\in (0,1)$.

Then by the previous lemma the final overall distribution is the same as the initial one, apart from the two levels e and k, which now have occupation probabilities:

and have energies E with $\mathrm{exp}(-{E}_{k}/{kT})=A$:

The corollary directly follows from the lemma. Next we need an algorithm which makes it possible to get the end state σ out of the initial state ρ, if $p\;\succ \;{p}_{f}$ (the generalization of the step $4\to 5$ in the example).

The idea for the algorithm is that we first take the biggest eigenvalues of ρ, such that their area (i.e. the sum of their occupation probabilities) is equal to the biggest occupation probability ${\lambda }_{f}(1)$ of σ. Then we mix them and make a work extraction, such that their total width (i.e. the sum of $\mathrm{exp}(-E(j)/{kT})$) is the same as that of the final energy level 1. Then we continue with the second level and so forth.

To write down the algorithm, we first need two definitions simplifying the notation:

Definition 10 (Generalized sum). If $c\in {\mathbb{R}}$, $c\geqslant 1$, we define $\displaystyle \sum _{i=1}^{c}{d}_{i}:= \displaystyle \sum _{i=1}^{\lfloor c\rfloor }{d}_{i}+(c-\lfloor c\rfloor ){d}_{\lceil c\rceil }$. If $c\in {\mathbb{R}}$, $0\leqslant c\lt 1$, we define $\displaystyle \sum _{i=1}^{c}{d}_{i}:= c\cdot {d}_{1}$.

(Note that the above definition reduces to the usual sum if $c\in {\mathbb{N}}$).
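In code the generalized sum is a direct transcription of definition 10 (a sketch with 0-based arrays standing in for the 1-based indices):

```python
import math

def generalized_sum(c, d):
    """Sum of d_1, ..., d_c for real c >= 0, counting the last
    (fractional) term proportionally; d is 0-based in code."""
    if c < 1:
        return c * d[0]
    fl = math.floor(c)
    total = sum(d[:fl])
    if c > fl:
        total += (c - fl) * d[fl]   # the d_{ceil(c)} term
    return total

print(generalized_sum(3, [1, 2, 3, 4]))    # 6, the ordinary sum
print(generalized_sum(2.5, [1, 2, 3, 4]))  # 1 + 2 + 0.5 * 3 = 4.5
print(generalized_sum(0.5, [1, 2, 3, 4]))  # 0.5 * 1 = 0.5
```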

Definition 11 (Gibbs-equivalent and Gibbs-expanding (see figure C3)). We say two tuples $(\rho ,{H}_{i})$, $(\sigma ,{H}_{f})$ are Gibbs-equivalent (for a given temperature) if they give rise to the same Gibbs-rescaled distribution (where both are defined, 0 else). A transform is similarly said to be Gibbs-equivalent if it changes a tuple to a Gibbs-equivalent one. Finally a transform is said to be Gibbs-expanding if it changes a tuple $(\rho ,{H}_{i})$ to another one $(\sigma ,{H}_{f})$ with ${G}^{T}(\rho )\;\succ \;{G}^{T}(\sigma )$.

Lemma 8 (Optimal Gibbs-expanding transforms). Let ρ, σ be two states, diagonal in their energy basis, of dimension n. Let ρ and σ have at least $n/2$ empty levels. Let ${G}^{T}(\rho )\;\succ \;{G}^{T}(\sigma )$.

  • Then one can transform ρ into σ with 0 work with probability 1.
  • In other words: optimal Gibbs-expanding transforms exist and yield at least 0 work.

Proof. W.l.o.g. let the levels of ρ and σ be ordered in descending order.

  • Let ${\lambda }_{i(f)}(j)$ denote the jth level of the initial (final) state.
  • Define ${a}_{1}\in {\mathbb{R}}$ as the number of needed levels of ρ s.t. the total area is equal to the area at the end:
    (if ${a}_{1}\notin {\mathbb{N}}$ one needs to split the level $\lceil {a}_{1}\rceil $ as in the above corollary).

Define c as the width of the final first level:

where ${A}_{i(f)}(j)=\mathrm{exp}(-{E}_{i(f)}(j)/{kT})$.

Now we get because of ${G}^{T}(\rho )\;\succ \;{G}^{T}(\sigma )$:

which by ${A}_{f}(1)={\displaystyle \sum }_{j=1}^{c}{A}_{i}(j)$ can be stated as:

therefore: $c\geqslant {a}_{1}$ and finally:

which means that one can change the energy of the first ${a}_{1}$ levels such that it is equal to the energy of the level 1 at the end, with 0 risk and at no cost, since, successful or not, the energy gained will be at least 0. The occupation probabilities λ will obviously not be changed by this (apart from the total mixing of the first ${a}_{1}$ levels). Now we could go on and prove the same for the second level and so forth, but there is an easier way:

The only ingredient we needed for the above reasoning to work was ${G}^{T}(\rho )\;\succ \;{G}^{T}(\sigma )$. But this is equivalent to ${G}^{T}(\rho )-K\;\succ \;{G}^{T}(\sigma )-K$ for any constant K, especially for $K={\lambda }_{f}(1)$. Explicitly:

Remembering ${\lambda }_{f}(1)={\displaystyle \sum }_{j=1}^{{a}_{1}}{\lambda }_{i}(j)={\displaystyle \sum }_{j=1}^{{a}_{1}}{A}_{i}(j)\cdot \left(\displaystyle \frac{{\lambda }_{i}(j)}{{A}_{i}(j)}\right)$ the above can be rewritten as:

i.e. we get the same requirement for the remaining levels, which means that we can apply our argument inductively. Since the number of non-empty levels of σ is at most $n/2$ it follows that we need at most $n/2$ empty levels to be able to split all the levels at the right place. □

With this lemma we can now classify the operations which cost 0 work (with risk 0) and whose reverse also costs 0 work: these are exactly the optimal transforms which do not change the Gibbs-rescaled probability distribution:

From the above lemma it follows that any optimal Gibbs-equivalent transform costs no work. Secondly, if the initial and the final state are Gibbs-equivalent such a transform exists (again by the above lemma), so it is reversible. On the other hand, if a transform is not Gibbs-equivalent, either it or its reverse costs more than 0 work (by the first part of theorem 1).

As an aside: this, together with the triangle inequality, proves that the symmetrized version of the mixing distance $D(a,b)={\mathsf{M}}(a\parallel b)+{\mathsf{M}}(b\parallel a)\geqslant 0$ is a metric on the set of probability distributions on the positive reals ordered in descending order.

Appendix D.: Entropy increase law

Consider the interaction of the working medium system with the heat bath. Let S be the von Neumann entropy of the system, β the inverse temperature associated with the bath, and $\langle E\rangle ={\displaystyle \sum }_{i}{\lambda }_{i}{E}_{i}$ the expected internal energy of the system. This section compares the standard law for entropy increase:

$\Delta S\geqslant \beta \Delta \langle E\rangle .$

Equation (D.1)

with the one that we propose should replace it:

${G}^{T}(\rho )\;\succ \;{G}^{T}({\rho }^{\prime }),$

Equation (D.2)

where ρ is the state of the system before and ${\rho }^{\prime }$ its state after the interaction with the bath.

D.1. Our model respects standard expression

Lemma 9. In the model for thermalization used here equation (D.1) is always respected.

Proof. We firstly recall the model and define certain notation.

Recall that the thermalization model states that when two levels, 1 and 2, are coupled to the heat bath, their ratio ${\lambda }_{1}/{\lambda }_{2}$ gets closer to $\mathrm{exp}(-\beta ({E}_{1}-{E}_{2}))$, and the other λ's are untouched. In our model one may concatenate several such interactions to implement any allowed multi-level interaction with the bath. It will therefore suffice to show that equation (D.1) holds for a single two-level interaction with the heat bath.
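A sketch of this elementary step as a map on the two occupations may be helpful; the interpolation parameter t is an illustrative way of modelling a partial interaction (t = 1 is full thermalization), not the paper's formal definition:

```python
import numpy as np

def two_level_thermalize(lam1, lam2, E1, E2, beta, t):
    """Move (lam1, lam2) a fraction t of the way towards the thermal pair
    with the same total lam1 + lam2."""
    lam12 = lam1 + lam2
    g1, g2 = np.exp(-beta * E1), np.exp(-beta * E2)
    lam1_T = lam12 * g1 / (g1 + g2)     # thermal occupation of level 1
    new1 = (1 - t) * lam1 + t * lam1_T
    return new1, lam12 - new1

l1, l2 = 0.9, 0.1
for _ in range(5):
    l1, l2 = two_level_thermalize(l1, l2, E1=1.0, E2=0.0, beta=1.0, t=0.5)
    print(l1 / l2)   # approaches exp(-beta * (E1 - E2)) = 0.3678...
```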

For notational convenience let the probability of being in level 1 or 2 be called ${\lambda }_{12}:= {\lambda }_{1}+{\lambda }_{2}$. This is then constant for the given two-level interaction with the bath. In the extreme case of the two levels interacting with the bath for an arbitrary amount of time we have ${\lambda }_{1}:= {\lambda }_{1}^{T}$ and ${\lambda }_{2}:= {\lambda }_{2}^{T}$ (T reminds us of the temperature dependence). These values must then obey the relation

$\displaystyle \frac{{\lambda }_{1}^{T}}{{\lambda }_{2}^{T}}=\mathrm{exp}(-\beta ({E}_{1}-{E}_{2})).$

Equation (D.3)

We also assume without loss of generality that ${E}_{2}\leqslant {E}_{1}$. This implies that ${\lambda }_{1}^{T}\leqslant 0.5{\lambda }_{12}$.

Now we begin to prove the statement. Firstly we simplify $\Delta S$ by noting that only two levels change their probabilities. We write

$S={S}_{12}+{S}_{\mathrm{rest}},\qquad {S}_{12}:=-{\lambda }_{1}\mathrm{ln}{\lambda }_{1}-{\lambda }_{2}\mathrm{ln}{\lambda }_{2}.$

We see that in any two-level interaction

$\Delta S=\Delta {S}_{12}.$

Equation (D.4)

It is helpful to re-express ${S}_{12}$ in terms of an actual entropy $\bar{{S}_{12}}$, so that we can use known properties of entropies to make statements about ${S}_{12}$. We let $\bar{{\lambda }_{1}}:= {\lambda }_{1}/{\lambda }_{12}$ and $\bar{{\lambda }_{2}}:= {\lambda }_{2}/{\lambda }_{12}$ such that $\bar{{\lambda }_{1}}+\bar{{\lambda }_{2}}=1$. We define

$\bar{{S}_{12}}:=-\bar{{\lambda }_{1}}\mathrm{ln}\bar{{\lambda }_{1}}-\bar{{\lambda }_{2}}\mathrm{ln}\bar{{\lambda }_{2}}.$

One can then see in a few lines of algebra that

${S}_{12}={\lambda }_{12}\bar{{S}_{12}}-{\lambda }_{12}\mathrm{ln}{\lambda }_{12}.$

It follows that

$\Delta {S}_{12}={\lambda }_{12}\Delta \bar{{S}_{12}}.$

Equation (D.5)

We accordingly now want to show that ${\lambda }_{12}\Delta \bar{{S}_{12}}\geqslant \beta \Delta \langle E\rangle .$

We can now use a well known property of the Shannon/von Neumann entropy: $\bar{{S}_{12}}$ is concave in $\bar{{\lambda }_{1}}={\lambda }_{1}/{\lambda }_{12}$. The function is accordingly upper bounded by any tangential line, as in figure D1. Consider the tangential line at ${\lambda }_{1}={\lambda }_{1}^{T}$. At that point it follows from a few lines of algebra that

$\displaystyle \frac{{\rm{d}}\bar{{S}_{12}}}{{\rm{d}}\bar{{\lambda }_{1}}}{\Bigg| }_{\bar{{\lambda }_{1}}={\bar{{\lambda }_{1}}}^{T}}=\mathrm{ln}\left(\frac{1-{\bar{{\lambda }_{1}}}^{T}}{{\bar{{\lambda }_{1}}}^{T}}\right)=\beta ({E}_{1}-{E}_{2}).$

Equation (D.6)

Note now that $\langle E\rangle $ may, similarly to the entropy, be written as

$\langle E\rangle =\langle E{\rangle }_{12}+\langle E{\rangle }_{\mathrm{rest}},\qquad \langle E{\rangle }_{12}:={\lambda }_{1}{E}_{1}+{\lambda }_{2}{E}_{2},$

such that $\Delta \langle E\rangle =\Delta \langle E{\rangle }_{12}=(\Delta {\lambda }_{1})({E}_{1}-{E}_{2})$, with $\Delta {\lambda }_{1}={\lambda }_{1}^{\prime }-{\lambda }_{1}$ the change in ${\lambda }_{1}$. So $\langle E\rangle ({\lambda }_{1})$ is a line with gradient ${E}_{1}-{E}_{2}$. Similarly, $\frac{\beta }{{\lambda }_{12}}\langle E{\rangle }_{12}$, viewed as a function of $\bar{{\lambda }_{1}}$, is a line with gradient $\beta ({E}_{1}-{E}_{2})$.
Comparing this with the gradient of the tangential line to $\bar{{S}_{12}}$ in equation (D.6), we see that $\displaystyle \frac{\beta }{{\lambda }_{12}}\langle E{\rangle }_{12}$ has the same gradient as the tangential line. We therefore only need to show that the change in the tangential line is upper bounded by the change in the entropy curve, as this is equivalent to showing that $\Delta \bar{{S}_{12}}\geqslant \frac{\beta }{{\lambda }_{12}}\Delta \langle E{\rangle }_{12}$. This must hold for all possible initial and final values of $\bar{{\lambda }_{1}}$ and all possible values of ${\bar{{\lambda }_{1}}}^{T}$ (recall that we assumed without loss of generality that ${\bar{{\lambda }_{1}}}^{T}\leqslant 0.5$). These can be grouped into three cases.

  • (i)  
    $\bar{{\lambda }_{1}}\leqslant {\bar{{\lambda }_{1}}}^{T}$. Here the tangential bound above implies that $\Delta \bar{{S}_{12}}\geqslant \frac{\beta }{{\lambda }_{12}}\Delta \langle E{\rangle }_{12}\geqslant 0$.
  • (ii)  
    ${\bar{{\lambda }_{1}}}^{T}\leqslant \bar{{\lambda }_{1}}\leqslant 0.5$. Here the tangential bound implies that $0\geqslant \Delta \bar{{S}_{12}}\geqslant \frac{\beta }{{\lambda }_{12}}\Delta \langle E{\rangle }_{12}$.
  • (iii)  
    $\bar{{\lambda }_{1}}\geqslant 0.5$, also after the interaction. Here the tangential bound implies that $\Delta \bar{{S}_{12}}\geqslant 0\geqslant \frac{\beta }{{\lambda }_{12}}\Delta \langle E{\rangle }_{12}$.

This implies the lemma. □

Figure D1. The entropy $\bar{{S}_{12}}$ is a function of $\bar{{\lambda }_{1}}$. The red dot corresponds to the thermal state in question, i.e. $\bar{{\lambda }_{1}}={\bar{{\lambda }_{1}}}^{T}$. The tangential upper bound has gradient $\beta ({E}_{1}-{E}_{2})$.


D.2. Evolutions respecting standard expression may violate Kelvin's second law

Recall that our condition on thermalizing evolutions was stronger than equation (D.1). There are, as mentioned in the main body, examples of evolutions that respect equation (D.1) but violate our condition: equation (D.2). In this section we consider whether these evolutions may violate Kelvin's second law: no process is possible in which the sole result is the absorption of heat from a reservoir and its complete conversion into work.

We use standard results concerning majorization, as well as our main theorem. We will consider degenerate energy levels for simplicity so that equation (D.1) reduces to $\Delta S\geqslant 0$. We now only assume that the evolution is represented by a stochastic matrix (which it is if the map is Markovian). We do not assume it is the type of thermalization used hitherto as that would automatically respect equation (D.2).

Lemma 10. Any stochastic matrix A which for some state violates equation (D.2) but respects the entropy condition $\Delta S\geqslant 0$ will for some input state, namely the uniform distribution, violate $\Delta S\geqslant 0$.

Proof. 

  • (i)  
    Equation (D.2) is respected iff the matrix is bistochastic. Thus A is NOT bistochastic.
  • (ii)  
    The uniform distribution is invariant under a stochastic matrix iff it is bistochastic.

Thus A does NOT preserve the uniform distribution. Now the uniform distribution is unique in having maximal von Neumann entropy. Thus $\Delta S\geqslant 0$ is violated if the input state is the uniform distribution. □
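A concrete two-level instance (the matrix below is an arbitrary stochastic, non-bistochastic example, not taken from the paper):

```python
import numpy as np

def shannon(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Stochastic (columns sum to 1) but not bistochastic (rows do not).
A = np.array([[1.0, 0.5],
              [0.0, 0.5]])

u = np.array([0.5, 0.5])        # the uniform distribution
v = A @ u                       # [0.75, 0.25]: no longer uniform
print(shannon(u), shannon(v))   # 1.0 > 0.811..., so Delta S < 0
```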

Lemma 11. Consider a state changing to another one. Suppose: (i) the von Neumann entropy is increased, (ii) equation (D.2) is violated, and (iii) the evolution is a stochastic matrix. Then this evolution, applied to the thermal state, would allow for the violation of Kelvin's second law within our game: deterministic work extraction would be possible from a cycle where the system is in the thermal state both initially and finally.

Proof. Recall that we are for simplicity considering degenerate energy levels in this section. The thermal state is then the uniform distribution. Apply A to this (at no work cost as it represents an interaction with the heat bath). Now we have a state σ other than the uniform distribution, so it strictly majorizes the uniform distribution.

To see that this implies deterministic work extraction we firstly show that ${W}^{0}\gt 0$ for some process using A and allowed operations within the game. Consider taking n copies of σ and going to the von Neumann limit by taking n to infinity as well as taking the risk of failure epsilon to 0. To evaluate ${W}^{\varepsilon }$ in this limit it is convenient to use theorem 12 which re-expresses ${W}^{\varepsilon }$. Recall that in the von Neumann limit the smooth max entropy reduces to the von Neumann entropy S. We therefore have, for the case of degenerate levels:

where we have also used the well-known additivity of both entropies: ${H}_{\mathrm{max}}({\rho }^{\otimes n})={{nH}}_{\mathrm{max}}(\rho )$ and $S({\rho }^{\otimes n})={nS}(\rho )$. In this case $\tau ={\mathbb{1}}/d$, i.e. the maximally mixed state associated with a d-dimensional Hilbert space. Moreover ${H}_{\mathrm{max}}({\mathbb{1}}/d)-S(\sigma )\gt 0$ since the uniform distribution is unique in having maximal von Neumann entropy and ${H}_{\mathrm{max}}\geqslant S$. Thus ${W}^{0}\gt 0$ for that process.

Recall secondly the subtlety that we proved that ${W}^{\varepsilon }(\sigma \to {\sigma }^{\prime })$ is achievable within the game when there is access to a catalyst system. Consider extracting work from n copies of $\sigma \otimes | \xi \rangle \langle \xi | $ which will be set to n copies of ${\mathbb{1}}/d\otimes | \xi \rangle \langle \xi | $ at the end. Now ${H}_{\mathrm{max}}({\mathbb{1}}/d\otimes | \xi \rangle \langle \xi | )-S(\sigma \otimes | \xi \rangle \langle \xi | )\gt 0$ as neither entropy of a state is changed by adding a pure system in this way. Thus including the catalyst system does not change the statement that ${W}^{0}\gt 0$ for the above procedure in the von Neumann limit. Accordingly this process violates Kelvin's law. □

Appendix E.: Recovering the relative min-entropy

We now show that when restricting our main theorem to the appropriate limit we recover the result of equation (2) which, as discussed in the main body, was given in Aberg (2012), Horodecki and Oppenheim (2013). Recall that this statement was

which should hold for the case where the final state ${\rho }_{T}$ is a thermal state on the same energy levels as the initial state σ.

The definition of ${D}_{0}^{\varepsilon }(.\parallel .)$ is as given in Datta (2009) (where it is called ${D}_{\mathrm{min}}$): ${D}_{0}(\rho \parallel \sigma ):= -\mathrm{log}{Tr}({\Pi }_{\rho }\sigma )$, where ${\Pi }_{\rho }$ is the projector onto the support of ρ. The smooth version is defined as ${D}_{0}^{\varepsilon }(\rho \parallel \sigma ):= {\mathrm{sup}}_{\bar{\rho }\in {B}^{\varepsilon }(\rho )}{D}_{0}(\bar{\rho }\parallel \sigma )$, where ${B}^{\varepsilon }(\rho )$ is the set of states within epsilon trace distance of ρ.
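For states diagonal in a common basis ${D}_{0}$ reduces to a statement about classical supports; a minimal sketch (base-2 logarithms assumed):

```python
import numpy as np

def D0(rho_diag, sigma_diag):
    """D_0(rho || sigma) = -log Tr(Pi_rho sigma) for commuting states,
    given as probability vectors in the common eigenbasis."""
    support = np.asarray(rho_diag) > 0
    return -np.log2(np.sum(np.asarray(sigma_diag)[support]))

rho = np.array([0.5, 0.5, 0.0, 0.0])
tau = np.ones(4) / 4            # maximally mixed state
print(D0(rho, tau))             # -log2(1/2) = 1
```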

One may first consider the special case of degenerate energy levels, as in Dahlsten et al (2011) (recall that it was shown in Aberg (2012) that this is a special case of (2)). In this case the final state (even without the Gibbs rescaling) is a uniform distribution with support d at least as large as that of the initial state, taken to physically correspond to the system dimension (for n qubits or bits $d={2}^{n}$). The relative entropy expression becomes in this case

To check that this agrees with the relative mixedness expression note that the 'stretching factor' m where $M(\rho \parallel \sigma )=\mathrm{log}m$ is given by $m=\frac{\parallel \mathrm{supp}(q)\parallel }{\parallel \mathrm{supp}({p}^{\varepsilon })\parallel }$. It follows that the two expressions do indeed agree in this case.

We now consider the case of non-degenerate levels. We begin with deriving the relative mixedness expression for a more general case, where the final state is some thermal state but not necessarily of the same Hamiltonian. Then we specialize to the case where it is of the same Hamiltonian, and show that the relative entropy expression is recovered.

Theorem 12. 

${W}^{\varepsilon }(\rho ,{H}_{i}\to \sigma ,{H}_{f})={kT}\,\mathrm{ln}(2)\left({H}_{\mathrm{max}}(q)-{H}_{\mathrm{max}}^{\varepsilon }(p)\right),$
where $p={G}^{T}(\rho )$ is the Gibbs rescaled probability distribution corresponding to the initial state ρ and $q={G}^{T}(\sigma )$ is the one corresponding to the final thermal state σ.

For the proof of this theorem a technical lemma on the smooth max-entropy is needed.

Lemma 13. Let p be a monotonically falling probability function on $[0,\infty )$ and ${d}_{\varepsilon }$ be defined through

${\displaystyle \int }_{0}^{{d}_{\varepsilon }}p(x){\rm{d}}{\text{}}x=1-\varepsilon .$

Then:

${H}_{\mathrm{max}}^{\varepsilon }(p)={\mathrm{log}}_{2}({d}_{\varepsilon }).$

Proof. Let ${d}_{\varepsilon }$ be defined as above. We need to show two things:

  • (i)  
    $\exists {p}^{\varepsilon }$ probability function on $[0,\infty )$ with $\parallel \mathrm{supp}({p}^{\varepsilon })\parallel ={d}_{\varepsilon }$ and trace-distance $\delta (p,{p}^{\varepsilon })\lt \varepsilon $.
  • (ii)  
    $\parallel \mathrm{supp}({p}^{\varepsilon })\parallel \geqslant {d}_{\varepsilon }$ for all monotonically decreasing probability functions ${p}^{\varepsilon }$ on $[0,\infty )$ with $\delta (p,{p}^{\varepsilon })\lt \varepsilon $.

Then we get that ${H}_{\mathrm{max}}^{\varepsilon }(p)={\mathrm{log}}_{2}\left({\mathrm{min}}_{\delta (p,{p}^{\varepsilon })\lt \varepsilon }(\parallel \mathrm{supp}({p}^{\varepsilon })\parallel )\right)={\mathrm{log}}_{2}({d}_{\varepsilon })$, as claimed in the lemma. The proof of (i) goes as follows: define ${p}^{\varepsilon }(x)=p(x){\left({\displaystyle \int }_{0}^{{d}_{\varepsilon }}p(y){\rm{d}}{\text{}}y\right)}^{-1}$ for $x\leqslant {d}_{\varepsilon }$ and ${p}^{\varepsilon }(x)=0$ for $x\gt {d}_{\varepsilon }$. This ${p}^{\varepsilon }$ is therefore normalized to one, has support $[0,{d}_{\varepsilon }]$ and the following equation shows that it is also epsilon-near to p:

which concludes the proof of (i).

For the proof of (ii) assume that there exists a ${p}^{\varepsilon }$ as above with $\parallel \mathrm{supp}({p}^{\varepsilon })\parallel \lt {d}_{\varepsilon }$; then:

which is a contradiction to $\delta (p,{p}^{\varepsilon })=\frac{1}{2}\left({\displaystyle \int }_{0}^{\infty }\left|{p}^{\varepsilon }(x)-p(x)\right|\right)\lt \varepsilon $. □
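The discrete analogue of ${d}_{\varepsilon }$ is the smallest number of (largest) outcomes capturing $1-\varepsilon $ of the probability mass; a sketch:

```python
import numpy as np

def smooth_max_entropy(p, epsilon):
    """log2 of the smallest number of (largest) outcomes whose total
    probability is at least 1 - epsilon: the discrete analogue of
    H_max^eps(p) = log2(d_eps) in lemma 13."""
    q = np.sort(np.asarray(p))[::-1]
    d_eps = int(np.searchsorted(np.cumsum(q), 1 - epsilon)) + 1
    return np.log2(d_eps)

p = np.array([0.4, 0.3, 0.2, 0.1])
print(smooth_max_entropy(p, 0.0))   # log2(4) = 2
print(smooth_max_entropy(p, 0.1))   # log2(3): the smallest tail is cut off
```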

Now we have all we need to prove the theorem above:

Proof. Let ${p}^{\varepsilon }$ be a probability function with the smallest possible support such that $\delta (p,{p}^{\varepsilon })\leqslant \varepsilon $ and define ${d}_{\varepsilon }$ as in lemma 13. For $l\leqslant {d}_{\varepsilon }$ the requirement for maximal work extraction reads (using the lemma)

The above holds with equality in the case $l={d}_{\varepsilon }$, which shows that the maximal w as defined in theorem 1 is given by

Equation (2) is a special case of the above theorem, recovered when the final state is a Gibbs state and also has the same energy eigenvalues as the initial state.

Corollary. Let ρ be a diagonal state with energy eigenvalues Ei and ${\sigma }^{T}$ be the Gibbs state with the same energy eigenvalues Ei at the bath temperature T. Then the maximal extractable work at risk epsilon is given by:

${W}^{\varepsilon }={kT}\,\mathrm{ln}(2)\,{D}_{0}^{\varepsilon }(\rho \parallel {\sigma }^{T}).$
Proof. Let p be the Gibbs-rescaled probability function corresponding to ρ and $P(j)$ the eigenvalues of ρ. Let a be the flat energy probability function corresponding to ${\sigma }^{T}$. Let $A(j)=\frac{\mathrm{exp}\left(\frac{-E(j)}{{kT}}\right)}{Z}$, where $E(j)$ are the energy eigenvalues of ρ and ${\sigma }^{T}$ and Z is the corresponding partition function. This means by definition that

and likewise $a(x)=1/Z$ (both defined for $x\in [0,Z]$).

From the above theorem we get:

Appendix F.: Triangle inequality

The logarithmic relative mixedness respects a triangle inequality:

Lemma 14 (Triangle inequality). Let ρ, σ be states and ${\varepsilon }_{1},{\varepsilon }_{2}\in [0,1)$, and let ${m}_{1}=M\left(\frac{{G}^{T}(\rho )}{{\varepsilon }_{1}}\| {G}^{T}(\tau )\right)$ and ${m}_{2}=M\left(\frac{{G}^{T}(\tau )}{{\varepsilon }_{2}}\| {G}^{T}(\sigma )\right)$. Then

for all states τ.

Proof. Let ρ, τ and σ be states and ${\varepsilon }_{1},{\varepsilon }_{2}\in [0,1)$. Let ${m}_{1}=M\left(\frac{{G}^{T}(\rho )}{{\varepsilon }_{1}}\| {G}^{T}(\tau )\right)$ and ${m}_{2}=M\left(\frac{{G}^{T}(\tau )}{{\varepsilon }_{2}}\| {G}^{T}(\sigma )\right)$. Let $p={G}^{T}(\rho )$, $q={G}^{T}(\sigma )$ and $s={G}^{T}(\tau )$.

Therefore there is a $m\geqslant {m}_{1}{m}_{2}$ such that

It follows:

Appendix G.: Relative mixedness as entanglement measure

We want to start with any finite dimensional bipartite pure state ${\rho }_{{AB}}$, tensored with a pure entangled state of dimension ${M}^{i}$, and end up in any finite dimensional bipartite pure state σ, tensored with a pure entangled state of dimension ${M}^{f}$, under LOCC. For ${M}^{i}={2}^{{m}_{i}}$ and ${M}^{f}={2}^{{m}_{f}}$, these additional states can be thought of as consisting of mi (mf) Bell states. The question is now how many initial and final Bell states one needs to do such an operation.

Since the states are finite dimensional we can write them in the Schmidt decomposition (see e.g. Nielsen and Chuang 2000):

By Nielsen (1999) the sufficient and necessary condition for this action being possible is:

Equation (G.1)

Defining:

such that ${\displaystyle \int }_{0}^{l}p(x){\rm{d}}{\text{}}x={\sum }_{j=1}^{l}{P}_{j}$ (and defining q likewise), we get that $\tilde{Q}\;\succ \;\tilde{P}$ exactly if

i.e. the operation is possible iff $\frac{{M}^{f}}{{M}^{i}}\leqslant M(q| | p)$.

Thus the number of Bell states needed for such an operation satisfies ${\mathrm{log}}_{2}\left(\frac{{M}^{f}}{{M}^{i}}\right)\leqslant {\mathrm{log}}_{2}(M(q| | p))$.
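Nielsen's criterion can be checked directly on the Schmidt coefficients, with the attached maximally entangled states modelled as uniform blocks of size ${M}^{i}$ and ${M}^{f}$; a sketch with illustrative example vectors:

```python
import numpy as np

def majorized_by(p, q):
    """True if p is majorized by q: every partial sum of the
    sorted-descending q dominates the corresponding one of p."""
    ps = np.sort(p)[::-1].cumsum()
    qs = np.sort(q)[::-1].cumsum()
    n = max(len(ps), len(qs))
    ps = np.pad(ps, (0, n - len(ps)), constant_values=ps[-1])
    qs = np.pad(qs, (0, n - len(qs)), constant_values=qs[-1])
    return bool(np.all(qs >= ps - 1e-12))

def locc_possible(P, Q, Mi, Mf):
    """Schmidt vector P with an attached maximally entangled state of
    dimension Mi can go to Q with Mf attached iff the initial Schmidt
    vector is majorized by the final one (Nielsen 1999)."""
    initial = np.repeat(np.asarray(P) / Mi, Mi)
    final = np.repeat(np.asarray(Q) / Mf, Mf)
    return majorized_by(initial, final)

P = [0.5, 0.5]   # a Bell pair
Q = [0.6, 0.4]   # less entangled
print(locc_possible(P, Q, 1, 1))   # True: entanglement can only decrease
print(locc_possible(Q, P, 1, 1))   # False
print(locc_possible(Q, P, 2, 1))   # True: an extra initial Bell state helps
```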

It is not hard to show that the relative mixedness of entanglement is an entanglement monotone. This entanglement measure will be investigated in more detail elsewhere.

Footnotes

  • Stochastic matrices have entries in $[0,1]$ with columns summing to 1, therefore they map probability vectors to probability vectors.
