
Two-qubit causal structures and the geometry of positive qubit-maps

Jonas M Kübler and Daniel Braun

Published 9 August 2018 © 2018 The Author(s). Published by IOP Publishing Ltd on behalf of Deutsche Physikalische Gesellschaft
Citation: Jonas M Kübler and Daniel Braun 2018 New J. Phys. 20 083015. DOI: 10.1088/1367-2630/aad612


Abstract

We study quantum causal inference in a setup proposed by Ried et al (2015 Nat. Phys. 11 414) in which a common cause scenario can be mixed with a cause–effect scenario, and for which it was found that quantum mechanics can bring an advantage in distinguishing the two scenarios: whereas in classical statistics, interventions such as randomized trials are needed, a quantum observational scheme can be enough to detect the causal structure if the common cause results from a maximally entangled state. We analyze this setup in terms of the geometry of unital positive but not completely positive qubit-maps, arising from the mixture of qubit channels and steering maps. We find the range of mixing parameters that can generate given correlations, and prove a quantum advantage in a more general setup, allowing arbitrary unital channels and initial states with fully mixed reduced states. This is achieved by establishing new bounds on signed singular values of sums of matrices. Based on the geometry, we quantify and identify the origin of the quantum advantage depending on the observed correlations, and discuss how additional constraints can lead to a unique solution of the problem.


Original content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

1. Introduction

Imagine a scenario where two experimenters, Alice and Bob, sit in two distinct laboratories. At one point Alice opens the door of her laboratory, obtains a coin, checks whether it shows heads or tails, and puts it back outside the laboratory. Some time later Bob also obtains a coin and checks whether it shows heads or tails. This experiment is repeated many times (ideally: infinitely many times), after which they meet and analyze their joint outcomes. Assuming their joint probability distribution exhibits correlations, there must be some underlying causal mechanism which causally connects their coins [1]. This could be an unobserved confounder (acting as a common cause), in which case they actually measured two distinct coins influenced by the confounder. Or it could be that Alice's coin was propagated by some mechanism to Bob's laboratory, so that they actually measured the same coin, with the consequence that manipulations of the coin by Alice can directly influence Bob's result (cause–effect scenario). The task of Alice and Bob is to determine the underlying causal structure, i.e. to distinguish the two scenarios. This would be rather easy if Alice could prepare her coin after the observation by her own choice and then check whether this influences the joint probability (so-called 'interventionist scheme'). In the present scenario, however, we assume that this is not allowed (so-called 'observational scheme'). All that Alice and Bob have are therefore the given correlations, and from those alone they cannot in general solve this task without additional assumptions. Ried et al [2] showed that in a similar quantum scenario involving qubits the above task can actually be accomplished in certain cases even in an observational scheme (see below for a discussion of how the idea of an observational scheme can be generalized to quantum mechanics).

In the present work we consider the same setup as in [2], and allow arbitrary convex combinations of the two scenarios: the common cause scenario is realized with probability p, the cause–effect scenario with probability 1 − p. Our main results are statements about the ranges of the parameter p for which observed correlations can be explained by either one of the scenarios, or both. For this, we cast the problem in the language of affine representations of unital positive qubit maps [3], in which all the information is encoded in a 3 × 3 real matrix, as is standard in quantum information theory for completely positive (CP) unital qubit maps [4].

The paper is structured as follows: in section 2 we introduce causal models for classical random variables and for quantum systems. Therein we define what we consider a quantum observational scheme. Section 3 introduces the mathematical framework of ellipsoidal representations of qubit quantum channels and qubit steering maps. In section 4 we define our problem mathematically and prove the main results, on which we then comment in the concluding section 5.

2. Causal inference: classical versus quantum

2.1. Classical causal inference

At the heart of a classical causal model is a set of random variables X1, X2, ..., XN. The observation of a specific value of a variable, Xi = xi, is associated with an event. Correlations between events hint at some kind of causal mechanism that links the events [1]. Such a mechanism can be a deterministic law, as for example xi = f(xj), or a probabilistic process described by conditional probabilities $P({x}_{i}| {x}_{j})$, i.e. the probability to find Xi = xi given that Xj = xj was observed. The causal mechanism need not be a direct causal influence from one observed event on the other, but may be due to common causes that lead with a certain probability to both events, or a mixture of both scenarios. Hence, by merely analyzing correlations P(x1, x2, ..., xn), i.e. the joint probability distribution of all events, one cannot in general uniquely determine the causal mechanism that leads to the observed correlations without prior knowledge of the data-generating process (purely observational scheme). To remedy this, an intervention is often necessary, where the value of a variable Xi, whose causal influence one wants to investigate, is set by an experimentalist to different values, in order to see whether this changes the statistics of the remaining events (interventionist scheme). One strategy for reducing the influence of other, unknown factors is to randomize the samples. This is a typical approach in clinical studies, where one group of randomly selected subjects receives a treatment whose efficacy one wants to investigate, and a randomly selected control group receives a placebo. If the percentage of cured people in the first group is significantly larger than in the second group, one can believe in a positive causal effect of the treatment.
The probabilities obtained in this interventionist scheme are so-called 'do-probabilities' (or 'causal conditional probabilities') [5]: $P({x}_{i}| \mathrm{do}({x}_{j}))$ is the probability to find Xi = xi if an experimentalist intervened and set the value of Xj to xj. This is different from $P({x}_{i}| {x}_{j})$, as a possible causal influence from some other unknown event on Xj = xj is cut, i.e. one deliberately modifies the underlying causal structure in order to better understand a part of it. If Xj = xj was the only direct cause of Xi = xi, then $P({x}_{i}| {x}_{j})=P({x}_{i}| \mathrm{do}({x}_{j}))$. If instead the event Xi = xi was a cause of Xj = xj, then intervening on Xj cannot change Xi: $P({x}_{i})=P({x}_{i}| \mathrm{do}({x}_{j}))=P({x}_{i}| \mathrm{do}(\bar{{x}_{j}}))$, where $\bar{{x}_{j}}$ is a value different from xj. If the correlation between Xi = xi and Xj = xj is purely due to a common cause, then no intervention on Xi or Xj will change the probability to find a given value of the other: $P({x}_{i})=P({x}_{i}| \mathrm{do}({x}_{j}))$ for all xj, and $P({x}_{j})=P({x}_{j}| \mathrm{do}({x}_{i}))$ for all xi. Observing these do-probabilities, one can hence draw conclusions about the causal influences behind the correlations observed in the occurrence of Xi = xi and Xj = xj.

In practice, direct causation in one direction is often excluded by time-ordering and need not be investigated. For example, when doubting that one can conclude that smoking causes lung cancer from the observed correlations between these two events, it does not make sense to claim that having lung cancer causes smoking, as smoking usually comes before developing lung cancer. But even dividing a large number of people randomly into two groups and forcing one of them to smoke and the other not to smoke, in order to find out whether there is a common cause for both, would be ethically unacceptable. The needed do-probabilities can therefore not always be obtained by experiment. Interestingly, the causal-probability calculus allows one in certain cases, depending notably on the graph structure, to calculate do-probabilities from observed correlations without having to perform the intervention. Conversely, apart from predicting the conditional probabilities for a random variable, say Xi, given the observation of Xj = xj, denoted as $P({x}_{i}| {x}_{j})$, a causal model can also predict the do-probabilities, i.e. the distribution of Xi if one were to intervene on the variable Xj and set its value to xj. This is crucial for deriving informed recommendations for actions targeted at modifying certain probabilities, e.g. recommending not to smoke in order to reduce the risk of cancer.

The structure of a causal model can be depicted by a graph. Each random variable is represented by a vertex of the graph. Causal connections are represented by directed arrows and imply that signaling along the direction of the arrow is possible. In a classical causal model it is assumed that events happen at specific points in space and time; therefore bidirectional signaling is not possible, as it would imply signaling backward in time. Hence the graph cannot contain cycles and is therefore a directed acyclic graph (DAG) [5], see figure 1. The set of parents PAj of the random variable Xj is defined as the set of all variables that have a direct arrow pointing towards Xj, and paj denotes a possible value of PAj. The causal model is then defined through its graph with random variables Xi at its vertices and the weights $P({x}_{j}| {{pa}}_{j})$ of each edge, i.e. the probabilities that Xj = xj happens under the condition that PAj = paj occurred. The model generates the entire correlation function according to

$P({x}_{1},\ldots ,{x}_{n})={\prod }_{j}P({x}_{j}| {{pa}}_{j}),$    (1)

which is referred to as the causal Markov condition [5]. When all P(x1, ..., xn) are given, then all conditional probabilities follow, hence all $P({x}_{j}| {{pa}}_{j})$ that appear in a given graph; but in general not all correlations nor all $P({x}_{j}| {{pa}}_{j})$ are known (see below). The causal inference problem consists in finding a graph structure that allows one to satisfy equation (1) for given data P(x1, ..., xn) and all known $P({x}_{j}| {{pa}}_{j})$, where the unknown $P({x}_{j}| {{pa}}_{j})$ can be considered fit parameters in case of incomplete data. With access to the full joint probability distribution, causal inference only needs to determine the graph. In practice, however, one often has only incomplete data: as long as a common cause has not been determined yet, one will not have data involving correlations of the corresponding variable. For example, one may have strong correlations between getting lung cancer (random variable X2 $\in $ {0, 1}) and smoking (random variable X1 ∈ {0, 1}), but if there is an unknown common cause X0 for both, one typically has no information about P(x0, x1, x2): one will only start collecting data about correlations between the presence of a certain gene, say, and the habit of smoking or developing lung cancer once one suspects that gene to be a cause for at least one of these. In this case $P({x}_{1}| {x}_{0})$ and $P({x}_{2}| {x}_{0})$ are fit parameters of the model as well. The possibility of extending a causal model through inclusion of unknown random variables is one reason why in general there is no unique solution to the causal inference problem based on correlations alone. Interventions on Xi, on the other hand, make it possible to cut Xi from its parents and hence eliminate unknown causes one by one for all random variables.
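The factorization (1) is easy to make concrete. The following sketch builds the joint distribution for the smoking example above, with a hidden common cause X0 (the gene) influencing both smoking X1 and lung cancer X2; all conditional probability tables are hypothetical, chosen only for illustration:

```python
import itertools

P_x0 = {0: 0.7, 1: 0.3}                    # P(x0): prevalence of the gene (hypothetical)
P_x1_given_x0 = {0: {0: 0.8, 1: 0.2},      # P(x1 | x0): smoking
                 1: {0: 0.3, 1: 0.7}}
P_x2_given_x0 = {0: {0: 0.9, 1: 0.1},      # P(x2 | x0): lung cancer
                 1: {0: 0.4, 1: 0.6}}

def joint(x0, x1, x2):
    """Causal Markov condition for this DAG: P(x0,x1,x2) = P(x0) P(x1|x0) P(x2|x0)."""
    return P_x0[x0] * P_x1_given_x0[x0][x1] * P_x2_given_x0[x0][x2]

# The factorized model is automatically normalized:
total = sum(joint(*xs) for xs in itertools.product((0, 1), repeat=3))
print(round(total, 12))  # 1.0
```

Any DAG gives the same recipe: one factor per variable, conditioned on its parents only.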


Figure 1. Simple DAG in a four party scenario. The parental structure is ${{PA}}_{A}=\{\},\,{{PA}}_{B}=\{A\},\,{{PA}}_{C}=\{A,B\},\,{{PA}}_{D}=\{C\}$. According to the causal Markov condition, equation (2), the probability distribution then factorizes as $P(a,b,c,d| {i}_{A},{i}_{B},{i}_{C},{i}_{D})\,=\,P(d| c,{i}_{D})P(c| a,b,{i}_{C})P(b| a,{i}_{B})P(a| {i}_{A})$.


Once a causal model is known, one can calculate all distributions

$P({x}_{1},\ldots ,{x}_{n}| {i}_{1},\ldots ,{i}_{n})={\prod }_{j}P({x}_{j}| {{pa}}_{j},{i}_{j}),$    (2)

for all possible combinations of interventions and observations, where the ij are the values of the intervention variable Ij for the event Xj, ij = idle or ij = do(xj). Here, $P({x}_{j}| {{pa}}_{j},{i}_{j}=\mathrm{do}({\tilde{x}}_{j}))={\delta }_{{x}_{j},{\tilde{x}}_{j}}$ reflects that an intervention on Xj deterministically sets its value, independently of the observed values of its causal parents. If Ij = idle then the value of Xj only depends on its causal parents PAj, i.e. $P({x}_{j}| {\{{x}_{i}\}}_{i\ne j},{i}_{j}=\mathrm{idle})=P({x}_{j}| {{pa}}_{j},{i}_{j}=\mathrm{idle})$.

The field of causal discovery or causal inference aims at providing methods to determine the causal model, that is, the DAG and the joint probability distributions entering (1) for a given scenario. Different combinations of the Ij correspond to different strategies. If all the interventions are set to idle, and hence all the outcomes are determined by the causal parents, one has the purely observational approach. In multivariate scenarios, where more than two random variables are involved, the observation of the joint probability distribution alone can still contain hints of the causal structure based on conditional independencies [5]. Nevertheless, in the bivariate scenario, i.e. when only two random variables are involved, classical correlations obtained by observations do not comprise any causal information. Only if a priori assumptions are made, for example on the noise distribution, can information on the causal model be obtained from observational data [6].

2.2. Quantum causal inference

The notion of causal models does not easily translate to quantum mechanics. The main problem is that in quantum systems not all observables can have predefined values independent of observation. Similar to an operational formulation of quantum mechanics [7], the process matrix formalism was introduced [8] and a quantum version of an event defined. In [9] this is reviewed for the purpose of causal models. In place of the random variables in the classical case there are local laboratories. Within a process, each laboratory obtains a quantum system as input and produces a quantum system as output. A quantum event corresponds to information which is obtained within a laboratory and is associated with a CP map mapping the input Hilbert space to the output Hilbert space of the laboratory. The possible events depend on the choice of instrument. An instrument is a set of CP maps that sum to a CP trace preserving (CPTP) map. For example, an instrument can be a projective measurement in a specific basis, with the events being the possible outcomes. The possibility to choose different instruments mirrors the possibility of interventions in the classical case [9, sect. 3.3]. The whole information about mechanisms, which are represented as CPTP maps, and the causal connections is contained in a so-called process matrix. Besides being the analog of a classical causal model, the process framework goes beyond classical causal structures, as it does not assume a fixed causal structure [8]. This has recently stirred a lot of research [10–13]. For a more detailed introduction we refer the reader especially to [9], where a comprehensive description is provided.

The analog of causal inference in the classical case is the reconstruction of a process matrix. This can be done using informationally complete sets of instruments, theoretically described in [9, sect. 4.1] and experimentally implemented in [2]. Defining a quantum observational scheme in analogy to the classical one is not straightforward. In general a quantum measurement destroys much of the state's character and hence can almost never be considered a passive observation. For example, if the system was initially in a pure state $| \psi \rangle $ but one measures in a basis such that $| \psi \rangle $ is not an eigenstate of the projectors onto the basis states, then the measurement truly changes the state of the system, and the original state is not reproduced in the statistical average. In [9, sect. 5] an observational scheme is simply defined as projective measurements in a fixed basis, in particular without assumptions about the incoming state of a laboratory and thus without assumptions about the underlying process. Another possibility to define an observational scheme is based on the idea that in the classical world observations reveal pre-existing properties of physical systems, and that quantum observations should reproduce this. As a consequence, if one mixes the post-measurement states with the probabilities of the corresponding measurement outcomes, one should obtain the same state as before the measurement. That is ensured if and only if one allows only operations that do not destroy the quantum character of the state, as coherences cannot be restored by averaging. Ried et al [2] formalized this notion as 'informational symmetry', but considered only preservation of local states. For the special case of locally completely mixed states, they showed that projective measurements in arbitrary bases possess informational symmetry.
This definition of a quantum observational scheme is problematic for two reasons: firstly, the allowed class of instruments depends on the incoming state, i.e. one can only apply projective measurements that are diagonal in the same basis as the state itself. This is at variance with the typical motivation for an observational scheme, namely that the instruments are restricted a priori for practical reasons. Moreover, having the measurements depend on the state requires prior knowledge about the state of the system, but finding out the state of the system is part of the causal inference (e.g.: are the correlations based on a state shared by Alice and Bob?). Hence, in general one cannot assume sufficient knowledge of the state for restricting the measurements such that they do not destroy coherences.

Secondly, the definition is unnaturally restrictive as it only considers the local state and not the global state. For example if Alice and Bob share a singlet state $| \psi \rangle =\tfrac{| 01\rangle -| 10\rangle }{\sqrt{2}}$, then both local states are completely mixed. Hence according to the informational symmetry, they are allowed to perform projective measurements in arbitrary bases. If Alice and Bob now both measure in the computational basis, they will each obtain both outcomes with probability 1/2 and their local states will remain invariant in the statistical average ${\rho }_{A}^{{\prime} }={\rho }_{A}=\tfrac{{\mathbb{1}}}{2}={\rho }_{B}^{{\prime} }={\rho }_{B}$. However, the global state does not remain intact. The post-measurement state is given as ${\rho }_{{AB}}^{{\prime} }=\tfrac{1}{2}(| 01\rangle \langle 01| +| 10\rangle \langle 10| )$ which is not even entangled anymore. But even defining a 'global informational symmetry', i.e. requiring the global state to remain invariant, does not settle the issue in a convenient way, as this would not allow any local measurements of Alice and Bob.
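The singlet example is easily verified numerically. A minimal NumPy sketch, averaging the post-measurement states over both parties' outcomes, confirms that the local states stay completely mixed while the global singlet decoheres to the separable mixture:

```python
import numpy as np

ket0, ket1 = np.array([1., 0.]), np.array([0., 1.])
psi = (np.kron(ket0, ket1) - np.kron(ket1, ket0)) / np.sqrt(2)  # singlet state
rho = np.outer(psi, psi)

# Projective measurement in the computational basis on both sides,
# averaged over the outcomes (the state the measurement leaves behind).
projs = [np.outer(k, k) for k in (ket0, ket1)]
rho_post = sum(np.kron(Pa, Pb) @ rho @ np.kron(Pa, Pb) for Pa in projs for Pb in projs)

# The local states are untouched ...
rho_A = rho_post.reshape(2, 2, 2, 2).trace(axis1=1, axis2=3)
print(np.allclose(rho_A, np.eye(2) / 2))      # True
# ... but the global state became the mixture (|01><01| + |10><10|)/2.
mix = 0.5 * (np.outer(np.kron(ket0, ket1), np.kron(ket0, ket1))
             + np.outer(np.kron(ket1, ket0), np.kron(ket1, ket0)))
print(np.allclose(rho_post, mix))             # True
```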

Here we propose three different schemes, ranging from full quantum interventions, through a quantum observational scheme with the possibility of an active choice of measurements, to a passive quantum observational scheme in a fixed basis that comes closest to the classical observational scheme, see table 1.

Table 1.  Quantum schemes for causal inference: an overview of instruments allowed within different quantum schemes defined in this section. $\surd $ indicates allowed/possible, X indicates not allowed/impossible.

  Scheme                     Arbitrary instruments   Arbitrary projections   Fixed-basis projection   Signaling        Causal inference
  Q-interventionist          $\surd $                $\surd $                $\surd $                 $\surd $         $\surd $
  Active Q-observational     X                       $\surd $                $\surd $                 ${\surd }^{a}$   ${(\surd )}^{b}$
  Passive Q-observational    X                       X                       $\surd $                 X                ${{\rm{X}}}^{c}$

a In the active quantum observational scheme signaling is possible in principle. However, in the scenarios considered in this work signaling is not possible, and still causal inference can be successful.
b The potential of causal inference in the active quantum-observational scheme is discussed in the main part of this paper.
c In the passive quantum-observational scheme no more causal inference than classical is possible.

The definitions are based on restricting the allowed set of instruments. An instrument is to be understood in the process matrix context. In all three schemes the set of allowed instruments is independent of the actual underlying processes, which is a reasonable assumption, since the motivation for causal inference comes from the fact that states or processes are not known in the first place.

  • Quantum interventionist scheme: Arbitrary instruments can be applied in local laboratories. These include for example deterministic operations such as state preparations or simply projective measurements. An appropriate choice of the instruments enables one to detect causal structure in arbitrary scenarios, i.e. to reconstruct the process matrix [9]. This scheme resembles most closely an interventionist scheme in a classical scenario but offers additional quantum-mechanical possibilities of intervention.
  • Active quantum observational scheme: Only projective measurements in arbitrary orthogonal bases are allowed, but no post-processing of the state after the measurement. The latter requirement translates the idea of not intervening into the quantum realm, as it is not possible to deterministically change the state by the experimenter's choice. Depending on the state and the instrument, the state may change during the measurement, hence the scheme is invasive, but the difference to the classical observational scheme arises solely from the possible destruction of quantum coherences. This is a quantum effect without classical correspondence and hence opens up a new possibility of defining an observational scheme that has no classical analog. Repeated application of the same measurement within a single run always gives the same output. Furthermore, we allow projective measurements in different bases in different runs of the experiment. This freedom allows one to completely characterize the incoming state. This scheme allows for signaling, i.e. there exist processes for which Alice's choice of instrument changes the statistics that Bob observes. As an example consider the process where Alice always obtains a qubit in the state $| 1\rangle $. She applies her instrument on it, and then the outcome is propagated to Bob by the identity channel. Bob measures in the basis where $| 1\rangle $ is an eigenstate. If Alice measured in the same basis as Bob, then both of them deterministically obtain 1 as result. If Alice instead measures in the basis $\{| \pm \rangle =\tfrac{1}{\sqrt{2}}(| 0\rangle \pm | 1\rangle )\}$, then Bob obtains 1 only with probability $\tfrac{1}{2}$. This is considered as signaling according to the definition in [9]. Clearly, signaling presents a direct quantum advantage for causal inference compared to a classical observational scheme, and motivates the attribute 'active' of the scheme.
In the present work we focus on this scheme, but exclude such a direct quantum advantage by considering exclusively unital channels and a completely mixed incoming state for Alice, as was done also in [2]. It is then impossible for Alice to send a signal to Bob if her instruments are restricted to quantum observations, even if she is allowed to actively set her measurement basis. One might wonder whether the quantum observational scheme can be generalized to POVM measurements. However, these do not fit into the framework of instruments that transmit an input state to an output state, as POVM measurements do not specify the post-measurement state.
  • Passive quantum observational scheme: For the whole setup a fixed basis is selected. Only projective measurements with respect to this basis are permitted, and it is forbidden to change the basis in different runs of the experiment. This is also what is used in [9] to obtain classical causal models as a limit of quantum causal models. Since the basis is fixed independently of the underlying process, the measurement can still be invasive in the sense that it can destroy coherences, and hence it is still not a pure observational scheme in the classical sense. Nevertheless, Alice cannot signal to Bob here as she has no possibility of actively encoding information in the quantum state, regardless of the nature of the state, which motivates the name 'passive quantum observational scheme'. As without any change of basis it is impossible to exploit stronger-than-classical quantum correlations, this scheme comes closest to a classical observational scheme. And due to the restriction to observing at most classical correlations, it is not possible to infer anything more about the causal structure than classically possible.
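The signaling example from the active scheme can be sketched numerically (a NumPy toy computation, not part of the paper's formalism): Alice receives $| 1\rangle $, measures without post-processing, and the averaged post-measurement state travels to Bob through the identity channel.

```python
import numpy as np

ket0, ket1 = np.array([1., 0.]), np.array([0., 1.])
plus, minus = (ket0 + ket1) / np.sqrt(2), (ket0 - ket1) / np.sqrt(2)

def bob_prob_1(alice_basis):
    """Alice receives |1>, measures in `alice_basis` (no post-processing);
    the post-measurement state goes to Bob, who measures in {|0>, |1>}."""
    rho = np.outer(ket1, ket1)
    projs = [np.outer(v, v) for v in alice_basis]
    rho_after = sum(P @ rho @ P for P in projs)   # average over Alice's outcomes
    return float(ket1 @ rho_after @ ket1)         # P(Bob reads 1)

print(round(bob_prob_1([ket0, ket1]), 3))   # 1.0 : Alice measured in Bob's basis
print(round(bob_prob_1([plus, minus]), 3))  # 0.5 : coherences destroyed -> signaling
```

Bob's statistics depend on Alice's basis choice, which is exactly the signaling discussed above.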

3. Affine representation of quantum channels and steering maps

In this section we introduce the tools of quantum information theory that we need to analyze the problem of causal inference in section 4.

3.1. Bloch sphere representation of qubits

A qubit is a quantum system with a two-dimensional Hilbert space with basis states denoted as $| 0\rangle $ and $| 1\rangle $. An arbitrary state of the qubit is described by a density operator ρ, a positive linear operator with unit trace, ρ ≥ 0, $\mathrm{tr}[\rho ]=1$. Every single qubit state can be represented geometrically by its Bloch vector ${\boldsymbol{r}}=\mathrm{tr}[\rho {\boldsymbol{\sigma }}]$, with $| {\boldsymbol{r}}| \leqslant 1$, as

$\rho =\tfrac{1}{2}\left({\mathbb{1}}+{\boldsymbol{r}}\cdot {\boldsymbol{\sigma }}\right),$    (3)

where ${\boldsymbol{\sigma }}={({\sigma }_{1},{\sigma }_{2},{\sigma }_{3})}^{T}$ denotes the vector of Pauli matrices.
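Equation (3) and the definition of the Bloch vector are inverse to each other, which is quickly checked numerically (a small NumPy sketch):

```python
import numpy as np

# The three Pauli matrices, stacked along the first axis.
sigma = np.array([[[0, 1], [1, 0]],
                  [[0, -1j], [1j, 0]],
                  [[1, 0], [0, -1]]])

def bloch(rho):
    """r = tr[rho sigma], the Bloch vector of a qubit state."""
    return np.real(np.array([np.trace(rho @ s) for s in sigma]))

def state(r):
    """rho = (1 + r . sigma)/2, equation (3)."""
    return 0.5 * (np.eye(2) + np.tensordot(r, sigma, axes=1))

r = np.array([0.3, -0.2, 0.5])          # any vector with |r| <= 1
print(np.allclose(bloch(state(r)), r))  # True: the two maps invert each other
```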

3.2. Channels

A quantum channel ${ \mathcal E }$ is a CPTP map. A quantum channel maps a density operator in the space of linear operators $\rho \in { \mathcal L }({ \mathcal H })$ on the Hilbert space ${ \mathcal H }$ to a density operator in the space of linear operators $\rho ^{\prime} \in { \mathcal L }({ \mathcal H }^{\prime} )$ on a (potentially different) Hilbert space ${ \mathcal H }^{\prime} $.

This formalism describes any physical dynamics of a quantum system. Every quantum channel can be understood as the unitary evolution of the system coupled to an environment [4]. The constraint of complete positivity can be understood as follows: if we extend the map ${ \mathcal E }$ with the identity operation of arbitrary dimension, the composed map ${ \mathcal E }\otimes {\mathbb{1}}$, which acts on a larger system, should still be positive. An example of a map that is positive but not CP is the transposition map which, if extended to a larger system, maps entangled states to operators that are no longer positive semi-definite [3, chapter 11.1].
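The transposition example can be made explicit: applying the partial transpose (transposition on Alice's qubit, identity on Bob's) to a maximally entangled state produces an operator with a negative eigenvalue. A NumPy sketch:

```python
import numpy as np

# |Phi+> = (|00> + |11>)/sqrt(2), a maximally entangled two-qubit state.
phi = np.zeros(4)
phi[0] = phi[3] = 1 / np.sqrt(2)
rho = np.outer(phi, phi)

# Partial transpose on the first qubit: swap Alice's row and column indices.
rho_pt = rho.reshape(2, 2, 2, 2).transpose(2, 1, 0, 3).reshape(4, 4)

print(np.linalg.eigvalsh(rho).min())              # non-negative (up to rounding): rho is a state
print(round(np.linalg.eigvalsh(rho_pt).min(), 6)) # -0.5 : transposition ⊗ identity is not positive
```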

3.2.1. Geometrical representation of qubit maps

Every qubit channel ${ \mathcal E }$ (a quantum channel mapping a qubit state onto a qubit state) is completely described by its action on the Bloch sphere [14–16], encoded in the matrix ${{\rm{\Theta }}}_{{ \mathcal E }}$ acting on the 4D Bloch vector $(1,{\boldsymbol{r}})$,

${{\rm{\Theta }}}_{{ \mathcal E }}=\left(\begin{array}{cc}1 & {{\bf{0}}}^{T}\\ {{\boldsymbol{t}}}_{{ \mathcal E }} & {T}_{{ \mathcal E }}\end{array}\right),$    (4)

where the upper row ensures trace preservation. A state ρ described by its Bloch vector ${\boldsymbol{r}}$ is then mapped by the quantum channel ${ \mathcal E }$ to the new state $\rho ^{\prime} $ with Bloch vector ${\boldsymbol{r}}^{\prime} ={T}_{{ \mathcal E }}{\boldsymbol{r}}+{{\boldsymbol{t}}}_{{ \mathcal E }}$.

A qubit channel is called unital if it leaves the completely mixed state invariant: ${ \mathcal E }({\rho }_{\mathrm{mixed}})={\rho }_{\mathrm{mixed}}$, with ${\rho }_{\mathrm{mixed}}=\tfrac{{\mathbb{1}}}{2}$, i.e. ${{\boldsymbol{r}}}_{\mathrm{mixed}}={\bf{0}}$. For unital channels ${{\boldsymbol{t}}}_{{ \mathcal E }}$ vanishes. The whole information is then contained in the 3 × 3 real matrix ${T}_{{ \mathcal E }}$, which we refer to as correlation matrix of the channel. The matrix T (from now on we drop the index ${ \mathcal E }$) can be expressed by writing it in its signed singular value (SSV) decomposition [15, equation (9)], [3, equation (10.78)] (see also the appendix around equation (46)),

$T={R}_{1}\,\eta \,{R}_{2}.$    (5)

Here, R1 and R2 are proper rotations (elements of the SO(3) group), corresponding to unitary channels, that is ${R}_{i}{R}_{i}^{T}={\mathbb{1}}$ with $\det ({R}_{i})=1$, and $\eta =\mathrm{diag}({\eta }_{1},{\eta }_{2},{\eta }_{3})$ is a real diagonal matrix. This can be interpreted rather easily: a unital qubit channel maps the Bloch sphere onto an ellipsoid, centered around the origin, that fits inside the Bloch sphere. First the Bloch sphere is rotated by R2, then it is compressed along the coordinate axes by factors ${\eta }_{i}$. The resulting ellipsoid is then rotated again. Hence, apart from unitary freedom in the input and output, the unital quantum channel is completely characterized by its SSV [15, II.B]. The CPTP property imposes restrictions on the allowed values of ${\boldsymbol{\eta }}\equiv {({\eta }_{1},{\eta }_{2},{\eta }_{3})}^{T}$. These are commonly known as the Fujiwara–Algoet conditions [14–16]

${\left({\eta }_{1}\pm {\eta }_{2}\right)}^{2}\leqslant {\left(1\pm {\eta }_{3}\right)}^{2}.$    (6)

The allowed values for ${\boldsymbol{\eta }}$ lie inside a tetrahedron ${{ \mathcal T }}_{\mathrm{CP}}$ (the index CP stands for completely positive),

${{ \mathcal T }}_{\mathrm{CP}}=\mathrm{Conv}\left(\{{{\boldsymbol{c}}}_{1},{{\boldsymbol{c}}}_{2},{{\boldsymbol{c}}}_{3},{{\boldsymbol{c}}}_{4}\}\right),$    (7)

where $\mathrm{Conv}({\{{x}_{i}\}}_{i})\equiv \{{\sum }_{i}{p}_{i}{x}_{i}| {p}_{i}\geqslant 0,{\sum }_{i}{p}_{i}=1\}$ denotes the convex hull of the set ${\{{x}_{i}\}}_{i}$ and the vertices are defined as

${{\boldsymbol{c}}}_{1}={(1,1,1)}^{T},\quad {{\boldsymbol{c}}}_{2}={(1,-1,-1)}^{T},\quad {{\boldsymbol{c}}}_{3}={(-1,1,-1)}^{T},\quad {{\boldsymbol{c}}}_{4}={(-1,-1,1)}^{T}.$    (8)

For a more detailed discussion of qubit maps we refer the reader to chapter 10.7 of [3].
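The Fujiwara–Algoet conditions (6) and the tetrahedron picture (7), (8) can be cross-checked with a few lines of code; the vertex list below corresponds to the unitary channels given by conjugation with $\mathbb{1}$, σx, σy, σz:

```python
def fujiwara_algoet(eta):
    """Equation (6): (eta1 +/- eta2)^2 <= (1 +/- eta3)^2."""
    e1, e2, e3 = eta
    return (e1 + e2) ** 2 <= (1 + e3) ** 2 and (e1 - e2) ** 2 <= (1 - e3) ** 2

# Vertices of the CP tetrahedron (conjugation by 1, sigma_x, sigma_y, sigma_z).
cp_vertices = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]

print(all(fujiwara_algoet(v) for v in cp_vertices))                     # True
print(fujiwara_algoet((0.5, 0.5, 0.5)))                                 # True: an interior point
print(any(fujiwara_algoet(tuple(-x for x in v)) for v in cp_vertices))  # False: sign-flipped vertices fail
```

The last line anticipates the completely co-positive tetrahedron of section 3.3, which consists of exactly these sign-flipped points.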

3.3. Steering

In quantum mechanics, measurement outcomes on two spatially separated partitions of a composed quantum system can be highly correlated [17], and further the choice of measurement operator on one side can strongly influence or even determine the state on the other side [18], a phenomenon known as 'steering'. In the literature, steering is in general considered as a map from Alice's measurement settings to Bob's reduced state after Alice's measurement. In order to describe the different causal scenarios we are interested in here on an equal footing, it is more convenient, however, to consider steering as a positive linear trace preserving map ${ \mathcal S }\,:{\rho }_{A}\mapsto { \mathcal S }({\rho }_{A})={\rho }_{B}$, called steering map, which maps Alice's post-measurement state ρA to the local state of Bob ρB (after Alice measured). In the case of projective measurements, the two notions coincide. For a joint state ρAB the steering map can be given as [19]

${ \mathcal S }({\rho }_{A})={\mathrm{tr}}_{A}[({\rho }_{A}^{T}\otimes {{\mathbb{1}}}_{B})\,{\rho }_{B| A}],$    (9)

with the conditional state ${\rho }_{B| A}={\rho }_{{AB}}\star {\varrho }_{A}^{-1}$, Alice's reduced state before the measurement ${\varrho }_{A}={\mathrm{tr}}_{B}[{\rho }_{{AB}}]$, and the star product defined as ${\rho }_{{AB}}\star {\varrho }_{A}^{-1}=({\varrho }_{A}^{-1/2}\otimes {{\mathbb{1}}}_{B}){\rho }_{{AB}}({\varrho }_{A}^{-1/2}\otimes {{\mathbb{1}}}_{B})$ [19, p.3]. This formula becomes particularly easy for the case of a maximally entangled two qubit state ρAB. Here the marginals are completely mixed, hence equation (9) reduces to

Equation (10)

Steering maps have been intensively studied, especially in terms of entanglement characterization [20, 21]. In analogy to the treatment of qubit channels, we can associate a unique ellipsoid inside the Bloch sphere with a two qubit state, known as the steering ellipsoid, that encodes all the information about the bipartite state [20].

Every bipartite two qubit state can be expanded in the Pauli basis as

$${\rho }_{{AB}}=\frac{1}{4}\sum _{\mu ,\nu =0}^{3}{{\rm{\Theta }}}_{\mu \nu }\,{\sigma }_{\mu }\otimes {\sigma }_{\nu },$$

where

Equation (11)

$${{\rm{\Theta }}}_{\mu \nu }=\mathrm{tr}[{\rho }_{{AB}}\,({\sigma }_{\mu }\otimes {\sigma }_{\nu })].$$

Note that we defined Θ to be the transpose of the one defined in [20], since we want to treat steering from Alice to Bob. The matrix contains all the information about the bipartite state and can be written as

$${\rm{\Theta }}=\left(\begin{array}{cc}1 & {{\boldsymbol{b}}}^{T}\\ {\boldsymbol{a}} & {T}_{{ \mathcal S }}\end{array}\right),$$

where ${\boldsymbol{a}}$ (${\boldsymbol{b}}$) denotes the Bloch vector of Alice's (Bob's) reduced state. ${T}_{{ \mathcal S }}$ is a 3 × 3 real matrix that encodes all the information about the correlations, and we will refer to it as the correlation matrix of the steering map.

In this work we only consider bipartite qubit states which have completely mixed reduced states ${\mathrm{tr}}_{A}[{\rho }_{{AB}}]={\mathrm{tr}}_{B}[{\rho }_{{AB}}]={\mathbb{1}}/2$ or equivalently ${\boldsymbol{a}}={\boldsymbol{b}}={\bf{0}}$. In analogy to unital channels we call such states unital two qubit states and the corresponding maps unital steering maps. Up to local unitary operations on the two partitions, the correlation matrix ${T}_{{ \mathcal S }}$ is characterized by its SSV η1, η2, η3. The allowed values of these are given through the positivity constraint on the density operator ρAB defined up to local unitaries as (see equation (6) in [21])

Equation (12)

$${\rho }_{{AB}}=\frac{1}{4}\left({\mathbb{1}}\otimes {\mathbb{1}}+\sum _{i=1}^{3}{\eta }_{i}\,{\sigma }_{i}\otimes {\sigma }_{i}\right).$$

The positivity of ρAB implies the conditions (the derivation is analogous to the derivation of (10)–(15) in [15])

Equation (13)

These are the same as for unital qubit channels (equation (6)) up to a sign flip, and define the tetrahedron ${{ \mathcal T }}_{\mathrm{CcP}}$ of unital completely co-positive trace preserving maps (CcPTP) [3, 15]

Equation (14)

$${{ \mathcal T }}_{\mathrm{CcP}}=\mathrm{Conv}({\{{{\boldsymbol{v}}}_{j}^{\mathrm{CcP}}\}}_{j=1,\ldots ,4}),$$

with the vertices

Equation (15)

$${{\boldsymbol{v}}}_{j}^{\mathrm{CcP}}=-{{\boldsymbol{v}}}_{j}^{\mathrm{CP}},\qquad j=1,\ldots ,4.$$

CcPTP maps are exactly CPTP maps with a preceding transposition map, i.e. for every steering map ${ \mathcal S }$ there exists a quantum channel ${ \mathcal E }$ such that ${ \mathcal S }={ \mathcal E }\circ { \mathcal T }$, where ${ \mathcal T }$ is the transposition map with respect to an arbitrary but fixed basis (see e.g. [3]).
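The sign flip between the two tetrahedra can be made explicit on the level of correlation matrices: the transposition of a qubit flips the y component of the Bloch vector, so it acts on the Bloch ball as a reflection and multiplies the determinant by −1. A minimal NumPy illustration (variable names are our own):

```python
import numpy as np

# Transposition on a qubit, rho -> rho^T, flips the y component of the
# Bloch vector, i.e. it acts on the Bloch ball as the reflection diag(1,-1,1).
T_transp = np.diag([1.0, -1.0, 1.0])

E = np.eye(3)              # identity channel, SSV (1, 1, 1) = v_1^CP
S = E @ T_transp           # composed with transposition: SSV (1, -1, 1), a CcP vertex

# Composing with transposition flips the determinant, mapping CP vertices
# (proper rotations) onto CcP vertices (reflections).
assert np.isclose(np.linalg.det(E), 1.0)
assert np.isclose(np.linalg.det(S), -1.0)
```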

3.4. Positive maps

We have seen that a quantum channel is a CPTP map and that a steering map is a CcPTP map. Both of them are necessarily positive maps. But are there positive maps that are neither CcP nor CP? Or are there maps that are even both? This issue is nicely worked out in [3, chapter 11]. We briefly review it for unital qubit maps. Since we still deal with linear maps, it is straightforward that every unital positive single qubit map can also be described by a 3 × 3 correlation matrix. Hence we can also analyze its SSV. The allowed SSV lie inside the cube ${ \mathcal C }$ defined by [3, FIG. 11.3]

Equation (16)

$${ \mathcal C }=\{{\boldsymbol{\eta }}\in {{\mathbb{R}}}^{3}\,| \,-1\leqslant {\eta }_{i}\leqslant 1,\ i=1,2,3\}.$$

This is illustrated in figure 2. Note again that we only treat unital maps.

Figure 2.

Figure 2. Geometry of positive maps: for positive trace preserving single qubit maps, the allowed signed singular values lie within a cube ${ \mathcal C }$ defined in (16). Quantum channels corresponding to CPTP maps lie within the blue tetrahedron ${{ \mathcal T }}_{\mathrm{CP}}$ defined in (7), steering maps corresponding to CcPTP maps lie within the yellow tetrahedron ${{ \mathcal T }}_{\mathrm{CcP}}$ defined in (14). The maps with SSV inside the intersection of ${{ \mathcal T }}_{\mathrm{CP}}$ and ${{ \mathcal T }}_{\mathrm{CcP}}$ (green octahedron) are called super positive. These maps only produce classical correlations corresponding to separable states or entanglement breaking channels, but can also be generated by mixtures of quantum correlations.


We see that there are positive maps which are neither CP nor CcP. According to the Størmer–Woronowicz theorem (see e.g. [3, p. 258]) every positive qubit map is decomposable, i.e. it can be written as a convex combination of a CP and a CcP map. Maps that are both CP and CcP are called super positive (SP). The set of allowed SSV of the correlation matrices of these maps forms an octahedron (green region in figure 2) given as

Equation (17)

$${{ \mathcal O }}_{\mathrm{SP}}=\mathrm{Conv}({\{\pm {\hat{e}}_{i}\}}_{i=1,2,3})=\{{\boldsymbol{\eta }}\,| \,| {\eta }_{1}| +| {\eta }_{2}| +| {\eta }_{3}| \leqslant 1\},$$

where ${\hat{e}}_{i}$ denotes the unit vector along the i-axis. These correlations are generated by entanglement breaking quantum channels [22] and steering maps based on separable states [20]. When such classical correlations are observed one cannot infer anything about the causal structure [2, p.10 of supplementary information].
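The characterization of the SP octahedron as the intersection of the two tetrahedra can be checked numerically. The sketch below (helper functions and facet inequalities are our own reading of the geometry) samples random SSV vectors and compares membership in both tetrahedra with the condition $| {\eta }_{1}| +| {\eta }_{2}| +| {\eta }_{3}| \leqslant 1$:

```python
import itertools
import numpy as np

def in_tetra(eta, parity):
    # Facet inequalities 1 + s.eta >= 0 over sign triples s with
    # s1*s2*s3 == parity: parity +1 gives T_CP, parity -1 gives T_CcP.
    return all(1 + np.dot(s, eta) >= 0
               for s in itertools.product([1, -1], repeat=3)
               if s[0] * s[1] * s[2] == parity)

def super_positive(eta):
    return in_tetra(eta, +1) and in_tetra(eta, -1)

# The intersection of the two tetrahedra is the octahedron sum_i |eta_i| <= 1.
rng = np.random.default_rng(0)
for eta in rng.uniform(-1, 1, size=(1000, 3)):
    assert super_positive(eta) == (np.abs(eta).sum() <= 1)
```

The equivalence holds exactly because the maximum of $-{\boldsymbol{s}}\cdot {\boldsymbol{\eta }}$ over all eight sign triples equals ${\sum }_{i}| {\eta }_{i}| $.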

Figure 3.

Figure 3. DAG: the DAGs of our setting. On the left side with probability (1 − p) a quantum channel ${ \mathcal E }$ is realized, causing correlations between Alice (A) and Bob (B). On the right side, occurring with probability p, the correlations are caused by an unobserved source C that outputs the state ρAB generating correlations through the steering map ${ \mathcal S }$.


For higher dimensional systems things change. Already for three-dimensional maps, i.e. qutrit maps, there exist positive maps that cannot be represented as a convex combination of a CP and a CcP map [3, chapter 11.1]. In the next section we discuss how much information about causal influences we can obtain by looking only at the SSV related to the correlations Alice and Bob can observe in a bipartite experiment.

Figure 4.

Figure 4. Signed singular values of p-causal maps: set of attainable vectors of signed singular values associated with ${ \mathcal M }$ in (19) for different values of p. By theorem 4.1, for fixed p there exists a CPTP map ${ \mathcal E }$ and a CcPTP map ${ \mathcal S }$ such that ${ \mathcal M }$ is given by (19) if and only if the vector of signed singular values ${{\boldsymbol{\eta }}}^{{ \mathcal M }}$ of the correlation matrix of ${ \mathcal M }$ is in ${{ \mathcal C }}_{p}$ defined in (22).


4. Causal explanation of unital positive maps

4.1. Setting

We now tackle the problem of causal inference in the two qubit scenario [2]. The setting is as follows. An experimenter, Alice, sits in her laboratory. She opens her door just long enough to obtain a qubit in a (locally) completely mixed state and closes the door again. She performs a projective measurement in any of the Pauli eigenbases, records her outcome, opens her door again and puts the qubit in the now collapsed state outside. Apart from the qubit she has no way of interacting with the environment. Some time later another experimenter, Bob, opens the door of his laboratory and obtains a qubit. He, too, measures in the eigenbasis of one of the Pauli matrices and records the outcome. They repeat this procedure a large (ideally: an infinite) number of times. Then they meet and analyze their joint measurement outcomes. These define the probabilities $P(a,b| j,i)$ for the outcomes a ∈ {−1, 1} and b ∈ {−1, 1} of Alice's and Bob's measurements, given they measured in the eigenbasis of the jth and ith Pauli matrix, respectively. For the marginals we assume $P(a| j,i)={\sum }_{b}P(a,b| j,i)=1/2$ for all $a\in \{-1,1\}$, and accordingly for Bob. They are thus able to define a correlation matrix M with elements

Equation (18)

$${M}_{{ij}}=2P(b=1| j,i,a=1)-1=\langle {\sigma }_{j}{\sigma }_{i}\rangle ,$$

where $P(b=1| j,i,a=1)$ is the probability that Bob obtains outcome 1 when measuring the observable σi, conditioned on Alice's measurement of σj with outcome 1, and $\langle {\sigma }_{j}{\sigma }_{i}\rangle $ denotes the expectation value of the product of Alice's σj and Bob's σi measurement outcomes.

The correlation matrix defines a unique positive trace preserving unital map ${ \mathcal M }\,:{\rho }_{A}\mapsto {\rho }_{B}$. Alice and Bob are guaranteed that one of the following three scenarios holds: either they measured the same qubit, which was propagated by a unital quantum channel ${ \mathcal E }$ from Alice to Bob; or they each measured one of the two qubits of a unital bipartite state ρAB acting as a common cause, so that the correlations were caused by the corresponding steering map ${ \mathcal S };$ or the map from ρA to ρB is a probabilistic mixture in which with probability p the steering map ${ \mathcal S }$ was realized and with probability (1 − p) the quantum channel ${ \mathcal E }$ (see figure 3), that is

Equation (19)

$${ \mathcal M }=(1-p){ \mathcal E }+p{ \mathcal S },$$

with the 'causality parameter' p ∈ [0, 1]. The task of Alice and Bob is now to find the true value of p and possibly also the nature of ${ \mathcal S }$ and ${ \mathcal E }$. In general a unique solution does not exist; in this case they want to find all values of p for which maps of the form (19) explain the observed correlations.

As we mentioned in the previous section, every positive one qubit map is decomposable, so a possible explanation always exists. The decomposition (19) can be given a causal interpretation, where ${ \mathcal E }$ is considered to be a cause–effect explanation of the correlations and ${ \mathcal S }$ a common cause.

In the following subsections we give bounds on the causality parameter p and then consider some extremal cases. In section 4.4 we generalize a part of the work of Ried et al [2] and see how additional assumptions on the nature of ${ \mathcal E }$ and ${ \mathcal S }$ can lead to a unique solution.

4.2. Possible causal explanations

(p-causality/p-decomposability).

Definition 4.1 A single qubit unital positive trace preserving map ${ \mathcal M }$ is called $p$-causal/$p$-decomposable with $p\in [0,1]$, if it can be written as

Equation (20)

$${ \mathcal M }=(1-p){ \mathcal E }+p{ \mathcal S },$$

with ${ \mathcal E }$ (${ \mathcal S }$) being a CPTP (CcPTP) unital qubit map. Equation (20) is called a p-decomposition of ${ \mathcal M }$.

In the following let M, E, S denote the correlation matrices of ${ \mathcal M },{ \mathcal E },{ \mathcal S }$, and ${{\boldsymbol{\eta }}}^{{ \mathcal M }},{{\boldsymbol{\eta }}}^{{ \mathcal E }},{{\boldsymbol{\eta }}}^{{ \mathcal S }}$ the SSV of M, E, S, respectively. We first investigate for a fixed p what the possible SSV of the correlation matrix of a map ${ \mathcal M }$ are, such that ${ \mathcal M }$ is p-causal. This leads to the following theorem:

(SSV of p-causal maps).

Theorem 4.1 Let ${ \mathcal M }$ be a positive unital trace preserving qubit map with associated SSV given by ${{\boldsymbol{\eta }}}^{{ \mathcal M }}$. Let $p\in [0,1]$ be fixed. Then the following statement holds:

Equation (21)

$${ \mathcal M }\ \text{is } p\text{-causal}\ \iff \ {{\boldsymbol{\eta }}}^{{ \mathcal M }}\in {{ \mathcal C }}_{p},$$

where

Equation (22)

$${{ \mathcal C }}_{p}\equiv \mathrm{Conv}({\{(1-p){{\boldsymbol{v}}}_{i}^{\mathrm{CP}}+p\,{{\boldsymbol{v}}}_{j}^{\mathrm{CcP}}\}}_{i,j=1,\ldots ,4})$$

(see figure 4) where the vertices ${{\boldsymbol{v}}}_{i}^{\mathrm{CP}}$ of CP maps are given in (8), and the vertices ${{\boldsymbol{v}}}_{j}^{\mathrm{CcP}}$ of CcP maps in (15).

Proof. '$\Leftarrow $': From (22) we see that ${{\boldsymbol{\eta }}}^{{ \mathcal M }}$ can be written as

$${{\boldsymbol{\eta }}}^{{ \mathcal M }}=\sum _{i,j=1}^{4}{p}_{{ij}}\,[(1-p){{\boldsymbol{v}}}_{i}^{\mathrm{CP}}+p\,{{\boldsymbol{v}}}_{j}^{\mathrm{CcP}}],\qquad {p}_{{ij}}\geqslant 0,\quad \sum _{i,j}{p}_{{ij}}=1.$$

Now define ${q}_{i}\equiv {\sum }_{j}{p}_{{ij}}$ and ${r}_{j}\equiv {\sum }_{i}{p}_{{ij}}$. Clearly qi, rj ≥ 0 and ${\sum }_{i}{q}_{i}={\sum }_{j}{r}_{j}=1$. We can then write

$${{\boldsymbol{\eta }}}^{{ \mathcal M }}=(1-p){{\boldsymbol{\eta }}}^{{ \mathcal E }}+p{{\boldsymbol{\eta }}}^{{ \mathcal S }},$$

with ${{\boldsymbol{\eta }}}^{{ \mathcal E }}\equiv {\sum }_{i}{q}_{i}{{\boldsymbol{v}}}_{i}^{\mathrm{CP}}\in {{ \mathcal T }}_{\mathrm{CP}}$ and ${{\boldsymbol{\eta }}}^{{ \mathcal S }}\equiv {\sum }_{j}{r}_{j}{{\boldsymbol{v}}}_{j}^{\mathrm{CcP}}\in {{ \mathcal T }}_{\mathrm{CcP}}$. We have thus explicitly constructed a p-decomposition of ${ \mathcal M }$ in which the correlation matrices of ${ \mathcal E }$ and ${ \mathcal S }$ have SSV decompositions involving the same rotations as the SSV decomposition of the correlation matrix of ${ \mathcal M }$.

'$\Rightarrow $': Let p be fixed. Suppose that ${ \mathcal E }$ and ${ \mathcal S }$ are both extremal maps, i.e. ${{\boldsymbol{\eta }}}^{{ \mathcal E }}$ and ${{\boldsymbol{\eta }}}^{{ \mathcal S }}$ are given by one of the vertices defined in (8) and (15), respectively, and without loss of generality we assume that these are ${{\boldsymbol{v}}}_{1}^{\mathrm{CP}}$ and ${{\boldsymbol{v}}}_{1}^{\mathrm{CcP}}$ (this is justified as taking another vertex leads to the same result). Define A = (1 − p)E and B = pS, where A has SSV (1 − p, 1 − p, 1 − p) and B has SSV (−p, −p, −p). In the appendix we prove theorem A.1 that restricts the possible SSV of A + B. For our case it gives

$${{\boldsymbol{\eta }}}^{A+B}={{\boldsymbol{\eta }}}^{{ \mathcal M }}\in {{ \mathcal C }}_{p}.$$

Now suppose ${ \mathcal E }$ and ${ \mathcal S }$ are not extremal maps. Since the SSV of those are simply convex combinations of the SSV of the extremal maps, it follows that also for such maps the SSV of  M lie within ${{ \mathcal C }}_{p}$ (see equation (59) of the appendix).■
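The explicit construction in the '$\Leftarrow $' direction can be replayed numerically: starting from arbitrary convex weights pij, the marginal weights qi and rj yield SSV vectors in the two tetrahedra that recombine to ${{\boldsymbol{\eta }}}^{{ \mathcal M }}$. A short NumPy sketch (vertex labelling assumed as above):

```python
import numpy as np

V_CP = np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]], float)
V_CCP = -V_CP          # CcP vertices, assumed sign-flipped CP vertices

rng = np.random.default_rng(1)
p = 0.3
w = rng.random((4, 4))
w /= w.sum()           # convex weights p_ij of the decomposition

# eta^M as a convex combination of the product points (1-p)v_i^CP + p v_j^CcP
eta_M = sum(w[i, j] * ((1 - p) * V_CP[i] + p * V_CCP[j])
            for i in range(4) for j in range(4))

q = w.sum(axis=1)      # marginal weights q_i
r = w.sum(axis=0)      # marginal weights r_j
eta_E = q @ V_CP       # SSV of a CPTP map, inside T_CP
eta_S = r @ V_CCP      # SSV of a CcPTP map, inside T_CcP

# The marginals recombine to the p-decomposition of eta^M.
assert np.allclose(eta_M, (1 - p) * eta_E + p * eta_S)
```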

We have seen that for a given value of p the allowed SSV associated with a positive map ${ \mathcal M }$ that is p-causal lie within ${{ \mathcal C }}_{p}$ given in (22). We now turn the task around and go back to the causal inference scenario. Given a positive map ${ \mathcal M }$ we want to tell if we can bound the causality parameter p. We will do this based on the following definition:

(Causal interval ${I}_{{ \mathcal M }}$).

Definition 4.2 For a given positive unital qubit map ${ \mathcal M }$ we define the interval of possible causal explanations (for short: the causal interval) ${I}_{{ \mathcal M }}$, such that ${ \mathcal M }$ is $p$-causal if and only if $p\in {I}_{{ \mathcal M }}$.

Since every positive qubit map is decomposable [3, p. 258] the causal interval is always non-empty, ${I}_{{ \mathcal M }}\ne \varnothing $.

Theorem 4.2. Let ${ \mathcal M }$ be a positive unital qubit map, with associated SSV ${{\boldsymbol{\eta }}}^{{ \mathcal M }}$ (we assume ${\eta }_{i}^{{ \mathcal M }}\geqslant 0$ for $i=1,2$). Then the causal interval of ${ \mathcal M }$ is given by

Equation (23)

$${p}_{\max }=\min \left\{1,\ \tfrac{1}{2}\left(3-{{\boldsymbol{\eta }}}^{{ \mathcal M }}\cdot {{\boldsymbol{v}}}_{1}^{\mathrm{CP}}\right)\right\},$$

Equation (24)

$${p}_{\min }=\max \left\{0,\ \tfrac{1}{2}\left({{\boldsymbol{\eta }}}^{{ \mathcal M }}\cdot {{\boldsymbol{v}}}_{4}^{\mathrm{CcP}}-1\right)\right\},$$

with ${{\boldsymbol{v}}}_{1}^{\mathrm{CP}}={(1,1,1)}^{T}$ (${{\boldsymbol{v}}}_{4}^{{\rm{CcP}}}={(1,1,-1)}^{T}$) defining a vertex of the CPTP (CcPTP) tetrahedron ${{ \mathcal T }}_{{\rm{CP}}}$ (${{ \mathcal T }}_{{\rm{CcP}}}$).

Note that the assumption ${\eta }_{i}^{{ \mathcal M }}\geqslant 0$ for i = 1, 2 can always be met by exploiting the unitary freedom in the decomposition appropriately.

Proof. We show the theorem for pmax; the determination of pmin can be treated in an analogous way.

First we check if ${ \mathcal M }$ is a CcPTP map, by checking if ${{\boldsymbol{\eta }}}^{{ \mathcal M }}\in {{ \mathcal T }}_{\mathrm{CcP}}$. If it is CcPTP then pmax = 1, trivially.

Now suppose it is not CcPTP. pmax is then given such that ${{\boldsymbol{\eta }}}^{{ \mathcal M }}\in {{ \mathcal C }}_{{p}_{\max }}$ but ${{\boldsymbol{\eta }}}^{{ \mathcal M }}\notin {{ \mathcal C }}_{p^{\prime} }$ with $p^{\prime} \in ({p}_{\max },1]$. This implies that ${{\boldsymbol{\eta }}}^{{ \mathcal M }}$ lies on the surface of ${{ \mathcal C }}_{{p}_{\max }}$. Since we assumed ${\eta }_{i}^{{ \mathcal M }}\geqslant 0$ for i = 1, 2, the critical facet of ${{ \mathcal C }}_{{p}_{\max }}$ is the one which is perpendicular to ${{\boldsymbol{v}}}_{1}^{\mathrm{CP}}$ and has the vertices ${(1,1,1-2{p}_{\max })}^{T},{(1,1-2{p}_{\max },1)}^{T},{(1-2{p}_{\max },1,1)}^{T}$ (see figure 5). Since this facet is perpendicular to ${{\boldsymbol{v}}}_{1}^{\mathrm{CP}}$, ${{\boldsymbol{\eta }}}^{{ \mathcal M }}$ lies on this facet if its projection onto ${{\boldsymbol{v}}}_{1}^{\mathrm{CP}}$ equals the vector pointing from the origin to the intersection of the facet and ${{\boldsymbol{v}}}_{1}^{\mathrm{CP}}$, given as ${\boldsymbol{u}}\equiv (1-(2/3){p}_{\max }){{\boldsymbol{v}}}_{1}^{\mathrm{CP}}$, see figure 5. Hence we get the following equation

Equation (25)

$${{\boldsymbol{\eta }}}^{{ \mathcal M }}\cdot {{\boldsymbol{v}}}_{1}^{\mathrm{CP}}={\boldsymbol{u}}\cdot {{\boldsymbol{v}}}_{1}^{\mathrm{CP}}$$

Equation (26)

$${\boldsymbol{u}}\cdot {{\boldsymbol{v}}}_{1}^{\mathrm{CP}}=\left(1-\tfrac{2}{3}{p}_{\max }\right){{\boldsymbol{v}}}_{1}^{\mathrm{CP}}\cdot {{\boldsymbol{v}}}_{1}^{\mathrm{CP}}=3-2{p}_{\max }$$

Equation (27)

$${p}_{\max }=\tfrac{1}{2}\left(3-{{\boldsymbol{\eta }}}^{{ \mathcal M }}\cdot {{\boldsymbol{v}}}_{1}^{\mathrm{CP}}\right).$$

Figure 5.

Figure 5. Sketch for proof of theorem 4.2: the value of pmax is determined through the projection of ${{\boldsymbol{\eta }}}^{{ \mathcal M }}$ onto ${{\boldsymbol{v}}}_{1}^{\mathrm{CP}}$, which is given by ${\boldsymbol{u}}$. The red triangle is one of the facets of ${{ \mathcal C }}_{{p}_{\max }}$.

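The projection argument of theorem 4.2 can be turned into a small numerical routine. The sketch below (our own helper; it assumes the canonical ordering ${\eta }_{1}\geqslant {\eta }_{2}\geqslant | {\eta }_{3}| $ with ${\eta }_{1},{\eta }_{2}\geqslant 0$, and our reading of the analogous step for pmin) reproduces the collapse of the causal interval to a single point for SSV on an edge of the cube:

```python
import numpy as np

V1_CP = np.array([1.0, 1.0, 1.0])    # vertex v_1^CP
V4_CCP = np.array([1.0, 1.0, -1.0])  # vertex v_4^CcP

def causal_interval(eta):
    """Causal interval [p_min, p_max] for SSV eta (assumed ordered with
    eta[0], eta[1] >= 0).  p_max follows from projecting eta onto v_1^CP,
    since eta . v_1^CP = 3 - 2 p_max on the critical facet of C_p;
    p_min from the analogous projection onto v_4^CcP (our reading)."""
    eta = np.asarray(eta, float)
    p_max = min(1.0, (3.0 - eta @ V1_CP) / 2.0)
    p_min = max(0.0, (eta @ V4_CCP - 1.0) / 2.0)
    return p_min, p_max

# Edge-of-cube example eta = (1, 1, 1 - 2p): the interval collapses to {p}.
pmin, pmax = causal_interval([1.0, 1.0, 0.4])   # here p = 0.3
assert np.isclose(pmin, 0.3) and np.isclose(pmax, 0.3)
# Superpositive example: no information about p at all.
assert causal_interval([0.0, 0.0, 0.0]) == (0.0, 1.0)
```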

4.3. Extremal cases

In the previous section we found the general form of the causal interval ${I}_{{ \mathcal M }}$ for an observed map ${ \mathcal M }$. We now analyze the extremal cases in which the interval either reduces to a single value or covers the whole range, ${I}_{{ \mathcal M }}=[0,1]$.

As already noted in [2, table 1], there are extremal cases that allow for a complete solution of the problem even without any additional constraints. This is the case if ${{\boldsymbol{\eta }}}^{{ \mathcal M }}$ equals one of the vertices of the cube of positive maps, see figure 2. The solution is then either p = 0 (pure cause–effect), which was in fact already noted in [23] as the value 1 of a 'causality measure', if the SSV are all positive or exactly two of them are negative; or p = 1 (pure common cause) if the SSV are all negative or exactly one is positive. The exact reconstruction of ${ \mathcal E }$ or ${ \mathcal S }$ in these cases is trivial.

Interestingly, with theorem 4.2 we can show that every point on the edges of the cube ${ \mathcal C }$ defined in (16) gives us a unique solution without additional constraints:

Proof. Let ${ \mathcal M }$ be a positive map and M be the corresponding correlation matrix with $M={R}_{1}{\eta }^{{ \mathcal M }}{R}_{2}$ where ${\eta }^{{ \mathcal M }}=\mathrm{diag}({{\boldsymbol{\eta }}}^{{ \mathcal M }})$ with the SSV ${{\boldsymbol{\eta }}}^{{ \mathcal M }}={(1,1,1-2p)}^{T},\,p\in [0,1]$, and two rotations ${R}_{1},{R}_{2}\in \mathrm{SO}(3)$. Due to the freedom in R1 and R2 this describes all maps with corresponding vector of SSV on one of the edges of the cube ${ \mathcal C }$ defined in (16). According to theorem 4.2 we find

Equation (28)

$${p}_{\max }=\tfrac{1}{2}\left(3-(3-2p)\right)=p,$$

Equation (29)

$${p}_{\min }=\tfrac{1}{2}\left((1+2p)-1\right)=p.$$

By theorem A.1 it follows, that the maps ${ \mathcal E }$ and ${ \mathcal S }$ in the decomposition (19) necessarily correspond to extremal points in ${{ \mathcal T }}_{\mathrm{CP}}$ and ${{ \mathcal T }}_{\mathrm{CcP}}$ defined in (7) and (14) (unitary channel and maximally entangled state). It is then obvious that

Equation (30)

is the only possible solution. ■

Note that for arbitrary p ∈ [0, 1] the SSV of the correlation matrix M lie on the edges of the cube ${ \mathcal C }$ if and only if E and S have SSV decompositions with respect to the same rotations and their SSV lie on adjacent vertices of ${ \mathcal C }$. The proof is provided in appendix A.2.

In the other extreme case, if the map ${ \mathcal M }$ is superpositive, i.e. CP and CcP (see figure 2), it could be explained by a pure CPTP map, a pure CcPTP map, or any convex combination of the two. Therefore one cannot give any restriction on the possible values of p [2, III.E of supplementary information].

Proof. Let ${ \mathcal M }$ be a superpositive map. There exists a SSV decomposition of its correlation matrix for which ${{\boldsymbol{\eta }}}^{{ \mathcal M }}\in {{ \mathcal O }}_{\mathrm{SP}}$, defined in (17), and for which ${\eta }_{i}^{{ \mathcal M }}\geqslant 0$ for i = 1, 2. Hence we can write ${{\boldsymbol{\eta }}}^{{ \mathcal M }}={p}_{1}{\hat{e}}_{x}+{p}_{2}{\hat{e}}_{y}+{p}_{3}{\hat{e}}_{z}+{p}_{4}(-{\hat{e}}_{z})$, with ${p}_{i}\geqslant 0$ and ${\sum }_{i}{p}_{i}=1$. The scalar product of each term of this decomposition with ${{\boldsymbol{v}}}_{1}^{\mathrm{CP}}={(1,1,1)}^{T}$ is at most the corresponding weight, so ${{\boldsymbol{\eta }}}^{{ \mathcal M }}\cdot {{\boldsymbol{v}}}_{1}^{\mathrm{CP}}\leqslant 1$, and with that equation (23) evaluates to pmax = 1. Analogously one finds pmin = 0. ■

4.4. Additional assumptions / Causal inference with constrained classical correlations

So far we only assumed that our data is generated by a unital channel and a unital state (a state whose local partitions are completely mixed). We have seen that in some extreme cases a unique solution to the problem can be found. Ried et al showed that one can always find a unique solution for p if one restricts the channel to unitary channels and the bipartite states to maximally entangled pure states [2]. Furthermore, it is then possible to reconstruct the channel and the state up to binary ambiguity, meaning there are two explanations leading to the same observed correlations. The ellipsoids associated with unitary channels and maximally entangled states are spheres with unit radius and the SSV of their correlation matrices correspond to the vertices of ${{ \mathcal T }}_{\mathrm{CP}}$ and ${{ \mathcal T }}_{\mathrm{CcP}}$ respectively.

In the following we investigate this scenario again, but add a known amount of noise in the channel or in the bipartite state. For the channel this is done by mixing the unitary evolution with a completely depolarizing channel [4]. The completely depolarizing channel maps every Bloch vector to the origin, $\rho \mapsto \tfrac{{\mathbb{1}}}{2}$, and hence is represented by the zero matrix. The ellipsoid associated with the mixture of a completely depolarizing channel with a unitary channel is thus a shrunk sphere. For strong enough noise the result eventually becomes an entanglement breaking channel, which only produces 'classical' correlations [22]. Due to the unitary freedom compared to standard depolarizing channels, we call these channels generalized depolarizing channels. For the state we mix a pure maximally entangled state with the completely mixed state, whose correlation matrix is given by the zero matrix. We call the state a generalized Werner state, in the sense that instead of a convex combination of a singlet and a completely mixed state [24] we allow the convex combination of an arbitrary maximally entangled state with the completely mixed state. States beyond a certain threshold of noise become separable and the correlations become 'classical' [20]. We will then see that even when confronted with purely classical correlations, if we have enough a priori knowledge about the data generation, i.e. we know the amount of noise, we can still find a solution analogous to [2], in the sense of determining uniquely the parameter p, and the channel and the state up to a binary ambiguity. We will first keep the unitary channel and start with a generalized Werner state and show how one can recreate the scenario of Ried et al. Then we will add the noise in the channel.

4.4.1. Solution of the causal inference problem using generalized Werner states

The analysis follows closely in spirit section 3.4 in the supplementary information of [2]. We start again with equation (19) and assume that the steering map ${ \mathcal S }$ is generated by a shared generalized Werner state ${\rho }_{{AB}}=\epsilon \tfrac{{\mathbb{1}}}{4}+(1-\epsilon )| \psi \rangle \langle \psi | $, where the parameter $\epsilon \in (0,1)$ is known and fixed in advance and $| \psi \rangle $ is an unknown maximally entangled pure state. The map ${ \mathcal E }$ is generated by an unknown unitary channel U.

Since $\epsilon $ is fixed, the class of allowed explanations is completely defined up to unitary freedom in the channel and in the state. Hence the number of free parameters is the same as in the case considered in [2], which coincides with the case $\epsilon =0$. For $\epsilon \geqslant 2/3$ the state ρAB becomes separable, i.e. is no longer entangled, see [24] and figure 5 in the supplementary information of [20]. But the reconstruction works independently of $\epsilon $. Hence, we see here that the possibility of reconstruction hinges not on the entanglement in ρAB but on the prior knowledge we have about ρAB.

The correlation matrix corresponding to the generalized Werner state is simply that of a maximally entangled state shrunk by a factor $1-\epsilon $ and will thus be denoted $(1-\epsilon )S$, where S is the correlation matrix corresponding to a maximally entangled state. Thus in our scenario the information Alice and Bob obtain characterizes the matrix

Equation (31)

$$M=(1-p)E+p(1-\epsilon )S.$$

The ellipsoid is described by the eigenvalues and eigenvectors of $M{M}^{T}$. The eigenvectors correspond to the directions of the semi axes and the square roots of the eigenvalues are their lengths. There is one doubly degenerate eigenvalue and one non-degenerate eigenvalue. The eigenvector corresponding to the non-degenerate semi axis is parallel to ${\boldsymbol{n}}$, which is defined as the axis on which the images of a point on the Bloch sphere under S and E are diametrically opposed. Hence the length of this semi axis is ${l}_{1}=| 1-p-p(1-\epsilon )| $. Furthermore we have

$$\mathrm{sgn}(\det M)=\mathrm{sgn}(1-p-p(1-\epsilon ))$$

if l1 > 0, and det M = 0 if l1 = 0. Thus if we calculate the length of this semi axis we can already determine the causality parameter p as

Equation (32)

$$p=\frac{1\mp {l}_{1}}{2-\epsilon },$$

where the ambiguity is solved by considering the sign of $\det M$.
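The reconstruction of p from the non-degenerate semi axis can be illustrated numerically. In the sketch below we build M by hand from a unitary channel E (a proper rotation, SSV (1, 1, 1)) and a maximally entangled state's correlation matrix S (a reflection, SSV (1, 1, −1)), extract l1 from the singular values, and invert for p under the stated sign convention; all concrete parameter values are our own choices:

```python
import numpy as np

def rot(axis, t):
    """Rotation matrix by angle t about coordinate axis 0, 1 or 2."""
    c, s = np.cos(t), np.sin(t)
    i, j = [k for k in range(3) if k != axis]
    R = np.eye(3)
    R[i, i] = R[j, j] = c
    R[i, j], R[j, i] = -s, s
    return R

p_true, eps = 0.35, 0.4                    # eps is known a priori
R1, R2 = rot(2, 0.7), rot(0, -1.2)
E = R1 @ R2                                # unitary channel, SSV (1, 1, 1)
S = R1 @ np.diag([1.0, 1.0, -1.0]) @ R2    # maximally entangled state, SSV (1, 1, -1)
M = (1 - p_true) * E + p_true * (1 - eps) * S

# Semi-axis lengths of the ellipsoid = singular values of M.
# One pair is degenerate; the remaining axis has length l1 = |1-p-p(1-eps)|.
sv = np.linalg.svd(M, compute_uv=False)    # sorted descending
l1 = sv[2] if np.isclose(sv[0], sv[1]) else sv[0]

# Invert l1 for p; the sign of det M resolves the absolute value.
p_rec = (1 - np.sign(np.linalg.det(M)) * l1) / (2 - eps)
assert np.isclose(p_rec, p_true)
```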

Now that we have p and $\epsilon $ at hand we can define a new map with correlation matrix

Equation (33)

$$M^{\prime} =(1-p^{\prime} )E+p^{\prime} S,$$

where we defined

Equation (34)

$$M^{\prime} \equiv \frac{M}{1-p\epsilon },$$

Equation (35)

$$p^{\prime} \equiv \frac{p(1-\epsilon )}{1-p\epsilon }.$$

The properties of the ellipsoid can also be found in the SSV decomposition of the correlation matrix

Equation (36)

$$M={R}_{1}\,\mathrm{diag}({{\boldsymbol{\eta }}}^{{ \mathcal M }})\,{R}_{2},\qquad {R}_{1},{R}_{2}\in \mathrm{SO}(3).$$

The absolute values of the entries of ${{\boldsymbol{\eta }}}^{{ \mathcal M }}$ equal the lengths of the semi axes of the ellipsoid and we choose R1 and R2 such that ${\eta }_{1}^{{ \mathcal M }}={\eta }_{2}^{{ \mathcal M }}$. The axis on which the images of a point on the Bloch sphere under S and E are diametrically opposed is then given by the last column of R1, i.e. $\hat{n}={R}_{1}{\hat{e}}_{3}$. The length of this axis is ${l}_{1}=| {\eta }_{3}^{{ \mathcal M }}| $.

In (33) the promise is given that S is the correlation matrix of a maximally entangled state and that E is the correlation matrix of a unitary channel. The reconstruction of those is extensively studied in the supplementary information of [2]. With the method presented there we find the value of $p^{\prime} $ and can restore the correlation matrices corresponding to U and $| \psi \rangle $ up to a binary ambiguity, and hence solve the causal inference problem. We review this in terms of SSV and discuss where the binary ambiguity arises.

Starting from the lhs of (33) the goal is to determine $p^{\prime} $, S, and E on the rhs. Consider the SSV decomposition of the correlation matrix

Equation (37)

$$M^{\prime} ={R}_{1}^{{\prime} }\,\mathrm{diag}({{\boldsymbol{\eta }}}^{{ \mathcal M }^{\prime} })\,{R}_{2}^{{\prime} },\qquad {R}_{1}^{{\prime} },{R}_{2}^{{\prime} }\in \mathrm{SO}(3).$$

The absolute values of the entries of ${{\boldsymbol{\eta }}}^{{ \mathcal M }^{\prime} }$ equal the lengths of the semi axes of the ellipsoid and we choose ${R}_{1}^{{\prime} },{R}_{2}^{{\prime} }$ such that ${\eta }_{1}^{{ \mathcal M }^{\prime} }={\eta }_{2}^{{ \mathcal M }^{\prime} }$. The axis on which the images of a point on the Bloch sphere under S and E are diametrically opposed is then given by the last column of ${R}_{1}^{{\prime} }$, i.e. $\hat{n}^{\prime} ={R}_{1}^{{\prime} }{\hat{e}}_{3}$. The length of this axis is ${l}_{1}^{{\prime} }=| {\eta }_{3}^{{ \mathcal M }^{\prime} }| $. However, the direction of $\hat{n}^{\prime} $, depending on the choice of ${R}_{1}^{{\prime} }$ and ${R}_{2}^{{\prime} }$, cannot be determined uniquely and allows two possible solutions $\pm \hat{n}^{\prime} $. The parameter $p^{\prime} $ is determined by the length ${l}_{1}^{{\prime} }$ and can be calculated as

Equation (38)

$$p^{\prime} =\frac{1\mp {l}_{1}^{{\prime} }}{2},$$

and if $\det (M^{\prime} )=0$ we have $p^{\prime} =1/2$. If $p^{\prime} =0$ or $p^{\prime} =1$ the reconstruction is trivial (of course in these cases one cannot reconstruct S or E, respectively). If $p^{\prime} \in (0,1)$, we can define [2]

Equation (39)

Equation (40)

Equation (41)

The reconstruction of the correlation matrices S and E can then be done, see equation (58) and (59) in the supplementary information of [2]:

Equation (42)

Equation (43)

where ${R}_{\hat{n},\alpha }$ indicates a rotation about the axis $\hat{n}$ with rotation angle $\alpha $, ${S}_{\hat{n}^{\prime} ,1/(1-2p^{\prime} )}$ a scaling along $\hat{n}^{\prime} $ by a factor $1/(1-2p^{\prime} )$, and ${S}_{\perp \hat{n}^{\prime} ,1/r^{\prime} }$ a scaling of the plane perpendicular to $\hat{n}^{\prime} $ by a factor $1/r^{\prime} $. From (42) and (43) we see that a reconstruction of E and S is not possible if $p^{\prime} =1/2$.

Let us summarize what we can infer about the causation of M given in (31).

  • The causality parameter p can be determined uniquely in all cases, see equation (32).
  • If $r^{\prime} =0$ or $p^{\prime} =1/2$ then S and E cannot be determined,
  • else we can determine two sets of solutions for E and S given by (42) and (43), distinguished by the choice of direction of $\hat{n}^{\prime} $.

On the other hand, if we do not have prior knowledge of $\epsilon $, then in general we cannot determine p with (32). This ambiguity can easily be illustrated by looking at an example.

Take $U={\sigma }_{x}$ and $| \psi \rangle =\tfrac{| 00\rangle -| 11\rangle }{\sqrt{2}}$. We then have $E=\mathrm{diag}(1,-1,-1)$ and $S=\mathrm{diag}(-1,1,1)$.

Combining this for arbitrary $\epsilon $ and p gives

$$M=(1-p)E+p(1-\epsilon )S=(1-p(2-\epsilon ))\,\mathrm{diag}(1,-1,-1).$$

Hence for all values of the parameters where $p(2-\epsilon )=\mathrm{const.}$, the measurement statistics for Alice and Bob are exactly the same and there is no way to distinguish different pairs of values.
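This degeneracy is easy to verify numerically. The sketch below computes E and the steering matrix for this example from first principles (we assume the steering convention ${\boldsymbol{r}}\mapsto {T}^{T}{\boldsymbol{r}}$ for a state with maximally mixed marginals) and checks that two parameter pairs with equal $p(2-\epsilon )$ give identical correlation matrices:

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], complex)
sy = np.array([[0, -1j], [1j, 0]], complex)
sz = np.array([[1, 0], [0, -1]], complex)
paulis = [sx, sy, sz]

# Channel correlation matrix for U = sigma_x: Bloch vector (x,y,z) -> (x,-y,-z)
E = np.diag([1.0, -1.0, -1.0])

# Correlation matrix T of |psi> = (|00> - |11>)/sqrt(2): T_ij = <sigma_i sigma_j>
psi = np.array([1, 0, 0, -1], complex) / np.sqrt(2)
T = np.array([[np.vdot(psi, np.kron(si, sj) @ psi).real for sj in paulis]
              for si in paulis])
S = T.T   # steering map r -> T^T r (assumed convention; here T is diagonal)

def M(p, eps):
    """Observed correlation matrix of the p-mixture with Werner noise eps."""
    return (1 - p) * E + p * (1 - eps) * S

# The observed correlations depend on p, eps only through p*(2 - eps):
assert np.allclose(M(0.2, 0.5), M(0.3, 1.0))        # both have p*(2-eps) = 0.3
assert not np.allclose(M(0.2, 0.5), M(0.2, 1.0))    # 0.3 vs 0.2: distinguishable
```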

Analogously to using a generalized Werner state for the steering map, we can also use a generalized depolarizing channel. Then, with prior knowledge of the amount of noise, we can still find a complete solution even though the resulting channel might be entanglement breaking.

4.4.2. Generalized depolarizing channel and generalized Werner state

We shall now consider the case where both the channel and the state are mixed with a known amount of noise. To this end we take $S^{\prime} =(1-{\epsilon }_{c})S$ for a generalized Werner state (thus S corresponds again to a rotated and inverted Bloch sphere) and $E^{\prime} =(1-{\epsilon }_{e})E$ for a generalized depolarizing channel. We again assume ${\epsilon }_{e}\,\in (0,1)$ and ${\epsilon }_{c}\,\in (0,1)$ to be known. We then have

Equation (44)

$$M=(1-p)(1-{\epsilon }_{e})E+p(1-{\epsilon }_{c})S.$$

The reconstruction works as follows. Without loss of generality we assume ${\epsilon }_{e}\leqslant {\epsilon }_{c}$ (in the other case we just have to carry out the reconstruction discussed in the previous subsection for the entanglement breaking channel instead of the Werner state). The only thing we have to do is to divide by $(1-{\epsilon }_{e})$ to restore the problem of the previous section,

$$\frac{M}{1-{\epsilon }_{e}}=(1-p)E+p(1-\epsilon )S,$$

with $1-\epsilon \equiv \tfrac{1-{\epsilon }_{c}}{1-{\epsilon }_{e}}$. The rest can then be solved as in the previous subsection.

Again we remark that nothing changes if we have ${\epsilon }_{c}\geqslant 2/3$ and ${\epsilon }_{e}\geqslant 2/3$, even though at these thresholds the states become separable and the channels entanglement breaking, respectively.
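The rescaling step can be checked in a few lines (example matrices and noise levels are our own choices):

```python
import numpy as np

p, eps_e, eps_c = 0.4, 0.2, 0.5           # known noise levels, eps_e <= eps_c
E = np.diag([1.0, -1.0, -1.0])            # some unitary channel (sigma_x)
S = np.diag([-1.0, 1.0, 1.0])             # some maximally entangled state
M = (1 - p) * (1 - eps_e) * E + p * (1 - eps_c) * S   # observed correlations

# Dividing by (1 - eps_e) restores the single-noise problem with an
# effective Werner parameter eps given by 1 - eps = (1-eps_c)/(1-eps_e).
eps = 1 - (1 - eps_c) / (1 - eps_e)
M_eff = M / (1 - eps_e)
assert np.allclose(M_eff, (1 - p) * E + p * (1 - eps) * S)
```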

5. Discussion

In this work we extended the results initially found by Ried et al [2]. We introduced an active and a passive quantum observational scheme as analogies to the classical observational scheme. The passive quantum observational scheme does not allow for an advantage over classical causal inference. In the active quantum observational scheme Alice and Bob can freely choose their measurement bases, which in principle allows for signaling. However, we investigated the quantum advantage over classical causal inference in a scenario where signaling is not possible in the active quantum observational scheme, as Alice's incoming state is completely mixed.

We showed how the geometry of the set of SSV of correlation matrices representing positive maps of the density operator ${\rho }_{A}\mapsto {\rho }_{B}$ determines the possibility to reconstruct the causal structure linking ρA and ρB. We showed that there are more cases than previously known for which a complete solution of the causal inference problem can be found without additional constraints, namely all correlations created by maps whose SSV of the correlation matrix lie on the edges of the cube of positive maps ${ \mathcal C }$ defined in (16). A necessary and sufficient condition for the SSV of the correlation matrix to lie on one of the edges is that the map is a mixture of a maximally entangled state and a unitary channel whose correlation matrices have SSV decompositions involving the same rotations, with the resulting SSV on two adjacent vertices of ${ \mathcal C }$.

For correlations guaranteed to be produced by a mixture of a unital channel and a unital bipartite state, we quantified the quantum advantage by giving the intervals for possible values of the causality parameter p. Here, in order to constrain p, and hence have an advantage over classical causal inference, it is necessary that the correlations were caused by an entangled state and/or an entanglement preserving channel. This is because correlations caused by any mixture of a separable state and an entanglement breaking quantum channel always describe SP maps. According to theorem 4.2 the causal interval for any SP map ${ \mathcal M }$ is ${I}_{{ \mathcal M }}=[0,1]$. Hence, SP maps do not allow any causal inference.

Things change when we further strengthen the assumptions on the data generating processes and allow only unitary freedom in the state, corresponding to a generalized Werner state with given degree of noise ${\epsilon }_{c}$, or unitary freedom in the channel, corresponding to a generalized depolarizing channel with given degree of noise ${\epsilon }_{e}$. We showed that in this scenario the causality parameter p can always be uniquely determined and in most cases the state and the channel can be reconstructed up to a binary ambiguity. For ${\epsilon }_{c}\geqslant 2/3$ the state becomes separable, and for ${\epsilon }_{e}\geqslant 2/3$ the channel becomes entanglement breaking, but causal inference remains feasible. Entanglement and entanglement preservation are therefore not necessary conditions in this scenario. The assumptions on the data generating processes, i.e. a priori knowledge of ${\epsilon }_{c}$ and ${\epsilon }_{e}$, are strong enough that even correlations corresponding to SP maps reveal the underlying causal structure.

Acknowledgments

The authors acknowledge support by Deutsche Forschungsgemeinschaft and Open Access Publishing Fund of University of Tübingen.

Appendix

A.1. SSV of sums of matrices

Let A be an n × n real matrix. A possible singular value decomposition (SVD) of A is given as

$A={O}_{1}{{DO}}_{2},$ (45)

where O1,2 are orthogonal matrices (${O}_{i}{O}_{i}^{T}={\mathbb{1}}$) and D is a positive semi-definite diagonal matrix, $D=\mathrm{diag}({\sigma }_{1}^{A},\,\ldots ,\,{\sigma }_{n}^{A})$, whose entries ${\sigma }_{i}^{A}$ are called the (absolute) singular values (SV) of A. The matrices in (45) are not unique: any permutation of the SV on the diagonal of D can be realized by a suitable choice of the orthogonal matrices O1 and O2. We use this freedom to write the SV in canonical order, ${\sigma }_{1}^{A}\geqslant {\sigma }_{2}^{A}\geqslant ...\geqslant {\sigma }_{n}^{A}$.

Example. We give two different SVDs of a 3 × 3 matrix B

The last decomposition gives the SV of B in canonical order ${\sigma }_{1}^{B}=3,{\sigma }_{2}^{B}=2$ and ${\sigma }_{3}^{B}=1$.

Next we call

$A={R}_{1}D^{\prime} {R}_{2}$ (46)

the SSV decomposition (also called real SV [25]) of A, where ${R}_{i}\in {SO}(n)$ are orthogonal matrices with determinant equal to one. In the 3 × 3 case these correspond to proper rotations in ${{\mathbb{R}}}^{3}$. The diagonal matrix $D^{\prime} $ contains the SSV of A. The SSV have the same absolute values as the SV but can additionally carry negative signs. Concretely, the freedom in choosing R1 and R2 allows one to obtain any permutation of the SV on the diagonal of $D^{\prime} $, together with an even or odd number of minus signs, depending on whether A has positive or negative determinant, respectively. If at least one singular value equals 0, the number of minus signs becomes completely arbitrary. Using the same matrix B as above we give two different SSV decompositions as an example:

Equation (47)

Equation (48)

For the SSV decomposition we define a canonical order: the absolute values of the SV sorted in decreasing order, with a negative sign on the last entry only if the matrix has negative determinant, as in (48). The rotational freedom in (46) allows for arbitrary permutations of the order of the SV and the addition of any even number of minus signs.
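For concreteness, the canonical SSV decomposition (46) can be obtained numerically from a standard SVD by making both orthogonal factors proper rotations and absorbing any leftover sign into the last diagonal entry. The following sketch (the helper name `ssv_decomposition` is ours) assumes NumPy:

```python
import numpy as np

def ssv_decomposition(A):
    """Return R1, Dp, R2 with A = R1 @ Dp @ R2, R1, R2 in SO(n),
    and Dp diagonal holding the canonical SSV of A (cf. equation (46))."""
    U, s, Vt = np.linalg.svd(A)          # A = U @ diag(s) @ Vt, s decreasing
    Dp = np.diag(s)
    if np.linalg.det(U) < 0:             # make U proper, absorb sign into Dp
        U[:, -1] *= -1
        Dp[-1, -1] *= -1
    if np.linalg.det(Vt) < 0:            # make Vt proper likewise
        Vt[-1, :] *= -1
        Dp[-1, -1] *= -1
    return U, Dp, Vt
```

For a matrix with SV 3, 2, 1 and negative determinant this yields the canonical SSV (3, 2, −1), a minus sign appearing on the smallest entry only, in line with the canonical order defined above.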

Confusion may arise since, for example, an ${{\mathbb{R}}}^{3}$ permutation matrix corresponding to a permutation of exactly two coordinates has determinant −1, so why would such a permutation be allowed? The point is that we want to permute not the elements of a vector but the diagonal elements of a matrix. We illustrate this by permuting two components of (i) a vector and (ii) a diagonal matrix.

${P}_{{yz}}\left(\begin{array}{c}{v}_{1}\\ {v}_{2}\\ {v}_{3}\end{array}\right)=\left(\begin{array}{c}{v}_{1}\\ {v}_{3}\\ {v}_{2}\end{array}\right),\qquad {P}_{{yz}}=\left(\begin{array}{ccc}1 & 0 & 0\\ 0 & 0 & 1\\ 0 & 1 & 0\end{array}\right)$ (49)

${P}_{{yz}}\,\mathrm{diag}({d}_{1},{d}_{2},{d}_{3})\,{P}_{{yz}}^{T}=\mathrm{diag}({d}_{1},{d}_{3},{d}_{2})$ (50)

$(-{P}_{{yz}})\,\mathrm{diag}({d}_{1},{d}_{2},{d}_{3})\,{(-{P}_{{yz}})}^{T}=\mathrm{diag}({d}_{1},{d}_{3},{d}_{2})$ (51)

That is, since $-{P}_{{yz}}={R}_{\hat{{\boldsymbol{x}}}}(\pi /2)\cdot {R}_{\hat{{\boldsymbol{y}}}}(\pi )$, the effect of permuting the second and third diagonal entry of a diagonal matrix can also be obtained by proper rotations, and correspondingly for the other permutations of the SSV. Hence all permutations of the SSV are allowed.
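This can be verified numerically. The following minimal sketch (matrix and variable names are ours) checks that the proper rotation $-{P}_{{yz}}$ permutes the second and third diagonal entries, because the two global signs cancel in the two-sided action:

```python
import numpy as np

# Improper permutation matrix exchanging the second and third coordinate
P_yz = np.array([[1., 0., 0.],
                 [0., 0., 1.],
                 [0., 1., 0.]])           # det(P_yz) = -1
R = -P_yz                                  # det(R) = +1: a proper rotation

D = np.diag([1., 2., 3.])
# The two minus signs cancel, so R acts on the diagonal exactly like P_yz:
print(np.diag(R @ D @ R.T))                # [1. 3. 2.]
```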

Fan [26] gave bounds on the SV of A + B given the SV of two real matrices A and B, derived from the corresponding results for eigenvalues of Hermitian matrices and using that the matrix $\tilde{A}\equiv \left(\begin{array}{cc}{0}_{n\times n} & A\\ {A}^{T} & {0}_{n\times n}\end{array}\right)$ has the SV of A and their negatives as eigenvalues [27, p 243]. In the main part of this work we need a more constraining statement in terms of the SSV, which thus takes the determinants of A, B, and A + B into account as well. This leads to theorem A.1. In the following we denote by $\tilde{{\boldsymbol{\sigma }}}(A)$ the vector of canonical SSV of the n × n real matrix A. Since the product of two rotations is again a rotation, it follows directly from (46) that

$\tilde{{\boldsymbol{\sigma }}}({R}_{1}{{AR}}_{2})=\tilde{{\boldsymbol{\sigma }}}(A)\quad \mathrm{for}\ \mathrm{all}\ {R}_{1},{R}_{2}\in {SO}(n).$ (52)

Let ${\boldsymbol{w}}$ be an n-dimensional vector. We define

${{\rm{\Delta }}}_{{\boldsymbol{w}}}\equiv \mathrm{conv}\{({s}_{1}{w}_{\pi (1)},\ldots ,{s}_{n}{w}_{\pi (n)})\ |\ \pi \in {S}_{n},\ {s}_{\nu }\in \{-1,1\},\ {\prod }_{\nu }{s}_{\nu }=1\}$ (53)

as the convex hull of all vectors obtained from ${\boldsymbol{w}}$ by permuting its components with some $\pi \in {S}_{n}$ and multiplying them with an even number of minus signs. The following observation looks intuitive, but we state it explicitly here due to its importance for the main result:

${\boldsymbol{v}}\in {{\rm{\Delta }}}_{{\boldsymbol{w}}}\ \Rightarrow \ {{\rm{\Delta }}}_{{\boldsymbol{v}}}\subseteq {{\rm{\Delta }}}_{{\boldsymbol{w}}}.$ (54)

Proof. We denote by ${\hat{{\boldsymbol{w}}}}_{1},\,...,\,{\hat{{\boldsymbol{w}}}}_{r}$ the vertices of ${{\rm{\Delta }}}_{{\boldsymbol{w}}}$, with r the number of vertices. Since ${\boldsymbol{v}}\in {{\rm{\Delta }}}_{{\boldsymbol{w}}}$ there exist weights p1, ..., pr ≥ 0, with ${\sum }_{i=1}^{r}{p}_{i}=1$, such that ${\boldsymbol{v}}={\sum }_{i=1}^{r}{p}_{i}{\hat{{\boldsymbol{w}}}}_{i}$. Now consider ${\boldsymbol{v}}^{\prime} $, a permutation of the elements of ${\boldsymbol{v}}$ together with an even number of sign flips (as in the definition of the vertices in equation (53)). By this operation any vertex ${\hat{{\boldsymbol{w}}}}_{i}$ of ${{\rm{\Delta }}}_{{\boldsymbol{w}}}$ gets again mapped to a vertex ${\hat{{\boldsymbol{w}}}}_{i}^{{\prime} }$ of ${{\rm{\Delta }}}_{{\boldsymbol{w}}}$. Thus ${\boldsymbol{v}}^{\prime} ={\sum }_{i=1}^{r}{p}_{i}{\hat{{\boldsymbol{w}}}}_{i}^{{\prime} }\in {{\rm{\Delta }}}_{{\boldsymbol{w}}}$. Hence all vertices of ${{\rm{\Delta }}}_{{\boldsymbol{v}}}$ are elements of ${{\rm{\Delta }}}_{{\boldsymbol{w}}}$, and with that ${{\rm{\Delta }}}_{{\boldsymbol{v}}}\subseteq {{\rm{\Delta }}}_{{\boldsymbol{w}}}$, as the convex hull of elements of a convex set is a subset of the original convex set. ■

Let now ${{\boldsymbol{w}}}_{{\bf{1}}}$ and ${{\boldsymbol{w}}}_{{\bf{2}}}$ be two n-dimensional vectors. We define

${{\rm{\Sigma }}}_{{{\boldsymbol{w}}}_{{\bf{1}}},{{\boldsymbol{w}}}_{{\bf{2}}}}\equiv \{{\boldsymbol{a}}+{\boldsymbol{b}}\ |\ {\boldsymbol{a}}\in {{\rm{\Delta }}}_{{{\boldsymbol{w}}}_{{\bf{1}}}},\ {\boldsymbol{b}}\in {{\rm{\Delta }}}_{{{\boldsymbol{w}}}_{{\bf{2}}}}\}.$ (55)

Figure A1 presents an illustration of the case n = 2.


Figure A1. Illustration of theorem A.1: suppose we have two 2 × 2 matrices A and B with SSV ${{\boldsymbol{w}}}_{1}$ and ${{\boldsymbol{w}}}_{2}$, respectively. The red and the yellow sets correspond to ${{\rm{\Delta }}}_{{{\boldsymbol{w}}}_{1}}$ and ${{\rm{\Delta }}}_{{{\boldsymbol{w}}}_{2}}$ defined by (53). By theorem A.1 the vector of SSV of A + B then lies within the blue set, defined by (55).


Theorem A.1. Let $A$ and $B$ be two $n\times n$ real matrices whose SSV are known. Then

$\tilde{{\boldsymbol{\sigma }}}(A+B)\in {{\rm{\Sigma }}}_{\tilde{{\boldsymbol{\sigma }}}(A),\tilde{{\boldsymbol{\sigma }}}(B)}.$ (56)

Proof. Let A be an n × n real matrix and let ${\boldsymbol{d}}(A)$ denote the vector of diagonal entries of A. Thompson showed the following two statements about the diagonal elements of A [28, theorems 7 and 8]:

${\boldsymbol{d}}(A)\in {{\rm{\Delta }}}_{\tilde{{\boldsymbol{\sigma }}}(A)},$ (57)

$\mathrm{for}\ \mathrm{every}\ {\boldsymbol{v}}\in {{\rm{\Delta }}}_{\tilde{{\boldsymbol{\sigma }}}(A)}\ \mathrm{there}\ \mathrm{exist}\ {R}_{1},{R}_{2}\in {SO}(n)\ \mathrm{with}\ {\boldsymbol{d}}({R}_{1}{{AR}}_{2})={\boldsymbol{v}}.$ (58)

Now let A and B be two n × n real matrices. Let ${R}_{1},{R}_{2}\in {SO}(n)$ such that ${\boldsymbol{d}}({R}_{1}(A+B){R}_{2})=\tilde{{\boldsymbol{\sigma }}}(A+B)$. We then have

$\tilde{{\boldsymbol{\sigma }}}(A+B)={\boldsymbol{d}}({R}_{1}(A+B){R}_{2})={\boldsymbol{d}}({R}_{1}{{AR}}_{2})+{\boldsymbol{d}}({R}_{1}{{BR}}_{2})\in {{\rm{\Delta }}}_{\tilde{{\boldsymbol{\sigma }}}({R}_{1}{{AR}}_{2})}+{{\rm{\Delta }}}_{\tilde{{\boldsymbol{\sigma }}}({R}_{1}{{BR}}_{2})}={{\rm{\Delta }}}_{\tilde{{\boldsymbol{\sigma }}}(A)}+{{\rm{\Delta }}}_{\tilde{{\boldsymbol{\sigma }}}(B)}={{\rm{\Sigma }}}_{\tilde{{\boldsymbol{\sigma }}}(A),\tilde{{\boldsymbol{\sigma }}}(B)},$

where the second equality follows from the linearity of matrix addition in every element, the membership from (57), the next-to-last equality from (52), and the final one from the definition (55). ■
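Theorem A.1 can be sanity-checked numerically. For n = 2 and canonical SSV ${\boldsymbol{w}}=(a,b)$, the set ${{\rm{\Delta }}}_{{\boldsymbol{w}}}$ is the convex hull of (a, b), (b, a), (−a, −b), (−b, −a), i.e. the rectangle |x + y| ≤ |a + b|, |x − y| ≤ |a − b| in the rotated coordinates, so membership of $\tilde{{\boldsymbol{\sigma }}}(A+B)$ in the Minkowski sum ${{\rm{\Sigma }}}_{{{\boldsymbol{w}}}_{1},{{\boldsymbol{w}}}_{2}}$ reduces to two inequalities. A sketch in Python (the rectangle reformulation and all helper names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)

def ssv2(M):
    """Canonical SSV of a real 2x2 matrix."""
    s = np.linalg.svd(M, compute_uv=False)    # decreasing absolute SV
    if np.linalg.det(M) < 0:
        s[-1] *= -1                            # sign on the smallest entry
    return s

def rot(t):
    """Proper rotation in SO(2)."""
    return np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])

w1 = np.array([3.0, 1.0])     # prescribed SSV of A (positive determinant)
w2 = np.array([2.0, -1.5])    # prescribed SSV of B (negative determinant)

for _ in range(1000):
    t = rng.uniform(0, 2 * np.pi, size=4)
    A = rot(t[0]) @ np.diag(w1) @ rot(t[1])
    B = rot(t[2]) @ np.diag(w2) @ rot(t[3])
    s = ssv2(A + B)
    # Membership of ssv(A+B) in the Minkowski sum of the two rectangles:
    assert abs(s[0] + s[1]) <= abs(w1.sum()) + abs(w2.sum()) + 1e-9
    assert abs(s[0] - s[1]) <= abs(w1[0] - w1[1]) + abs(w2[0] - w2[1]) + 1e-9
print("theorem A.1 holds on 1000 random samples (n = 2)")
```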

Consider an additional n × n real matrix $A^{\prime} $ whose SSV satisfy $\tilde{{\boldsymbol{\sigma }}}(A^{\prime})\in {{\rm{\Delta }}}_{\tilde{{\boldsymbol{\sigma }}}(A)}$. Then

$\tilde{{\boldsymbol{\sigma }}}(A^{\prime} +B)\in {{\rm{\Sigma }}}_{\tilde{{\boldsymbol{\sigma }}}(A),\tilde{{\boldsymbol{\sigma }}}(B)}.$ (59)

Proof. By theorem A.1 we have $\tilde{{\boldsymbol{\sigma }}}(A^{\prime} +B)\in {{\rm{\Sigma }}}_{\tilde{{\boldsymbol{\sigma }}}(A^{\prime} ),\tilde{{\boldsymbol{\sigma }}}(B)}$. Equation (59) follows then by equation (54) and the definition of ${{\rm{\Sigma }}}_{\tilde{{\boldsymbol{\sigma }}}(A^{\prime} ),\tilde{{\boldsymbol{\sigma }}}(B)}$ in (55). ■

As mentioned above, results for the absolute SV of A + B have been known before. For completeness, we show that the above proof works analogously for the corresponding statement on absolute SV: let ${\boldsymbol{\sigma }}(A)$ denote the vector of canonical absolute SV of an n × n real matrix A, i.e. ${\sigma }_{1}(A)\geqslant {\sigma }_{2}(A)\geqslant ...\geqslant {\sigma }_{n}(A)$. Let B be another n × n real matrix. Then [27, chapter 9 G.1.d]

${\boldsymbol{\sigma }}(A+B){\prec }_{w}{\boldsymbol{\sigma }}(A)+{\boldsymbol{\sigma }}(B),$ (60)

i.e. the vector of canonical SV of A + B is weakly majorized by the sum of the vectors of canonical SV of A and B. Weak majorization for two vectors ${\boldsymbol{x}}$ and ${\boldsymbol{y}}$ with x1 ≥ x2 ≥ ... ≥ xn and y1 ≥ y2 ≥ ... ≥ yn is defined as

${\boldsymbol{x}}{\prec }_{w}{\boldsymbol{y}}\ :\iff \ {\sum }_{i=1}^{k}{x}_{i}\leqslant {\sum }_{i=1}^{k}{y}_{i}\quad \mathrm{for}\ \mathrm{all}\ k=1,\ldots ,n.$ (61)

To see (60), define ${{\rm{\Delta }}}_{{\boldsymbol{w}}}^{{\prime} }$ analogously to (53) but without the constraint ${\prod }_{\nu }{s}_{\nu }=1$, i.e. allowing arbitrary sign flips. The analogous statements of (57) and (58) hold if we exchange the SSV with the absolute SV, proper rotations (elements of SO(n)) with orthogonal matrices (elements of O(n)), and ${{\rm{\Delta }}}_{{\boldsymbol{w}}}$ with ${{\rm{\Delta }}}_{{\boldsymbol{w}}}^{{\prime} }$. We then find that ${\boldsymbol{\sigma }}(A+B)\in {{\rm{\Sigma }}}_{{\boldsymbol{\sigma }}(A),{\boldsymbol{\sigma }}(B)}^{{\prime} }$, with ${{\rm{\Sigma }}}_{{{\boldsymbol{w}}}_{{\bf{1}}},{{\boldsymbol{w}}}_{{\bf{2}}}}^{{\prime} }\equiv \{a+b| a\in {{\rm{\Delta }}}_{{{\boldsymbol{w}}}_{{\bf{1}}}}^{{\prime} },b\in {{\rm{\Delta }}}_{{{\boldsymbol{w}}}_{{\bf{2}}}}^{{\prime} }\}$. Since by definition the absolute SV are non-negative, we can further restrict ${\rm{\Sigma }}^{\prime} $ to the first hyperoctant. On the other hand, for two vectors ${\boldsymbol{x}},{\boldsymbol{y}}\in {{\mathbb{R}}}_{+}^{n}$ we have (proposition C.2 of chapter 4 in [27])

${\boldsymbol{x}}{\prec }_{w}{\boldsymbol{y}}\iff {\boldsymbol{x}}\in \mathrm{conv}\{({\epsilon }_{1}{y}_{\pi (1)},\ldots ,{\epsilon }_{n}{y}_{\pi (n)})\ |\ \pi \in {S}_{n},\ {\epsilon }_{\nu }\in \{0,1\}\}.$ (62)

The set on the rhs coincides with the restriction of ${\rm{\Sigma }}^{\prime} $ to the first hyperoctant if we take ${\boldsymbol{y}}={\boldsymbol{\sigma }}(A)+{\boldsymbol{\sigma }}(B)$. Taking ${\boldsymbol{x}}={\boldsymbol{\sigma }}(A+B)$, equation (60) follows.
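Fan's bound (60) is likewise easy to probe numerically; a short sketch with random matrices (the helper name `weakly_majorized` is ours):

```python
import numpy as np

def weakly_majorized(x, y, tol=1e-9):
    """x weakly majorized by y, for vectors sorted in decreasing order
    (equation (61)): all partial sums of x bounded by those of y."""
    return bool(np.all(np.cumsum(x) <= np.cumsum(y) + tol))

rng = np.random.default_rng(42)
for _ in range(1000):
    A = rng.normal(size=(3, 3))
    B = rng.normal(size=(3, 3))
    sA = np.linalg.svd(A, compute_uv=False)        # SV in decreasing order
    sB = np.linalg.svd(B, compute_uv=False)
    sAB = np.linalg.svd(A + B, compute_uv=False)
    assert weakly_majorized(sAB, sA + sB)          # Fan's bound, eq. (60)
print("equation (60) holds on 1000 random samples")
```

Note that the sum sA + sB is automatically in decreasing order, so (61) applies directly.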

A.2. Additional proofs

In section 4.3 we state that for arbitrary p ∈ [0, 1] the SSV of the correlation matrix M lie on the edges of the cube ${ \mathcal C }$ if and only if E and S have SSV decompositions with respect to the same rotations and their SSV lie on adjacent vertices of ${ \mathcal C }$.

Proof. The sufficiency of this condition is trivial. The necessity of extremal SSV follows from theorem A.1. Now let us see why it is further necessary that the correlation matrices have a common SSV decomposition; for the following, assume that the SSV of E and S are given by one of the vertices of ${{ \mathcal T }}_{\mathrm{CP}}$ and ${{ \mathcal T }}_{\mathrm{CcP}}$, respectively. Without restriction of generality consider the SSV decomposition of M with respect to the rotation matrices R1, R2 such that

$\mathrm{diag}(1,1,{\tilde{\sigma }}_{3}(M))={R}_{1}{{MR}}_{2}={{pR}}_{1}{{ER}}_{2}+(1-p){R}_{1}{{SR}}_{2}.$ (63)

With equation (57) the diagonal entries of ${R}_{1}{{ER}}_{2}$ and ${R}_{1}{{SR}}_{2}$ are constrained to ${{ \mathcal T }}_{\mathrm{CP}}$ and ${{ \mathcal T }}_{\mathrm{CcP}}$, respectively. In order to fulfill the second equality in equation (63) it is hence necessary that the first two diagonal entries of ${R}_{1}{{ER}}_{2}$ and of ${R}_{1}{{SR}}_{2}$ each equal one. The only elements of ${{ \mathcal T }}_{\mathrm{CP}}$ and ${{ \mathcal T }}_{\mathrm{CcP}}$ allowing for this are (1, 1, 1) and (1, 1, −1), respectively. Hence the diagonal entries of ${R}_{1}{{ER}}_{2}$ and of ${R}_{1}{{SR}}_{2}$ equal their SSV. To see that this implies that ${R}_{1}{{ER}}_{2}$ and ${R}_{1}{{SR}}_{2}$ are diagonal matrices, and hence that E and S have a common SSV decomposition, consider the Frobenius norm, which for an n × n real matrix A is given as

${\parallel A\parallel }_{F}^{2}\equiv {\sum }_{i,j=1}^{n}{A}_{{ij}}^{2}=\mathrm{tr}({A}^{T}A)={\sum }_{i=1}^{n}{\tilde{\sigma }}_{i}{(A)}^{2},$ (64)

where we used that $\mathrm{tr}({A}^{T}A)$ equals the sum of the squared SV (and hence of the squared SSV) of A [25]. Now if A has its SSV on the diagonal, i.e. ${A}_{{ii}}={\tilde{\sigma }}_{i}(A)$ for i = 1, ..., n, equation (64) directly implies Aij = 0 for $i\ne j$, and hence A is diagonal. Once E and S have a common SSV decomposition, the SSV of M are simply the convex combination of the SSV of E and S. If those lie on the same edge of ${ \mathcal C }$, i.e. they are adjacent, then the SSV of M lie on the connecting edge. If, however, the SSV are not adjacent, their convex combination results in a point strictly inside ${ \mathcal C }$. ■

Footnotes

  • Strictly speaking, only for $p\ne 1/2$ can one always determine the unitary and the state. For p = 1/2 there is an infinite number of channels and states (all those for which every point of the unitary channel's ellipsoid is diametrically opposed to the corresponding point of the state's ellipsoid) for which the ellipsoid reduces to a single point, and hence the correlation matrix is the zero matrix. The parameter p = 1/2 can then still be recovered, but not the unitary and the state.
