Permutationally invariant state reconstruction

Tobias Moroder; Philipp Hyllus; Géza Tóth; Christian Schwemmer; Alexander Niggebaum; Stefanie Gaile; Otfried Gühne; Harald Weinfurter

doi:10.1088/1367-2630/14/10/105001

1. Introduction

Full information about the experimental state of a quantum system is naturally highly desirable because it enables one to determine the mean value of each observable and thus also of every other property of the quantum state. Abstractly, such a complete description is given, for example, by the density operator, a positive semidefinite matrix ρ with unit trace. Quantum state tomography [1] refers to the task to determine the density operator for a previously unknown quantum state by means of appropriate measurements. Via the respective outcomes, more and more information about the true state generating the data is collected up to the point where this information uniquely specifies the particular state. Quantum state tomography has successfully been applied in many experiments using different physical systems, including trapped ions [2] or photons [3], as prominent instances.

Unfortunately, tomography comes with a very high price due to the exponential scaling of the number of parameters required to describe composed quantum systems. For an N-qubit system the total number of parameters of the associated density operator is 4^N − 1 and any standard tomography protocol is naturally designed to determine all these variables. The most common scheme used in experiments [4] consists of locally measuring in the basis of all different Pauli operators and requires an overall amount of 3^N different measurement settings with 2^N distinct outcomes each. Other schemes, e.g. using an informationally complete measurement [5] locally, would require just one setting but, nevertheless, the statistics for 4^N give different outcomes. Hence the important figure of merit to compare different methods is given by the combination of settings and outcomes.

For such a scaling, the methods rapidly become intractable, already for present state-of-the-art experiments: recording, for example, the data of 14 trapped ions [6], currently the record for quantum registers, would require about 150 days, although 100 measurement outcomes can be collected for a single setting in only about 3 s. In photonic experiments this scaling is even worse because count rates are typically much lower; for example, in recent eight-photon experiments [7, 8], a coincidence of single click events occurs only on the order of minutes; hence, it would require about 7 years to collect an adequate data set. This directly shows that more sophisticated tomography techniques are mandatory.

New tomography protocols equipped with better scaling behaviour exploit the idea that the measurement scheme is explicitly optimized only for particular kinds of states rather than for all possible ones. If the true unknown state lies within this designed target class, then full information about the state can be obtained with much less effort, and if the underlying density operator is not a member, then a certificate signals that tomography is impossible in this given case. Recent results along this direction include tomography schemes designed for states with a low rank [9–11], particularly for important low rank states such as matrix product [12, 13] or multi-scale entanglement renormalization ansatz states [14]. Other schemes include a tomography scheme based on information criteria [15, 16] or—the topic of this paper—states with permutation invariance [17].

However, it needs to be stressed that in any real experiment all these tomography schemes must cope up with another, purely statistical challenge: since only a finite number of measurements can be carried out in any experiment, one cannot access the true probabilities predicted by quantum mechanics p_k = tr(ρ_trueM_k), operator M_k describing the measurements, but merely relative frequencies f_k. Although these deviations might be small, the approximation f_k ≈ p_k causes severe problems in the actual state reconstruction process, i.e. the task to determine the density operator from the observed data. If one naïvely uses the frequencies according to Born's rule $f_k={\mathrm {tr}}(\hat \rho _{\mathrm { lin}} M_k)$ and solves for the unknown operator $\hat \rho _{\mathrm { lin}}$ , then, apart from possible inconsistencies in the set of linear equations, the reconstructed operator $\hat \rho _{\mathrm { lin}} \! \not \geqslant 0$ is often not a valid density operator anymore because some of its eigenvalues are negative. Hence, in such cases, this reconstruction called linear inversion delivers an unreasonable answer. It should be kept in mind that the inconsistencies can also be due to systematic errors, e.g. if the true measurements are aligned wrongly relative to the respective operator representation, but such effects are typically ignored [18].

Therefore, statistical state reconstruction relies on principles other than linear inversion. In general, these methods require the solution of a nonlinear optimization problem, which is much harder to solve than just a system of linear equations. For large system sizes this becomes, besides the exponential scaling of the number of settings and outcomes, a second major problem, again due to the exponential scaling of the number of parameters of the density operator. In fact for the current tomography record of eight ions in a trap [2], this actual reconstruction took even longer than the experiment itself (one week versus a couple of hours). Hence, feasible quantum state tomography schemes for large systems must, in addition to an efficient measurement procedure, also possess an efficient state reconstruction algorithm; otherwise they are not scalable.

In this paper, we develop a scalable reconstruction algorithm for the proposed permutationally invariant tomography scheme [17]. It works for common reconstruction principles, including, among others, maximum likelihood and least squares methods. This scheme becomes possible once more by taking advantage of the symmetry of permutationally invariant states, which provides an efficient and operational way to store, characterize and even process those states. This method enables a large dimension reduction in the underlying optimization problem such that it gets into the feasible regime. The final low dimensional optimization is performed via nonlinear convex optimization which offers a great advantage in contrast to commonly used numerical routines, in particular regarding numerical stability and accuracy. Already a first prototype implementation of this algorithm allows state reconstruction for 20 qubits in a few minutes on a standard desktop computer.

The outline of the paper is as follows. Section 2 summarizes the background on permutationally invariant tomography and on statistical state reconstruction. The key method is explained in section 3 and highlighted via examples in section 4, which are generated by our current implementation. Section 5 collects all the technical details: the mentioned toolbox, more notes about convex optimization, additional information about the pretest or certificate and the measurement optimization, both addressed for large qubit numbers. Finally, in section 6 we conclude and provide an outlook on directions for further research.

2. Background

2.1. Permutationally invariant tomography

Permutationally invariant tomography has been introduced as a scalable reconstruction protocol for multi-qubit systems in [17]. It is designed for all states of the system that remain invariant under all possible interchanges of its different particles. Abstractly, such a permutationally invariant state ρ_PI of N qubits can be expressed in the form

$\begin{equation} \rho_{\rm PI} = [\rho]_{\rm PI} = \frac{1}{N!} \sum_{p\in S_N} V(p)\: \rho \: V(p)^{\dagger}, \end{equation} \tag{ 1 }$

where V (p) is the unitary operator which permutes the N different subsystems according to the particular permutation p and the summation runs over all possible elements of the permutation group S_N. Many important states, such as the Greenberger–Horne–Zeilinger states or Dicke states, fall within this special class.

As shown in [17], full information on such states can be obtained by using in total $({{N+2}\atop{N}}) =(N^2+3N+2)/2$ different local binary measurement settings, while for each setting only the count rates of (N + 1) different outcomes need to be registered. This finally leads to a cubic scaling in contrast to the exponential scaling of standard tomography schemes.

The measurement strategy that attains this number runs as follows: each setting is described by a unit vector $\hat a \in \mathbbm {R}^3$ which defines associated eigenstates |i〉_a of the trace-less operator $\hat a \cdot \vec \sigma$ . Each part locally measures in this basis and registers the outcomes '0' or '1', respectively. The permutationally invariant part can be reconstructed from the collective outcomes, i.e. only the number of '0' or '1' results at the different parties matters but not the individual site information. The corresponding coarse-grained measurements are given by

$\begin{equation} M_k^a= \sum_{p^\prime} V(p^\prime) \vert{0}\rangle_a\!\langle{0}\vert^{\otimes k} \otimes \vert{1}\rangle_a\!\langle{1}\vert^{\otimes N-k} V(p^\prime)^{\dagger}, \end{equation} \tag{ 2 }$

$\begin{equation} \hphantom{M_k^a}={N \choose k} \left[ \vert{0}\rangle_a\langle{0}\vert^{\otimes k} \otimes \vert{1}\rangle_a\!\langle{1}\vert^{\otimes N-k}\right]_{\rm PI} \end{equation} \tag{ 3 }$

with $\: k=0,\dots ,N,$ and where the summation p' is over all permutations that give distinct terms. In total, one needs the above stated number of different settings $\hat a$ . These settings can be optimized in order to minimize the total variance which provides an advantage in contrast to random selection.

In addition to this measurement strategy there is also a pretest which estimates the 'closeness' of the true, unknown state with respect to all permutationally invariant states from just a few measurement results [17]. This provides a way to test in advance whether permutationally invariant tomography is a good method for the unknown state.

Restricting oneself to the permutationally invariant part of a density operator has already been discussed in the literature: for example, for spins in a Stern–Gerlach experiment [19] or in terms of the polarization density operator [20–22]. Here, due to the restricted class of possible measurements, only the permutationally invariant part of, in principle distinguishable, particles is accessible [21, 22]. This is a strong conceptual difference compared to permutationally invariant tomography where one intentionally constrains oneself to this invariant part. Nevertheless, it should be emphasized that the employed techniques are similar.

2.2. Statistical state reconstruction

Since standard linear inversion of the observed data typically results in unreasonable estimates as explained in the introduction, one employs other principles for actual state reconstruction. In general, one uses a certain fit function F(ρ) that penalizes deviations between the observed frequencies f_k and the true probabilities predicted by quantum mechanics p_k(ρ) = tr(ρM_k) if the state of the system is ρ. The reconstructed density operator $\hat \rho$ is then given by the (often unique) state that minimizes this fit function,

$\begin{equation} \hat \rho = \arg \: \min_{\rho \geqslant 0} F(\rho); \end{equation} \tag{ 4 }$

hence the reconstructed state is precisely the one which best fits the observed data. Since the optimization explicitly restricts itself to physical density operators this ensures validity of the final estimate $\hat \rho$ in contrast to linear inversion. Depending on the functional form of the fit function, different reconstruction principles are distinguished. A list of the most common choices is provided in table 1.

Table 1. Common reconstruction principles and their corresponding fit functions F(ρ) used in the optimization given by equation (4); see text for further details.

Reconstruction principle	Fit function F(ρ)
Maximum likelihood [23]	$-\sum _k f_k \log [ p_k(\rho )]$
Least squares [24]	$\sum _k w_k [ f_k - p_k(\rho )]^2$ , w_k>0
Free least squares [4]	$\sum _k 1/p_k(\rho )[ f_k - p_k(\rho )]^2$
Hedged maximum likelihood [25]	$-\sum _k f_k \log [ p_k(\rho )]-\beta \log [\det (\rho )]$ , β>0

The presumably best-known and most widely employed method is called the maximum likelihood principle [23]. Given a set of measured frequencies, f_k, the maximum likelihood state, $\hat \rho_{\rm ml}$ , is exactly the one with the highest probability of generating these data. Other common fit functions, usually employed in photonic state reconstruction, are different variants of least squares [4, 24] that originate from the likelihood function using Gaussian approximations for the probabilities. There, this is often called maximum likelihood principle, but we distinguish these, indeed different, functions here explicitly. Typically, the weights in the least squares function are set to w_k = 1/f_k because f_k represents an estimate of the variance in a multinomial distribution, cf the free least squares principle. However, this leads to a strong bias if the count rates are extremal; for example, if one of the outcomes is never observed, this method naturally leads to difficulties. A method to circumvent this is given by the free least squares function [4] or using improved error analysis for rare events. Let us stress that all these principles have the property that if linear inversion delivers a valid estimate $\hat {\rho }_{{\rm lin}} \geqslant 0$ , it is also the estimate given by these reconstruction principles¹⁰.

Finally, hedged maximum likelihood [25] represents a method that circumvents low rank state estimates. Via this, one obtains more reasonable error bars using parametric bootstrapping methods [26]; for other error estimates, see the recently introduced confidence regions for quantum states [27, 28]. In principle, many more fit functions are possible, such as generic loss functions [29], but considering these is beyond the scope of this work.

3. Method

From the above, it is apparent that permutationally invariant state reconstruction requires the solution of

$\begin{equation} \hat \rho_{\rm PI} = \arg \: \min_{\rho_{\rm PI} \geqslant 0} F(\rho_{\rm PI}) \end{equation} \tag{ 5 }$

for the preferred fit function. This large-scale optimization becomes feasible along the following lines.

Firstly, one reduces the dimensionality of the underlying optimization problem because one cannot work with full density operators anymore. This requires an operational way to characterize permutationally invariant states ρ_PI ⩾ 0 over which the optimization is performed, and additionally, demands an efficient way to compute probabilities p_k(ρ_PI) which appear in the fit function. Secondly, one needs a method for performing the final optimization. We employ convex optimization for this.

3.1. Reduction of the dimensionality

This reduction relies on an efficient toolbox to handle permutationally invariant states, which exploits the particular symmetry. Here we will explain this method and the final structure; for more details see section 5.1. These techniques are well proven and established; we employ and adapt them here for the permutationally invariant tomography scheme such that we finally reach state reconstruction of larger qubits.

The methods of this toolbox are obtained via the concept of spin coupling that describes how individual, distinguishable spins couple to a combined system if they become indistinguishable. Since we deal with qubits we only need to focus on spin-1/2 particles. In the simplest case, two spin-1/2 particles can couple to a spin-1 system, called the triplet, if both spins are aligned symmetric, or to a spin-0 state, the singlet, if the spins are aligned anti-symmetric. Abstractly, this can be denoted as $\mathbbm {C}^2 \otimes \mathbbm {C}^2 = \mathbbm {C}^3 \oplus \mathbbm {C}^1$ . If one considers now three spins, then of course all spins can point in the same direction giving a total spin-3/2 system. There is also a spin-1/2 system possible if two particles form already a spin-0 and the remaining one stays untouched. This can be achieved, however, by more than one possibility, in fact by two inequivalent choices¹¹, and is expressed by $\mathbbm {C}^2 \otimes \mathbbm {C}^2 \otimes \mathbbm {C}^2 = \mathbbm {C}^4 \oplus (\mathbbm {C}^2 \otimes \mathbbm {C}^2)$ .

This scheme can be extended to N spin-1/2 particles to obtain the following decomposition of the total Hilbert space,

$\begin{equation} \mathcal{H}=(\mathbbm{C}^2)^{\otimes N} = \bigoplus_{j=j_{\rm min}}^{N/2} \mathcal{H}_j \otimes \mathcal{K}_j, \end{equation}\noindent \tag{ 6 }$

where the summation runs over different total spin numbers j = j_min,j_min + 1,...,N/2 starting from j_min∈{0,1/2}, depending on whether N is even or odd. Here, $\mathcal {H}_j$ are called the spin Hilbert spaces with dimensions $\dim (\mathcal {H}_j)=2j+1$ , while $\mathcal {K}_j$ are referred to as multiplicative spaces that account for the different possibilities to obtain a spin-j state. They are, generally, of a much larger dimension, cf equation (22).

Permutationally invariant states have a simpler form on this Hilbert space decomposition, namely

$\begin{equation} \rho_{\rm PI} = \bigoplus_{j=j_{\rm min}}^{N/2} p_j \rho_j \otimes \frac{\mathbbm{1}}{\dim(\mathcal{K}_j)}, \end{equation}\noindent \tag{ 7 }$

with density operators ρ_j called spin states and according probabilities p_j. Thus a permutationally invariant density operator only contains non-trivial parts on the spin Hilbert spaces while carrying no information on the multiplicative spaces. Note, further, that there are no coherences between different spin states. This means that any permutationally invariant state can be parsed into a block structure as schematically represented in figure 1. The main diagonal is made up of unnormalized spin states $\tilde \rho _j=p_j \rho _j / \dim (\mathcal {K}_j)$ , which appear several times, the number being equal to the dimension of the corresponding multiplicative space. This block-decomposition represents a natural way to treat permutationally invariant states and has, for example, already been employed in the aforementioned related works on permutationally invariant tomography [19–22] but also in other contexts [30–33].

**Figure 1.** Block decomposition for a generic permutationally invariant state as given by equation (7) with $\tilde \rho _j=p_j \rho _j /\dim(\mathcal {K}_j)$ . Note that we interchanged the spin- and multiplicative spaces.
Download figure:
Standard image

**Figure 1.** Block decomposition for a generic permutationally invariant state as given by equation (7) with $\tilde \rho _j=p_j \rho _j /\dim(\mathcal {K}_j)$ . Note that we interchanged the spin- and multiplicative spaces.
Download figure:
Standard image

This structure shows that if we work with permutationally invariant states, we do not need to consider the full density operator but rather that it is sufficient to deal only with this ensemble of spin states. Therefore we identify from now on

$\begin{equation} \rho_{\rm PI} \Longleftrightarrow p_j \rho_j,\:j=j_{\rm min},j_{\rm min}+1,\ldots, N/2. \end{equation} \tag{ 8 }$

This provides already an efficient way to store and to visualize such states. More importantly, it also provides an operational way to characterize valid states, since any permutationally invariant operator ρ_PI represents a true state if and only if all these spin operators ρ_j are density operators and p_j a probability distribution. This is in contrast to the generalized Bloch vector employed in the original proposal of permutationally invariant tomography [17] given by equation (52), which also provides efficient storage and processing of permutationally invariant states, but where Bloch vectors of physical states are not as straightforward to characterize.

Via this identification one can demonstrate once more the origin of the cubic scaling of the permutationally invariant tomography scheme. The largest spin state is of dimension N + 1 which requires parameters of the order of N² for characterization. Since one has of the order of N of these states, this results in a cubic scaling.

Fixing the ensemble of spin states as parameterization, it is now necessary to obtain an efficient procedure to compute the probabilities p^a_k(ρ_PI) for the optimized measurement scheme. This is achieved as follows: let us first stress that a similar block decomposition to that given by equation (7) holds for all permutationally invariant operators. Hence also the measurements M^a_k given by equation (2) can be cast into this form. Using the convention

$\begin{equation} M_k^a = \bigoplus_{j=j_{\rm min}}^{N/2} M_{k,j}^a \otimes \mathbbm{1} \end{equation} \tag{ 9 }$

leads to

$\begin{equation} {\mathrm{tr}}(\rho_{\rm PI} M_k^a) = \sum_{j=j_{\rm min}}^{N/2} p_j {\mathrm{tr}}(\rho_j M_{k,j}^a).\end{equation}\noindent \tag{ 10 }$

Therefore the problem is shifted to the computation of the spin-j operators M^a_k,j for each setting $\hat a$ . As we show in proposition 5.2 below, using the idea that measurements can be transformed into each other by a local operation U_a|i〉 = |i〉_a, this provides the relation

$\begin{equation} M_{k,j}^a= W^a_j M_{k,j}^{e_3} W^{a,\dagger}_j. \end{equation} \tag{ 11 }$

Here M^e₃_k,j corresponds to the measurement in the standard basis (that need to be computed once) and W^a_j is a unitary transformation determined by the rotation U_a. This connection is given by

$\begin{equation} U_a = \exp\left(-\mathrm{i} \sum_l t_l \sigma_l/2\right) \Longrightarrow W_j^a = \exp\left(-\mathrm{i} \sum_l t_l S_{l,j}\right), \end{equation}\noindent \tag{ 12 }$

where S_l,j stands for the spin operators in dimension 2j + 1. This finally provides an efficient way to compute probabilities.

3.2. Optimization

As a second step one still needs to cope up with the optimization itself. Although there are different numerical routines for statistical state reconstruction like maximum likelihood [34] or least squares [4, 35], we prefer nonlinear convex optimization [36] to obtain the final solution. Quantum state reconstruction problems are known to be convex [35, 37], but convex optimization has hardly been used for this task. However, convex optimization has several advantages: first of all it is a systematic approach that works for any convex fit function, including maximum likelihood and least squares. In contrast to other algorithms such as the fixed-point algorithm proposed in [34], it gives a precise stopping condition via an appropriate error control (see, however, [38]) and still exploits all the favourable, convex, structure in comparison with re-parameterization ideas as in [4]. Moreover, it is guaranteed to find the global optimum and the accuracy obtained is typically much higher than that obtained with other methods.

Quantum state reconstruction as defined via equation (4) can be formulated as a convex optimization problem as follows: all fit functions listed in table 1 are convex on the set of states. Via a linear parameterization of the density operator $\rho (x)= \mathbbm {1}/\dim (\mathcal {H}) + \sum x_i B_i$ , using an appropriate operator basis B_i such that normalization is fulfilled directly, the required optimization problem becomes

$\begin{eqnarray} \min_x & & F[\rho(x)] \nonumber\\ \\ \textrm{s.t.}& & \rho(x) = \frac{\mathbbm{1}}{\dim(\mathcal{H})}+ \sum_i x_i B_i \geqslant 0,\nonumber \end{eqnarray} \tag{ 13 }$

with a convex objective function F(x) = F[ρ(x)] and a linear matrix inequality as the constraint, i.e. precisely the structure of a nonlinear convex optimization problem [36]. For permutationally invariant states, one uses $\rho (x)=\oplus _j \bar \rho _j(x)$ with $\bar \rho _j=p_j \rho _j$ by using an appropriate block-diagonal operator basis B_i; therefore we continue this discussion with the more general form.

The optimization given by equation (13) can be performed, for instance, with the help of a barrier function [36]¹². Rather than considering the constrained problem, one solves the unconstrained convex task given by

$\begin{equation} \min_x F[\rho(x)] - t \log[\det\rho(x)], \end{equation} \tag{ 14 }$

where the constraint is now directly included in the objective function. This so-called barrier term acts precisely as its name suggests: if one tries to leave the strictly feasible set, i.e. all parameters x that satisfy ρ(x) > 0, one always reaches a point where at least one of the eigenvalues vanish. Since the barrier term is large within this neighbourhood, in fact singular at the crossing, it penalizes points close to the border and thus ensures that one searches for an optimum well inside the region where the constraint is satisfied. The penalty parameter t > 0 plays the role of a scaling factor. If it becomes very small the effect of the barrier term becomes negligible within the strictly feasible set and only remains at the border. Therefore a solution of equation (14) with a very small value of t provides an excellent approximation to the real solution. As shown in section 5.2 this statement can be made more precise by

$\begin{equation} F[\rho(x_{\rm sol}^t)] - F[\rho(x_{\rm sol})] \leqslant t \dim(\mathcal{H}), \end{equation}\noindent \tag{ 15 }$

which follows from convexity and which relates the true solution x_sol of the original problem given by equation (13) to the solution x^t_sol of the unconstrained problem with penalty parameter t. This condition represents the above-mentioned error control and serves as a stopping condition, i.e. as a quantitative error bound for a given t. Note that for permutationally invariant tomography $\dim (\mathcal {H})$ is not the dimension of the true N-qubit Hilbert space but instead the dimension of $\rho (x)=\oplus _j \bar \rho _j(x)$ , i.e. $\sum _{j=j_{\mathrm { min}}} (2j+1)=(N+1)(N+2j_{\rm min}+1)/4$ which increases only quadratically.

Small penalty parameters are approached by an iterative process: for a given starting point xⁿ_start and a certain value of the parameter t_n > 0, one solves equation (14). Its solution will be the starting point for xⁿ⁺¹_start = xⁿ_sol for the next unconstrained optimization with a lower penalty parameter t_n+1 < t_n. As the starting point for the first iteration, we employ t₀ = 1 and the point x⁰_start corresponding to the totally mixed state. This procedure is repeated until one has reached small enough penalty parameters. The penalty parameter is decreased step-wise. Then each unconstrained problem can be solved very efficiently since one starts already quite close to the true solution.

Let us point out that via the above-mentioned barrier method, one additionally obtains solutions to the hedged state reconstruction with β = t since the unconstrained problem given by equation (14) is precisely the fit function of the hedged version of table 1.

Finally, for comparison purposes, we would also like to mention the iterative fixed-point algorithm of [39]; for a modification see [40]. It is designed for maximum likelihood estimation and is straightforward to implement, since it only requires matrix multiplication; however, it has deficits regarding control and accuracy. For permutationally invariant tomography the algorithm can be adapted as follows: given a valid iterate ρⁿ_PI characterized by the ensemble of spin states $\bar \rho ^n_j=p_j^n\rho _j^n$ , one evaluates the probabilities p^a_k(ρⁿ_PI) using equation (10). Next, one computes the operators

$\begin{equation} R^n_j = \sum_{a,k} \frac{f_k^a}{p_k^a(\rho_{\rm PI}^n)} M_{k,j}^a, \end{equation}\noindent \tag{ 16 }$

which determine the next iterate $\bar \rho ^{n+1}_j= R^n_j \bar \rho _j^n R^{n\dagger }_j/\mathcal {N}$ with $\mathcal {N}=\sum _j {\mathrm {tr}}(R^n_j \bar \rho _j^n R^{n\dagger }_j)$ . This iteration is started, for example, from the totally mixed state and repeated until a sufficiently good solution is obtained.

4. Examples

The two methods from the previous section are employed in a prototype implementation under MATLAB, which already enables state reconstruction of about 20 qubits on a standard desktop computer.

The current algorithm is tested along the following lines: for a randomly generated permutationally invariant state ρ^true_PI we compute the true probabilities p^a_k,true for the chosen measurement settings. Rather than sampling we set the observed frequencies equal to this distribution, i.e. f^a_k = p^a_k,true. In this way linear inversion would return the original state; hence also every other reconstruction principle from table 1 has this state as the solution. We now start the algorithm and compare the trace distance between the analytic solution ρ^true_PI and the state after n iterations ρⁿ_PI. This distance $\frac {1}{2}{\mathrm {tr}}|\rho _{\mathrm { PI}}^{\rm true} - \rho _{\rm PI}^n|$ quantifies the probability with which the two states, the true analytic solution and the iterate after n steps in the algorithm, could be distinguished [41].

A typical representative of this process is depicted in figure 2 for 12 qubits using optimized settings. The randomly generated state ρ^true_PI was chosen to lie at the boundary of the state space since such rank-reduced solutions better resemble the case of state reconstruction of real data. More precisely, each spin state of the true density operator is given by a pure state ρ^true_j = |ψ_j〉〈ψ_j| chosen according to the Haar measure, while the p_j are selected by the symmetric Dirichlet distribution with concentration measure α = 1/2 [42]. As is apparent the algorithm behaves similarly for all three reconstruction principles and rapidly obtains a good solution after about 70 iterations. The steps in this plot are points where the penalty parameter is reduced by a factor of 10 starting from t = 1 and decreased down to t = 10⁻¹⁰. The slight rise after these points comes from the fact that we plot the trace distance and not the actual function (fit function plus penalty term) that is minimized.

Figure 3 shows a similar comparison for the maximum likelihood reconstruction of 20 qubits but now plotted versus algorithm time¹³. For comparison we include the performance of the iterative fixed-point algorithm, which requires much more iterations in general (3000 in this case versus about 90 for convex optimization). Let us emphasize that a similar behaviour between these two algorithms appears also for smaller qubit numbers. As one can see, convex optimization delivers a faster and in particular more accurate solution. In contrast, the iterative fixed-point algorithm shows a bad convergence rate although it initially starts off better. This was one of the main reasons for us to switch to convex optimization.

The current algorithm times of this test are listed in table 2 which are averaged over 50 randomly generated states. Thus already this prototype implementation enables state reconstruction of larger qubit numbers in moderate times. The small time difference between reconstruction principles is because least squares as a quadratic fit function provides some advantages in the implementation. More details of this difference are given in section 5.

Table 2. Current performance of the convex optimization algorithm on the described test procedure and on frequencies from simulated experiments; free least squares provides similar results to the maximum likelihood principle.

	N=8	N=12	N=16	N=20
Maximum likelihood
Algorithm test	8.5 s	47 s	$2.7 \min$	$9.2 \min$
Simulated experiment	9.2 s	48 s	$2.9 \min$	$9.3 \min$
Least squares
Algorithm test	8.4 s	39 s	$2.5 \min$	$6 \min$
Simulated experiment	9.2 s	43 s	$2.7 \min$	$6.7 \min$

Table 2 also contains the algorithm times for reconstructions using simulated frequencies f^a_k = n^a_k/N_r. For each setting they are deduced from the count rates n^a_k sampled from a multinomial distribution using the true event distribution p^a_k,true and N_r = 1000 repetitions. The true probabilities correspond to the same states as already employed in the algorithm test. From the table, one observes that state reconstruction for data with count statistics requires only slightly more time than the algorithm test with the correct probabilities. We attribute this to the fact that a few more iterations are typically required in order to achieve the desired accuracy.

Finally, let us perform the reconstruction of a simulated experiment of N = 14 qubits. Suppose that one intends to create a Dicke state |D_k,N〉 as given by equation (26) for some k and N, but that the preparation suffers from some imperfections such that at best one can prepare states of the form

$\begin{equation} \rho_{\rm dicke\: mix} = \sum_{l=0}^{N} {N \choose l} p^l (1-p)^{N-l} \vert{D_{l,N}}\rangle\langle{D_{l,N}}\vert, \end{equation}\noindent \tag{ 17 }$

where p = 0.6 characterizes some asymmetry. As the true state prepared in the experiment we now model some further imperfection in the form of an additional small misalignment U^⊗N, with $U=\exp (-\mathrm {i} \theta \sigma _y/2)$ , θ = 0.2, and some permutationally invariant noise σ_PI (chosen via the aforementioned method but using Hilbert–Schmidt instead of the Haar measure), i.e.

$\begin{equation} \rho_{\rm true}=0.6 \:U^{\otimes N} \rho_{\rm dicke\: mix} U^{\dagger \otimes N} + 0.4\: \sigma_{\rm PI}. \end{equation}\noindent \tag{ 18 }$

The frequencies are obtained via sampling from the state given by equation (18) using intentionally only N_r = 200 repetitions per setting (to see some differences). Finally, we reconstruct the state according to the maximum likelihood principle. Figure 4 shows the tomography bar plots of one of these examples for the largest spin state p_jρ_j, j = N/2 = 7 for both states. Although this state might be artificial, this example should highlight once more that this state reconstruction algorithm works also for realistic data and for qubit sizes where clearly any non-tailored state reconstruction scheme would fail. Moreover, it demonstrates that the spin ensemble p_jρ_j represents a very convenient graphical representation of such states (compared to a 2¹⁴ × 2¹⁴ matrix in this case).

**Figure 4.** The real part of the true and reconstructed (according to maximum likelihood) largest spin state ensemble p_jρ_j, j = N/2 = 7 using the optimal measurement setting. The basis is given by the Dicke basis |D_k,14〉, cf equation (26).
Download figure:
Standard image

5. Details

5.1. Reduction

Let us first give more details regarding the reduction step. This starts by recalling a group theoretic result summarized in the next section, which is then used to show how the stated simplifications with respect to states and measurements are obtained.

5.1.1. Background

Consider the following two unitary representations defined on the N qubit Hilbert space: the permutation or symmetric group V (p) which is defined by their action onto a standard tensor product basis by $V(p)\vert {i_1, \dots , i_N}\rangle =\vert {i_{p^{-1}(1)},\dots , i_{p^{-1}(N)}}\rangle$ according to the given permutation p, and the tensor product representation W(U) = U^⊗N of the special unitary group. A result known as the Schur–Weyl duality [43, 44] states that the action of these two groups is dual, which means that the total Hilbert space can be divided into blocks on which the two representations commute. More precisely, one has

$\begin{eqnarray} (\mathbbm{C}^2)^{\otimes N} &=& \bigoplus_{j=j_{\rm min}}^{N/2} \mathcal{H}_j \otimes \mathcal{K}_j, \end{eqnarray} \tag{ 19 }$

$\begin{eqnarray} V(p) &=& \bigoplus_{j=j_{\rm min}}^{N/2} \mathbbm{1} \otimes V_j(p), \end{eqnarray} \tag{ 20 }$

$\begin{eqnarray} W(U) &=& \bigoplus_{j=j_{\rm min}}^{N/2} W_j(U) \otimes \mathbbm{1}. \end{eqnarray} \tag{ 21 }$

Here V_j and W_j are the respective irreducible representations, and j_min∈{0,1/2} depending on whether N is even or odd. The dimensions of the appearing Hilbert spaces are $\dim (\mathcal {H}_j)=2j+1$ and

$\begin{equation} \dim(\mathcal{K}_j) = {N \choose N/2-j} - {N \choose N/2-j-1} \end{equation} \tag{ 22 }$

for all j < N/2 and $\dim (\mathcal {K}_{N/2})=1$ . Let us note that equation (20) already ensures the block-diagonal structure of permutationally invariant operators, while equation (21) becomes important for the measurement computation.

A basis of the Hilbert space $\mathcal {H}_j \otimes \mathcal {K}_j$ is formed by the spin states |j,m,α〉 = |j,m〉⊗|α_j〉 with $m=-j,-j+1,\dots,j$ and $\alpha _j=1,\dots ,\dim (\mathcal {K}_j)$ . These are obtained by starting with the states having the largest spin number m = j, which are given by

$\begin{eqnarray} \vert{j,j,1}\rangle&=&\vert{0}\rangle^{\otimes 2j} \otimes \vert{\psi^-}\rangle^{\otimes N/2-j}, \end{eqnarray} \tag{ 23 }$

$\begin{eqnarray} \vert{j,j,\alpha}\rangle&=&\sum_p c_{j,p} V(p) \vert{j,j,1}\rangle, \end{eqnarray} \tag{ 24 }$

for all α ⩾ 2. The coefficients c_j,p must ensure that the states |j,j,α〉 are orthogonal; otherwise their choice is completely free since the detailed structure of different α's is not important. The full basis is obtained by subsequently applying the ladder operator $J_-=\sum _{n=1}^N \sigma _{-;n}$ to decrease the spin number m. Here σ_−;n refers to the operator with σ₋ = (σ_x − iσ_y)/2 on the nth qubit and identity on the rest. Thus in total the basis becomes

$\begin{equation} \vert{j,m,\alpha}\rangle = \mathcal{N} J_-^{j-m} \vert{j,j,\alpha}\rangle, \end{equation} \tag{ 25 }$

with appropriate normalizations $\mathcal {N}$ . Note that the subspace corresponding to the highest spin number j = N/2 is also called the symmetric subspace, which contains many important states such as the Greenberger–Horne–Zeilinger or Dicke states, which using the spin states read as

$\begin{equation} \vert{\rm{GHZ}}\rangle=\frac{1}{\sqrt{2}}\! \left( \vert{0}\rangle^{\otimes N} + \vert{1}\rangle^{\otimes N}\right) \!=\!\frac{1}{\sqrt{2}}\! \left( \vert{N/2,N/2,1}\rangle + \vert{N/2,-N/2,1}\rangle \right)\!, \end{equation} \tag{ 26 }$

$\begin{equation} \vert{D_{k,N}}\rangle= \mathcal{N} \left[ \vert{1}\rangle^{\otimes k}\otimes \vert{0}\rangle^{\otimes N-k} \right]_{\rm PI}= \vert{N/2, N/2-k,1}\rangle. \end{equation} \tag{ 27 }$

5.1.2. Permutationally invariant states and measurement operators

Let us now employ this result in order to derive a generic form for permutationally invariant states; we give the proof for completeness.

(Permutationally invariant states).

Proposition 5.1 Any permutationally invariant state of N qubits ρ_PI defined via equation (1) can be written as

$\begin{equation} \rho_{\rm PI} = \bigoplus_{j=j_{\rm min}}^{N/2} p_j \rho_j \otimes \frac{\mathbbm{1}}{\dim(\mathcal{K}_j)}; \end{equation} \tag{ 28 }$

hence it is fully characterized by the ensemble p_jρ_j. Moreover, ρ_PI is a density operator if and only if all ρ_j are density operators and p_j a probability distribution.

Proof. The proposition follows using the representation V (p) given by equation (20) in the definition of the states, cf equation (1), and then applying Schur's lemma [43, 44]. This lemma states that any linear operator A from $\mathcal {K}_j$ to $\mathcal {K}_i$ which commutes with all elements p of the group V_i(p)A = AV_j(p) must either be zero if i and j label different irreducible representations or $A= c \mathbbm {1}$ if they are unitarily equivalent. Since $A_{\mathrm { PI}}=1/N! \sum _p V_i(p) A V_j(p)^{\dagger}$ fulfils this relation, one obtains

$\begin{equation} \frac{1}{N!} \sum_p V_i(p) A V_j(p)^{\dagger} = \delta_{ij} \: {\mathrm{tr}}(A) \frac{\mathbbm{1}}{\dim(\mathcal{K}_j)}. \end{equation} \tag{ 29 }$

The normalization can be checked taking the trace on both sides. Adding appropriate identities provides

$\begin{equation} \frac{1}{N!} \sum_p \mathbbm{1} \otimes V_i(p) B \mathbbm{1} \otimes V_j(p)^{\dagger} = \delta_{ij} \: {\mathrm{tr}}_{\mathcal{K}_j}( B ) \otimes \frac{\mathbbm{1}}{\dim(\mathcal{K}_j)}, \end{equation} \tag{ 30 }$

where B should now be a linear operator from $\mathcal {H}_j \otimes \mathcal {K}_j$ to $\mathcal {H}_i \otimes \mathcal {K}_i$ .

Finally, let P_j denote the projector onto $\mathcal {H}_j\otimes \mathcal {K}_j$ , and using equation (30) yields

$\begin{equation} \rho_{\rm PI} = \frac{1}{N!}\sum_p V(p) \rho V(p)^{\dagger} = \sum_{i,i^\prime,j,j^\prime} \frac{1}{N!}\sum_p P_i V(p)P_{i^\prime} \rho P_j V(p)^{\dagger} P_{j^\prime} \end{equation} \tag{ 31 }$

$\begin{equation} \hphantom{\rho_{\rm PI}} = \sum_{i,j} \left\{ \frac{1}{N!} \sum_p P_i[\mathbbm{1} \otimes V_i(p)] P_i \rho P_{j}[\mathbbm{1} \otimes V_{j}(p)]^{\dagger} P_{j}\right\} \end{equation} \tag{ 32 }$

$\begin{equation} \hphantom{\rho_{\rm PI}}= \sum_{i,j} P_i \left[ \delta_{ij} \: {\mathrm{tr}}_{\mathcal{K}_j}(P_i \rho P_{j}) \otimes \frac{\mathbbm{1}}{\dim(\mathcal{K}_j)} \right] P_j \end{equation} \tag{ 33 }$

$\begin{equation} \hphantom{\rho_{\rm PI}}= \bigoplus_j {\mathrm{tr}}_{\mathcal{K}_j}(P_j \rho P_j) \otimes \frac{\mathbbm{1}}{\dim(\mathcal{K}_j)}, \end{equation} \tag{ 34 }$

which provides the general structure.

The state characterization part follows because positivity of a block matrix is equivalent to positivity of each block. □

Next let us concentrate on the measurement part. Although the block decomposition follows already from the previous proposition, it is here more important to obtain an efficient computation of each measurement block for the selected setting.

(Measurement operators).

Proposition 5.2 The POVM elements M^a_k as defined in equation (2) for any local setting $\hat a \in \mathbbm {R}^3$ can be decomposed as $M_k^a=\bigoplus _j M_{k,j}^a \otimes \mathbbm {1}$ with

$\begin{equation} M_{k,j}^a=W_j(U_a) M^{e_3}_{k,j} W_j(U_a)^{\dagger}. \end{equation} \tag{ 35 }$

The unitary is given by $W_j(U_a)=\exp (-\mathrm {i} \sum _l t_l S_{l,j})$ using the spin operators S_l,j in dimension 2j + 1, while the coefficients t_l are determined by $U_a = \exp (-\mathrm {i} \sum _l t_l \sigma _l/2)$ which satisfies $\hat a \cdot \vec \sigma = U_a \sigma _z U_{a}^{\dagger}$ . For the measurement in the standard basis $\hat a=\hat {e}_3$ one obtains

$\begin{equation} M^{e_3}_{k,j}= \vert{j,N/2-k}\rangle\langle{j,N/2-k}\vert \end{equation} \tag{ 36 }$

if −j ⩽ N/2 − k ⩽ j and zero otherwise.

Proof. The basic idea is to consider the measurement in an arbitrary local basis $\hat a$ by a rotation followed by the collective measurement in the standard basis. The block decomposition is obtained as follows:

$\begin{eqnarray} &&M_k^a = U_a^{\otimes N} M_k^{e_3} U_a^{{\dagger} \otimes N} = W(U_a) \left[ \bigoplus_j M^{e_3}_{k,j} \otimes \mathbbm{1} \right] W(U_a)^{\dagger} \end{eqnarray} \tag{ 37 }$

$\begin{eqnarray} &&\hphantom{M_k^a}= \bigoplus_j W_j(U_a) M^{e_3}_{k,j} W_j(U_a)^{\dagger} \otimes \mathbbm{1}. \end{eqnarray} \tag{ 38 }$

The first step holds because U_a satisfies |i〉_a〈i| = U_a|i〉〈i|U^†_a, while the block decomposition of the standard basis measurement M^e₃_k is employed afterwards. In the last part one uses the tensor product representation given by equation (21).

Since one knows that W_j is irreducible it can be uniquely written in terms of its Lie algebra representation dW_j as W_j(U_a) = W_j(e^−iX) = e^−idW_j(X), which is given by the spin operators in this case, i.e. dW_j(σ_l/2) = S_l,j [45].

Thus we are left to compute the measurement blocks M^e₃_k,j for the standard basis. Note it is sufficient to evaluate M^e₃_k,j⊗|1_j〉〈1_j| such that one can employ the spin basis states |j,m,1〉 as introduced in section 5.1.1. At first, note that M^e₃_k exactly contains k projections onto |0〉, while each basis state |j,m,1〉 possesses (N/2 + m) zeros. Therefore one obtains M^e₃_k,j|j,m,1〉∝δ_k,N/2+m|j,m,1〉. Since each POVM has to resolve the identity this is only possible if each M^e₃_k,j is the stated rank-1 projector on the basis states. □

Finally one still needs to express $U_a=\exp (-\mathrm {i} \sum _l t_l \sigma _l /2)$ for the chosen setting $\hat a \in \mathbbm {R}^3$ . Since this can be related to a familiar rotation [45] these coefficients can be expressed as $t_l = (\theta \hat n)_l$ via a rotation about an angle θ around the axis $\hat n$ . Since this rotation should turn $\hat e_3$ into $\hat a$ , its parameters are given by

$\begin{eqnarray} \hat n &=& \frac{\hat e_3 \times \hat a}{\| \hat e_3 \times \hat a \|_2}, \end{eqnarray} \tag{ 39 }$

$\begin{eqnarray} \theta &=& \arccos(\hat e_3 \cdot \hat a), \end{eqnarray} \tag{ 40 }$

and $\hat n = \hat e_1$ (or any other orthogonal vector) if $\hat a = \pm \hat e_3$ .

5.2. Convex optimization

In this part, we collect some more details of the described convex optimization algorithm; for complete coverage see the book [36].

Each unconstrained optimization given by equation (14) is solved via a damped Newton algorithm. The minimization of $f(x)=F[\rho (x)] - t \log [\det \rho (x)]$ is obtained by an iterative process. In order to determine a search direction at a given iterate xⁿ, one minimizes the quadratic approximation

$\begin{equation} f(x^n+\Delta x) \approx f(x^n) + \nabla f(x^n)^T \Delta x + \textstyle\frac{1}{2} \Delta x^T \nabla^2 f(x^n) \Delta x. \end{equation} \tag{ 41 }$

This reduces to solving a linear set of equations called the Newton equation

$\begin{equation} \nabla^2 f(x^n) \Delta x_{\rm nt} = - \nabla f(x^n), \end{equation} \tag{ 42 }$

which determines the search direction Δx_nt. The step length s for the next iterate xⁿ⁺¹ = xⁿ + sΔx_nt is chosen by a backtracking line search. Here one picks the largest $s=\max _{k \in \mathbbm {N}}\beta ^k$ with β∈(0,1) such that the iterate stays feasible ρ(xⁿ⁺¹) > 0 and that the function value decreases sufficiently, i.e. f(xⁿ⁺¹) ⩽ f(xⁿ) + αs∇f(xⁿ)^TΔx_nt with α∈(0,0.5). The process is stopped if one has reached an appropriate solution, which can be identified by ∥∇f(xⁿ)∥₂ ⩽ . If the initial point x_start is already sufficiently close to the true solution then the whole algorithm converges quadratically, i.e. the precision gets doubled at each step.

At this point, let us give the gradient and Hessian of the appearing functions. For the barrier term $\psi (x) = - \log [ \det \rho (x)]$ restricted to the positive domain $\rho (x) = \mathbbm {1}/\dim (\mathcal {H})+\sum _i x_i B_i {>} 0$ , one obtains [36]

$\begin{eqnarray} \frac{\partial \psi(x)}{\partial x_i} &=& - {\mathrm{tr}}[ \rho(x)^{-1} B_i], \end{eqnarray} \tag{ 43 }$

$\begin{eqnarray} \frac{\partial^2 \psi(x)}{\partial x_j\partial x_i} &=& {\mathrm{tr}}[\rho(x)^{-1} B_j \rho(x)^{-1} B_i]. \end{eqnarray} \tag{ 44 }$

Equation (44) shows that the Hessian of the penalty term ∇²ψ(x) > 0 is positive definite, such that ψ(x) is indeed convex. The derivatives of the preferred fit function can be computed directly. For instance, using the likelihood function $F_{\mathrm { ml}}(x)= - \sum _k f_k \log p_k(x)$ with p_k(x) = tr[ρ(x)M_k], they read

$\begin{eqnarray} \frac{\partial F_{\rm ml}(x)}{\partial x_i} &=& - \sum_k \frac{f_k}{p_k(x)} {\mathrm{tr}}(B_i M_k), \end{eqnarray}\noindent \tag{ 45 }$

$\begin{eqnarray} \frac{\partial^2 F_{\rm ml}(x)}{\partial x_j\partial x_i} &=&\sum_k \frac{f_k}{p^2_k(x)} {\mathrm{tr}}(B_j M_k ) {\mathrm{tr}}(B_i M_k ). \end{eqnarray}\noindent \tag{ 46 }$

The bottleneck of such an algorithm is the actual computation of the second derivatives. Although the expansion coefficients of each measurement tr(B_jM_k) can be computed in advance, it is still necessary to compute equation (46) anew at each point x due to the dependence of p_k(x). With respect to that, the least squares fit function bears a great advantage since its Hessian is constant, i.e. $\partial _j\partial _i F_{\mathrm { ls}}(x) = 2 \sum _k w_k {\mathrm {tr}}(B_j M_k ) {\mathrm {tr}}(B_i M_k )$ , such that one saves time on this part.

Finally, let us comment on the optimality conditions, known as the Karush–Kuhn–Tucker conditions [36]. A given x^☆ is the global solution of the convex problem given by equation (13) if and only if¹⁴ there exists an additional Lagrange multiplier Z^☆ (as given in the equations beneath) such that the pair (x^☆,Λ^☆) satisfies

$\begin{eqnarray} &&\frac{\partial}{\partial x_i} F(x^\star)-{\mathrm{tr}}[\Lambda^\star B_i] = 0, \quad \forall i, \end{eqnarray}\noindent \tag{ 47 }$

$\begin{eqnarray} &&\Lambda^\star \geqslant 0, \quad \rho(x^\star) \geqslant 0, \end{eqnarray}\noindent \tag{ 48 }$

$\begin{eqnarray} &&{\mathrm{tr}}[\Lambda^\star \rho(x^\star)] = 0. \end{eqnarray}\noindent \tag{ 49 }$

Given the solution x^t_sol of the corresponding unconstrained problem with penalty parameter t, it follows from ∇f(x^t_sol) = 0 using equation (43) that the gradient conditions are satisfied with Λ_t = tρ(x^t_sol)⁻¹ being the Lagrange multiplier. This pair (x^t_sol,Λ_t) also satisfies equation (48); only the duality gap condition ${\mathrm {tr}}[\Lambda _t \rho (x^t_{\mathrm { sol}})] = t \dim (\mathcal {H}) > 0$ does not hold exactly. However, this quantity appears in the following inequality:

$\begin{eqnarray} &&F(x^t_{\rm sol}) - {\mathrm{tr}}[ \Lambda_t \rho(x^t_{\rm sol})] = \min_{x: \rho(x) \geqslant 0} F(x) - {\mathrm{tr}}[\Lambda_t \rho(x)] \end{eqnarray}\noindent \tag{ 50 }$

$\begin{eqnarray} &\leqslant& \min_{x: \rho(x) \geqslant 0} F(x) = F(x_{\rm sol}). \end{eqnarray}\noindent \tag{ 51 }$

Here one used that x^t_sol is the solution of F(x) − tr[Λ_tρ(x)] because the gradient vanishes (and the solution is not at the border), and tr[Λ_tρ(x)] ⩾ 0 for the inequality. This is the stated accuracy given by equation (15) which relates the function value of x^t_sol to the true solution x_sol.

5.3. Additional tools

5.3.1. Optimization of measurement settings

Measurement settings, each described by a unit vector $\hat {a}_i \in \mathbbm {R}^3$ as explained in section 2.1, are chosen to optimize a figure of merit characterizing how well a given permutationally invariant target state ρ_tar can be reconstructed. As such a quality measure we use the sum of errors for the tomographically complete operator set of all tensor products of Pauli operators. Note that a permutationally invariant state ρ_PI is already uniquely determined by its generalized Bloch vector [17] defined as

$\begin{equation} b_{klmn}={\mathrm{tr}}\left(\left[ \sigma_x^{\otimes k} \otimes \sigma_y^{\otimes l} \otimes \sigma_z^{\otimes m} \otimes \mathbbm{1}^{\otimes n} \right]_{\rm PI} \rho_{\rm PI}\right) \end{equation} \tag{ 52 }$

with natural numbers satisfying k + l + m + n = N. Consequently, the total error of all Bloch vector elements is given by

$\begin{equation} \mathcal{E}^2_{\rm total}(\hat a_i,\rho_{\rm tar})= \sum_{k,l,m,n} {N \choose k,l,m,n} \; \mathcal{E}^2_{b_{klmn}}(\hat a_i,\rho_{\rm tar}), \end{equation} \tag{ 53 }$

where the multinomial coefficient weights the error of each generalized Bloch vector by its number of appearance in a generic Pauli product decomposition.

The error of each Bloch vector element must now be related to the carried out measurements. For that, note that each Bloch vector element can be expressed as

$\begin{equation} b_{klmn} = \sum_i c_i^{klmn} {\mathrm{tr}}( [(\hat{a}_i \cdot \vec \sigma)^{\otimes N-n} \otimes \mathbbm{1}^{\otimes n}]_{\rm PI} \rho_{\rm tar}) \end{equation} \tag{ 54 }$

using appropriate coefficients c^klmn_i and the expectation values of $[(\hat {a}_i \cdot \vec \sigma )^{\otimes N-n} \otimes \mathbbm {1}^{\otimes n}]_{\mathrm { PI}}$ which can be computed from the coarse-grained measurement outcomes M^a_i_k of setting $\hat {a}_i$ as given by equation (2) using linear combinations. Assuming independent errors, one obtains

$\begin{eqnarray} &&\mathcal{E}^2_{b_{klmn}}(\hat a_i,\rho_{\rm tar}) = \sum_i c_i^{klmn} \mathcal{E}^2_{\rho_{\rm tar}} \left( [(\hat{a}_i \cdot \vec \sigma)^{\otimes N-n} \otimes \mathbbm{1}^{\otimes n}]_{\rm PI} \right). \end{eqnarray} \tag{ 55 }$

The detailed form of the error expression $\mathcal {E}^2_{\rho _{\mathrm { tar}}} ( [(\hat {a}_i \cdot \vec \sigma )^{\otimes N-n} \otimes \mathbbm {1}^{\otimes n}]_{\rm PI} )$ may depend on the actual physical realization, but we assume the following form:

$\begin{equation} \mathcal{E}^2_{\rho_{\rm tar}} \left( [(\hat{a}_i \cdot \vec \sigma)^{\otimes N-n} \otimes \mathbbm{1}^{\otimes n}]_{\rm PI} \right) =K \Delta_{\rho_{\rm tar}} \left( [(\hat{a}_i \cdot \vec \sigma)^{\otimes N-n} \otimes \mathbbm{1}^{\otimes n}]_{\rm PI} \right), \end{equation} \tag{ 56 }$

where Δ_ρ[A] = tr(ρA²) − [tr(ρA)]² is the standard variance and K is a state-independent factor. This form fits well, for example, to the common error model in photonic experiments where count rates are assumed to follow a Poissonian distribution. For more details of this derivation, see [17].

For large qubit numbers, N, this optimization is carried out iteratively. Starting from randomly chosen measurement directions or from vectors which are uniformly distributed according to some frame potential [46], one searches for updates according to

$\begin{equation} \hat{a}_i^\prime=\frac{p\hat{a}_i^n+(1-p)\hat{r}_i}{\| p\hat{a}_i^n+(1-p)\hat{r}_i \|}. \end{equation} \tag{ 57 }$

Here $\hat {a}_i^n$ is the current iterate, p < 1 a probability close to 1 and $\hat {r}_i$ are randomly chosen unit vectors. If this new set of directions $\hat {a}_i^\prime$ leads to a smaller total error $\mathcal {E}^2_{{\mathrm { total}}}(\hat {a}^\prime _i,\rho _{\rm tar})$ than the previous set, then these new measurement settings form the next iterate $\hat {a}^{n+1}_i=\hat {a}_i^n$ ; otherwise this procedure is carried out once more. This process is repeated until the total error does not decrease anymore. This way of optimizing the measurements requires a method for computing the total error $\mathcal {E}^2_{{\mathrm { total}}}(\hat {a}_i,\rho _{\rm tar})$ for a given set of measurements $\hat {a}_i$ . Using the generalized Bloch vector or the spin ensemble this computation can be carried out again efficiently for larger qubit numbers N.

Although this algorithm is not proven to attain the true global optimum it is still a straightforward technique to obtain good settings. In the end this is often sufficient, recalling that the true unknown state can deviate from the target state and that one considers 'just' an overall single figure of merit. For our simulations we always use the optimized settings for the totally mixed state; a reasonable guess if no extra information is available [47].

Regarding this point, we finally like to add that if one does not employ the minimal number of measurement settings, but rather an over-complete set, e.g. four times as many settings but four times fewer measurements per setting, then the procedure is quite insensitive to the chosen measurement directions. Hence, in many practical situations the search for optimal directions might not even be necessary and randomly chosen measurement directions suffice equally well.

5.3.2. Statistical pretest

Via the pretest one estimates the fidelity between the true ρ_true and the best permutationally invariant state $F_{\mathrm { PI}}(\rho _{\rm true})=\max _{\rho _{\rm PI} \geqslant 0} {\mathrm {tr}}(\sqrt {\sqrt {\rho _{\rm true}} \rho _{\rm PI}\sqrt {\rho _{\rm true}} })$ using only measurement results from a few settings $\hat a \in T$ , e.g. employing only $T=\{\hat e_1,\hat e_2,\hat e_3\}$ . Depending on this quantity, one decides on whether permutationally invariant tomography is worth continuing. As explained in detail in [17], this fidelity can be bounded by

$\begin{equation} F_{\rm PI}(\rho_{\rm true}) \geqslant [{\mathrm{tr}}(\rho_{\rm true} Z)]^2 \end{equation} \tag{ 58 }$

with an operator $Z=\sum z_k^a M_k^a$ being built up by the carried out measurements M^a_k given by equation (2) and satisfying Z ⩽ P_sym, where P_sym denotes the projector onto the symmetric subspace.

The expansion coefficients z^a_k should be optimized to attain the best lower bound. For a given target state ρ_tar this problem can be cast into a semidefinite program [36, 48] that can be solved efficiently using standard numerical routines. However, for larger qubit numbers one must again employ the block structure of the measurement operators as given by equation (9) to handle the operator inequality. Note that the projector on the symmetric subspace has a Block structure $P_{{\mathrm { sym}},j}=\delta _{j,N/2} \mathbbm {1}$ . Then the final problem reads as

$\begin{eqnarray} \max_z & & \sum_{a \in T,k} z_k^a{\mathrm{tr}}(\rho_{\rm tar} M_k^a) \nonumber\\ \\ \textrm{s.t.}& & \sum_{a \in T,k} z_k^a M_{k,j=N/2}^a \leqslant \mathbbm{1},\quad \sum_{a \in T,k} z_k^a M_{k,j}^a \leqslant 0,\quad \forall j < N/2.\nonumber \end{eqnarray} \tag{ 59 }$

If one experimentally implements this pretest, one must account for additional statistical fluctuations. For the chosen Z, one can employ the sample mean $\bar Z = \sum z_k^a f_k^a$ using the observed frequencies f^a_k = n^a_k/N_R in N_R repetitions of setting $\hat a$ , as an estimate of the true expectation value tr(ρ_trueZ). This sample mean $\bar Z$ will fluctuate around the true mean but large deviations will become less likely, such that for an appropriately chosen the quantity $\mathrm {sign}(\bar Z -\epsilon ) (\bar Z - \epsilon )^2$ is a lower bound to the true fidelity at the desired confidence level. The proof essentially uses the techniques employed in [49, 50].

(Statistical pretest).

Proposition 5.3 For any $Z=\sum z_k^a M_k^a \leqslant P_{\mathrm { sym}}$ let $\bar Z = \sum z_k^a n_k^a /N_{\mathrm { R}}$ denote the sample mean using N_R repetitions for setting $\hat a \in T$ . If the data are generated by the state ρ_true, then

$\begin{equation} \mathrm{Prob} \left[ F_{\rm PI}(\rho_{\rm true}) \geqslant \mathrm{sign}(\bar Z -\epsilon) (\bar Z - \epsilon) ^2 \right] \geqslant 1-\exp(-2 N_{\rm R} \epsilon^2 / C_z^2) \end{equation} \tag{ 60 }$

with $C_z^2 = \sum _a \left ( z_{\mathrm { max}}^s - z_{\rm min}^s \right )^2$ where z^a_max/min are the respective optima for setting $\hat a$ over all outcomes k.

Proof. The given statement follows along

$\begin{eqnarray} &&\fl \mathrm{Prob} \left[ F_{\rm PI}(\rho_{\rm true}) \geqslant \mathrm{sign}(\bar Z -\epsilon) (\bar Z - \epsilon) ^2 \right] \geqslant \mathrm{Prob} \left[ {\mathrm{tr}}(\rho_{\rm true} Z) \geqslant \bar Z -\epsilon \right], \end{eqnarray}\noindent \tag{ 61 }$

$\begin{eqnarray} &&\fl \hspace{2pt}\hphantom{\mathrm{Prob}[F_{\rm PI}(\rho_{\rm true})}\geqslant 1 - \mathrm{Prob} \left[ {\mathrm{tr}}(\rho_{\rm true} Z) \leqslant \bar Z -\epsilon \right] \geqslant 1 - \exp(-2 N_{\rm R} \epsilon^2 / C_z^2). \end{eqnarray}\noindent \tag{ 62 }$

Here the first inequality holds because the set of outcomes satisfying $\{ n_k^a: {\mathrm {tr}}(\rho _{\mathrm { true}} Z) \geqslant \bar Z -\epsilon \}$ is a subset of $\{ n_k^a: F_{\mathrm { PI}}(\rho _{\rm true}) \geqslant \mathrm {sign}(\bar Z -\epsilon ) (\bar Z - \epsilon ) ^2\}$ using equation (58). In the last inequality we use the Hoeffdings tail inequality [51] to bound $\mathrm {Prob} [ {\mathrm {tr}}(\rho _{\mathrm { true}} Z) \leqslant \bar Z -\epsilon ]$ . □

Note that this pretest can also be applied after the whole tomography scheme in which case the projector $P_{\mathrm { sym}}=\sum z_k^a M_k^a$ becomes accessible. Moreover, let us point out that a strong statistical significance, or a low respectively, might not be achieved with the best expectation value as given by equation (59) [52]; hence, optimizing Z for a rather mixed state is often better.

Finally, let us remark that the pretest can be improved by additional projectors, see the supplementary material of [17]. This leads to the bound $F_{\mathrm { PI}}(\rho _{\rm true}) \geqslant \sum _j p_j^2$ with p_j being the weight of the corresponding spin-j state of the permutationally invariant part of ρ_true as given in equation (7). From this expression one sees that this test only works well for states having a rather large weight on one of these spin states. Others, like the totally mixed state, while clearly being permutationally invariant, are not identified as states close to the permutationally invariant subspace. This is in stark contrast to compressed sensing where the certificate succeeds for the whole class of low-rank target states [10].

5.3.3. Entanglement and the MaxLik–MaxEnt principle

Following the last comment from the previous subsection, we want to argue that even in the case of an inefficient certificate, permutationally invariant state reconstruction as given by equation (5) is meaningful. Firstly we would like to emphasize that the permutationally invariant part of any density operator corresponds to a good representative to investigate the entanglement properties of the true, unknown state. This is because the transformation given by equation (1) can be achieved by permutations or multiple swap operations together with classical mixing, whereby entanglement cannot be created [53]. Thus if the permutationally invariant part of the density operator is entangled this holds true also for the real state. In addition, any symmetric entanglement measure, i.e. all commonly known ones, can only decrease under this operation, thereby even quantification is faithful [54].

Secondly, permutationally invariant state reconstruction also represents the solution of the maximum-likelihood maximum-entropy principle as introduced in [55], which goes as follows: if the carried out measurements are not tomographically complete, then there is, in general, not a single state $\hat \rho _{\mathrm { ml}}$ that maximizes the likelihood but rather a complete set of them. In order to single out a 'good' representative, the authors of [55] propose to choose the state that has the largest entropy, which, according to the Jaynes principle [47], is the special state for which one has the fewest information.

(Permutationally invariant MaxLik–MaxEnt principle).

Proposition 5.4 Using the described permutationally invariant tomography scheme, the reconstructed permutationally invariant state given by equation (5) (with the likelihood function) is also the solution of the maximum-likelihood maximum-entropy principle.

Proof. Since the measurements given by equation (2) are invariant and tomographically complete for permutationally invariant states, all density operators with the same spin ensemble as $\hat \rho _{\mathrm { PI}}$ have the same maximum likelihood. According to Ref. [47], the state with maximal entropy and consistent with a given set of expectation values for operators M^a_k has the form $\rho \propto \exp (\sum _{a,k} \lambda ^a_k M^a_k)$ . The Lagrange multipliers $\lambda ^a_k \in \mathbbm {R}$ must be chosen such that the given expectation values match. However, because all M^a_k are permutationally invariant we can employ the block decomposition given by equation (9) and finally obtain $\exp (\sum _{a,k} \lambda ^a_k M^a_k) = \exp (\oplus _j \sum _{a,k} \lambda ^a_k M^a_{k,j}\otimes \mathbbm {1}) = \oplus _j \exp (\sum ^a_k \lambda ^a_k M^a_{k,j}) \otimes \mathbbm {1}$ . Hence we obtain the same structure as $\hat \rho _{\mathrm { PI}}$ , which therefore is also the state with maximum entropy. □

6. Conclusion and outline

In this paper, we have provided all the necessary ingredients to carry out permutationally invariant tomography [17] in experiments with large qubit numbers. This includes, besides scheme-specific tasks such as the statistical pretest and the optimization of the measurement settings, in particular the state reconstruction part. Accounting for statistical fluctuations due to a finite number of data, this reconstruction demands the solution of a nonlinear large-scale optimization problem. We achieve this, firstly, by using a convenient toolbox to store, characterize and process permutationally invariant states, which largely reduces the dimension of the underlying problem and, secondly, by using convex optimization, which is superior compared to commonly used numerical routines in many respects. This makes permutationally invariant tomography a complete tomography method requiring only moderate measurement and data analysis effort.

There are many questions one may pursue in this direction: firstly, let us stress that the current prototype implementation is still not optimal. As explained, the bottleneck is the computation of the second derivatives; hence, we strongly believe that Hessian free optimization, like quasi-Newton or conjugate gradients [56], or the use of other barrier functions more tailored to linear matrix inequalities [57] are likely to push the reconstruction limit further. Secondly, it is natural to try to exploit other symmetries in the development of 'symmetry' tomography protocols; that is, tomography should work for all states that remain invariant under the action of a specific group. Clearly, any symmetry decreases the number of state-dependent parameters, but the challenge is to devise efficient local measurement strategies. Interesting classes here are graph-diagonal [58] or, more general, locally maximally entangleable states [59], translation or shift-invariant states [60], or U^⊗N invariant states [61, 62]. Thirdly, it is worth investigating to what extent particularly designed state tomography protocols are useful for further tasks, such as process tomography for quantum gates. For instance, permutationally invariant tomography might be unable to resolve the Toffoli gate [63] directly, but since the operation on all N target qubits is symmetric, a permutationally invariant resolution of this subspace (and the additional control qubit) might be sufficient. Finally, let us point out that permutationally invariant tomography can be further restricted to the symmetric subspace, which often contains the desired states. This is reasonable since we have seen that the pretest is only good if the unknown state has a large weight in one of the spin states. However, since the symmetric subspace grows only linearly with the number of particles, this tomography scheme can analyse many more qubits efficiently.

Acknowledgments

We thank M Kleinmann, B Kraus, T Monz, L Novo, P Schindler and J Wehr for stimulating discussions on the topic and technicalities. This work was supported by the FWF (START prize Y376-N16), the EU (Project Q-ESSENCE and Marie Curie CIG 293993/ENFOQI), the BMBF (Chist-Era Project QUASAR), the ERC (Starting Grant GEDENTQOPT), the QCCC of the Elite Network of Bavaria, the Spanish MICINN (project no. FIS2009-12773-C02-02), the Basque Government (project no. IT4720-10) and the National Research Fund of Hungary OTKA (contract no. K83858).

Permutationally invariant state reconstruction

Article metrics

Author e-mails

Author affiliations

Author notes

Dates

Abstract

1. Introduction

2. Background

2.1. Permutationally invariant tomography

2.2. Statistical state reconstruction

3. Method

3.1. Reduction of the dimensionality

3.2. Optimization

4. Examples

5. Details

5.1. Reduction

5.1.1. Background

5.1.2. Permutationally invariant states and measurement operators

5.2. Convex optimization

5.3. Additional tools

5.3.1. Optimization of measurement settings

5.3.2. Statistical pretest

5.3.3. Entanglement and the MaxLik–MaxEnt principle

6. Conclusion and outline

Acknowledgments

Footnotes

Permutationally invariant state reconstruction

Article metrics

Share this article

Author e-mails

Author affiliations

Author notes

Dates

Abstract

1. Introduction

2. Background

2.1. Permutationally invariant tomography

2.2. Statistical state reconstruction

3. Method

3.1. Reduction of the dimensionality

3.2. Optimization

4. Examples

5. Details

5.1. Reduction

5.1.1. Background

5.1.2. Permutationally invariant states and measurement operators

5.2. Convex optimization

5.3. Additional tools

5.3.1. Optimization of measurement settings

5.3.2. Statistical pretest

5.3.3. Entanglement and the MaxLik–MaxEnt principle

6. Conclusion and outline

Acknowledgments

Footnotes