Practical adaptive quantum tomography*

Christopher Granade; Christopher Ferrie; Steven T Flammia

doi:10.1088/1367-2630/aa8fe6

Quantum information processing (QIP) promises advantages in a wide range of different contexts, including machine learning [2–4], chemistry simulation [5–7], and number theory [8, 9]. As such, the experimental effort to build useful QIP devices has exploded in recent years. In the course of this effort, quantum tomography is a valuable tool for diagnosing and debugging small quantum devices, and has subsequently seen a variety of different advances. In particular, Bayesian approaches to tomography which are especially well suited to utilizing prior information and adapting to changing experimental conditions have developed significantly in recent years [10–13], presenting a useful experimental tool [14–16].

In this paper we demonstrate the efficiency and accuracy of an adaptive tomography protocol that we call PAQT: practical adaptive quantum tomography. PAQT intelligently selects new measurements based on the outcomes of previous ones [10, 17–21]. Adaptivity has been experimentally demonstrated [14, 15, 22–25], but is not currently standard practice. Though adaptivity increases accuracy, the computational costs incurred outweigh that of simply repeating standard measurements many times. The PAQT approach employs a simple heuristic that can be efficiently computed between measurements, even with embedded hardware [26–29]. The algorithm we propose is therefore compatible with modern experimental design and avoids an important limitation of previous approaches.

We base our algorithm off of self-guided quantum tomography (SGQT), which treats adaptive tomography as a direct optimization problem rather than a new optimization problem between each measurement [30]. Though this affords an efficient and easy to implement adaptive heuristic, SGQT is not without its limitations. It requires assuming that the target state is pure, and it does not return rich region estimates for a state. What PAQT achieves is to effectively combine SGQT with conventional and easily-implemented tomographic estimators, such as the Bayesian particle filter or least-squares fit (LSF) estimators. Under this approach, an experimentalist can collect data using SGQT (even if its assumptions are not met), and then post-process this data using particle filtering or LSF.

The benefit of PAQT is two-fold. (1) From the point of view of traditional tomography, it gives an adaptive tomography protocol requiring only modest computational resources, as the bulk of the computational cost is offloaded to post-processing. (2) From the point of view of simulation-based optimization tomography (such as SGQT), it effectively augments the output with region estimation providing a statistically robust quantification of uncertainty. Thus, while we do not explicitly demonstrate that the improved scaling of Ferrie [30] remains in the more general case considered here, PAQT does provide a practical and efficient procedure for performing adaptive quantum tomography with rigorous statistical principles.

The outline of the paper is as follows. In section 1 we define and review the problem of tomography as well as three standard solutions: least squares, maximum likelihood and Bayesian mean estimation. In section 2, we review the approaches to measurement adaptive tomography including the recently introduced self-guided technique. In section 3, introduce PAQT by combining SGQT with adaptive Bayesian tomography and detail the results of our numerical experiments. Section 4 concludes with a discussion.

1. The tomographic problem

In quantum state tomography, we are interested in reconstructing a quantum state from a collection of informationally complete measurements made on that state [31–33]. That is, a set of measurements is chosen such that if one learns their frequencies given a quantum system of interest, the frequencies for any other measurement of that system can then be predicted. If the system of interest is a qubit, for instance, then knowing the expectations of the observables $\{{\sigma }_{x},{\sigma }_{y},{\sigma }_{z}\}$ allows for predicting the distribution over outcomes for any other measurement. The empirical reconstruction of quantum states from measurements of informationally complete observables has been reviewed by D'Ariano et al [34], and reviewed in the case of continuous variables by Lvovsky and Raymer [35]. Here, we will focus on the case of state tomography in finite-dimensional systems.

That a quantum state can be empirically determined in principle, however, leaves the question of how to estimate a state in practice, given finite experimental resources. For instance, given data from an informationally complete set of observables, one could use a linear reconstruction, a maximum likelihood estimator [36–38], or a Bayesian mean estimator [10–13, 39] to report a state. We will detail each such approach below, and describe their relative strengths and weaknesses.

Before proceeding, we note that though we consider the general case of tomography in this work, substantial progress has been made by considering considering important special cases under which a state can be much more easily characterized. In particular, permutationally invariant tomography reconstructs the part of a multiqubit density matrix which is invariant under exchange of the qubits [40]. Compressed sensing allows for the efficient recovery of low-rank quantum states [41, 42], and has been applied experimentally in systems as large as six qubits [43]. Similarly, MPS [44] and PEPS [45] tomography use the MPS and PEPS ansatzes to improve exponentially on naïve methods for states that are well-approximated by common tensor network ansatzes [46]. Though we do not explore the possibility in this work, we expect that heuristic approaches should also offer similar advantages to tomographic estimation in these cases.

1.1. Problem set up

First, consider an orthonormal basis for traceless Hermitian operators ${\{{B}_{j}\}}_{j=1}^{{d}^{2}-1}$ —the Pauli basis, for example. That is, for all $i,j$ , ${B}_{j}^{\dagger }={B}_{j}$ and $\mathrm{Tr}({B}_{k}{B}_{j})={\delta }_{{kj}}$ and $\mathrm{Tr}({B}_{j})=0$ . Then, any state ρ can be written

$\begin{eqnarray}&&\rho =\displaystyle \frac{{\mathbb{1}}}{d}+\displaystyle \sum _{j=1}^{{d}^{2}-1}{\theta }_{j}{B}_{j},\end{eqnarray} \tag{ 1 }$

for some vector of parameters ${({\boldsymbol{\theta }})}_{j}={\theta }_{j}$ . Importantly, these parameters are constrained since $\rho \geqslant 0$ . This poses a problem for many approaches, but there are well-motivated methods which produce a valid quantum state starting from a non-physical matrix [47].

Let us assume two-outcome test measurements are made such that each measurement outcome is either 1 or 0 and represented by the pair $\{{P}_{k},{\mathbb{1}}-{P}_{k}\}$ . The Born rule dictates that the probability to get 1, say, is $\Pr (1| \rho ,{P}_{k})=\mathrm{Tr}(\rho {P}_{k})$ . Since the operators $\{{B}_{j}\}$ form a basis, we can write

$\begin{eqnarray}&&{P}_{k}=\displaystyle \frac{{\mathbb{1}}}{d}+\displaystyle \sum _{j=1}^{{d}^{2}-1}{p}_{{kj}}{B}_{j},\end{eqnarray} \tag{ 2 }$

and the Born rule vectorizes to

$\begin{eqnarray}&&\Pr (1| \rho ,{P}_{k})=\mathrm{Tr}(\rho {P}_{k})=\displaystyle \frac{1}{d}+{{\boldsymbol{p}}}_{k}^{{\rm{T}}}{\boldsymbol{\theta }},\end{eqnarray} \tag{ 3 }$

where ${({{\boldsymbol{p}}}_{k})}_{j}={p}_{{kj}}$ . Denote ${f}_{k}=\Pr (1| \rho ,{P}_{k})$ and ${({\boldsymbol{f}})}_{k}={f}_{k}$ . Also define the matrix ${\boldsymbol{X}}$ with entries ${({\boldsymbol{X}})}_{{kj}}={p}_{{kj}}$ . Then the above condenses to

$\begin{eqnarray}&&{\boldsymbol{f}}=\displaystyle \frac{1}{d}+{\boldsymbol{X}}{\boldsymbol{\theta }}.\end{eqnarray} \tag{ 4 }$

If we perform at least d² such measurements such that the set ${\{{P}_{k}\}}_{k=1}^{{d}^{2}}$ is linearly independent, then the probabilities ${\boldsymbol{f}}$ are sufficient to determine ρ uniquely. That is, the linear system in (4) has a solution set with a single valid quantum state. In practice we do not have access to ${\boldsymbol{f}}$ , but only samples drawn from the distribution that it defines. Suppose N_k measurements of $\{{P}_{k},{\mathbb{1}}-{P}_{k}\}$ yielded n_k 1s and N_k − n_k 0s. Then, the empirical frequencies are

$\begin{eqnarray}&&{\hat{{\boldsymbol{f}}}}_{k}=\displaystyle \frac{{n}_{k}}{{N}_{k}}.\end{eqnarray} \tag{ 5 }$

The task of tomography is to assign a quantum state ${\boldsymbol{\theta }}$ to each data set $\hat{{\boldsymbol{f}}}$ .

1.2. Linear inversion tomography

Next, we will outline the traditional approach to solving the tomography problem. While we do not recommend this approach, it usually provides reasonable answers and is at least implicitly the starting point for more sophisticated approaches.

We begin by setting the empirical frequencies equal to the (rescaled) theoretical probabilities $\hat{{\boldsymbol{f}}}={\boldsymbol{f}}$ . After all, $\mathrm{Tr}(\rho {P}_{k})$ is literally the expectation value of the observable P_k. In any case, if we let ${\boldsymbol{Y}}=\hat{{\boldsymbol{f}}}-1/d$ , the new system of equations

$\begin{eqnarray}&&{\boldsymbol{Y}}={\boldsymbol{X}}{\boldsymbol{\theta }},\end{eqnarray} \tag{ 6 }$

may not have a solution if more than d² different measurements have been made. The traditional approach is to use the least squares estimator

$\begin{eqnarray}&&{\hat{{\boldsymbol{\theta }}}}_{\mathrm{LS}}=\mathop{\mathrm{argmin}}\limits_{{\boldsymbol{\theta }}}\parallel {\boldsymbol{Y}}-{\boldsymbol{X}}{\boldsymbol{\theta }}{\parallel }_{2}^{2},\end{eqnarray} \tag{ 7 }$

which has the exact solution

$\begin{eqnarray}&&{\hat{{\boldsymbol{\theta }}}}_{\mathrm{LS}}={({{\boldsymbol{X}}}^{{\rm{T}}}{\boldsymbol{X}})}^{-1}{{\boldsymbol{X}}}^{{\rm{T}}}{\boldsymbol{Y}}.\end{eqnarray} \tag{ 8 }$

This solution is not guaranteed to produce a positive semidefinite estimate. One can resort to performing constrained least squares (which is 'not that hard' since one probably has access to a black box implementation of this using a canned scientific software library) or one can use a two-step approach [47] that outputs the 'closest' physical state to a given matrix. There is no consensus on which should be preferred and we make no recommendations here. In our simulations, we have set all negative eigenvalues to zero, as we observe that in practice, measurements designed by self-guided tomography tends to only rarely yield ${\hat{{\boldsymbol{\theta }}}}_{\mathrm{LS}}$ corresponding to $\rho {\not\geqslant }0$ .

1.3. Maximum likelihood tomography

The linear least squares approach is folklore as old as the problem of tomography, but has been stated explicitly by Qi et al [48]. It usually arises when using a Gaussian approximation to the likelihood function in maximum likelihood estimation (MLE) (see, for example, Kaznady and James [49]). The likelihood function is the probability distribution of the data given a state ${\boldsymbol{\theta }}$ , thought of as a function of ${\boldsymbol{\theta }}$ . Since each measurement is an independent binomial trial, the likelihood function is quite simple:

$\begin{eqnarray}&&\Pr (\hat{{\boldsymbol{f}}}| {\boldsymbol{\theta }},{\boldsymbol{X}})=\displaystyle \prod _{k}\left(\displaystyle \genfrac{}{}{0em}{}{{N}_{k}}{{N}_{k}{\hat{f}}_{k}}\right){\left(\displaystyle \frac{1}{d}+{{\boldsymbol{p}}}_{k}^{{\rm{T}}}{\boldsymbol{\theta }}\right)}^{{N}_{k}{\hat{f}}_{k}}{\left(1-\displaystyle \frac{1}{d}-{{\boldsymbol{p}}}_{k}^{{\rm{T}}}{\boldsymbol{\theta }}\right)}^{{N}_{k}(1-{\hat{f}}_{k})}.\end{eqnarray} \tag{ 9 }$

One of the oldest techniques in classical statistical estimation is MLE, which prescribes the estimate

$\begin{eqnarray}&&{\hat{{\boldsymbol{\theta }}}}_{\mathrm{MLE}}=\mathop{\mathrm{argmin}}\limits_{{\boldsymbol{\theta }}}\Pr (\hat{{\boldsymbol{f}}}| {\boldsymbol{\theta }},{\boldsymbol{X}}).\end{eqnarray} \tag{ 10 }$

This does not have a closed form in general. To make some traction, we can approximate the likelihood function by a Gaussian (perhaps with appeal to the central limit theorem). A Gaussian is defined by its mean and variance, so we need only those from the actual distribution to make the approximation. These are simple enough to derive from the properties of the binomial distribution:

$\begin{eqnarray}&&{\mathbb{E}}[\hat{{\boldsymbol{f}}}]=\displaystyle \frac{1}{d}+{\boldsymbol{X}}{\boldsymbol{\theta }},\end{eqnarray} \tag{ 11 }$

$\begin{eqnarray}&&{\mathbb{V}}{[\hat{{\boldsymbol{f}}}]}_{{kj}}={\delta }_{{kj}}\displaystyle \frac{\left(\tfrac{1}{d}+{{\boldsymbol{p}}}_{k}^{{\rm{T}}}{\boldsymbol{\theta }}\right)\left(1-\tfrac{1}{d}-{{\boldsymbol{p}}}_{k}^{{\rm{T}}}{\boldsymbol{\theta }}\right)}{{N}_{k}}.\end{eqnarray} \tag{ 12 }$

The location of the maximum of a function is the same as that of the log of the function. The logarithm of the Gaussian approximation to the likelihood function (ignoring terms which do not depend on ${\boldsymbol{\theta }}$ ) is

$\begin{eqnarray}&&-\displaystyle \frac{1}{2}\displaystyle \sum _{k}\displaystyle \frac{{({Y}_{k}-{{\boldsymbol{p}}}_{k}^{{\rm{T}}}{\boldsymbol{\theta }})}^{2}{N}_{k}}{\left(\tfrac{1}{d}+{{\boldsymbol{p}}}_{k}^{{\rm{T}}}{\boldsymbol{\theta }}\right)\left(1-\tfrac{1}{d}-{{\boldsymbol{p}}}_{k}^{{\rm{T}}}{\boldsymbol{\theta }}\right)}.\end{eqnarray} \tag{ 13 }$

We make one more approximation, which is again replacing the probabilities with their empirical frequencies¹ such that the maximum likelihood problem then becomes

$\begin{eqnarray}&&{\hat{{\boldsymbol{\theta }}}}_{\mathrm{MLE}}=\mathop{\mathrm{argmin}}\limits_{{\boldsymbol{\theta }}}\parallel {\boldsymbol{Y}}^{\prime} -{\boldsymbol{X}}^{\prime} {\boldsymbol{\theta }}{\parallel }_{2}^{2},\end{eqnarray} \tag{ 14 }$

where we have weighted ${\boldsymbol{Y}}$ and ${\boldsymbol{X}}$ by the variance:

$\begin{eqnarray}&&{Y}_{k}^{\prime} =\sqrt{\displaystyle \frac{{N}_{k}}{{\hat{f}}_{k}(1-{\hat{f}}_{k})}}{Y}_{k},\ {X}_{k}^{\prime} =\sqrt{\displaystyle \frac{{N}_{k}}{{\hat{f}}_{k}(1-{\hat{f}}_{k})}}{X}_{k}.\end{eqnarray} \tag{ 15 }$

Notably, this approach fails if ${\hat{f}}_{k}=0$ or 1 for any k, as the variance in these cases approaches zero, so that $Y{{\prime} }_{k}\to \infty$ . To solve this, we hedge the empirical frequencies by $\beta =0.5$ , so that we use ${\hat{f}}_{k}=({n}_{k}+0.5)/({N}_{k}+1)$ when computing the MLE [50].

1.4. Bayesian tomography

As opposed to the frequentist techniques noted above, the Bayesian approach centers around Bayes' rule, which prescribes how to update a prior distribution $\Pr ({\boldsymbol{\theta }})$ to a posterior distribution $\Pr ({\boldsymbol{\theta }}| \hat{{\boldsymbol{f}}},{\boldsymbol{X}})$ that is conditioned on the observed frequencies $\hat{{\boldsymbol{f}}}$ . Concretely,

$\begin{eqnarray}&&\Pr ({\boldsymbol{\theta }}| \hat{{\boldsymbol{f}}},{\boldsymbol{X}})=\displaystyle \frac{\Pr (\hat{{\boldsymbol{f}}}| {\boldsymbol{\theta }},{\boldsymbol{X}})\Pr ({\boldsymbol{\theta }})}{\Pr (\hat{{\boldsymbol{f}}}| {\boldsymbol{X}})},\end{eqnarray} \tag{ 16 }$

where $\Pr (\hat{{\boldsymbol{f}}}| {\boldsymbol{\theta }},{\boldsymbol{X}})$ is the likelihood function of (9), and where $\Pr (\hat{{\boldsymbol{f}}}| {\boldsymbol{X}})$ is a pesky normalization that we will deal with implicitly when doing numerical calculation. When Bayes' rule is used iteratively, the posterior for one experiment becomes the prior for the next. In words, this equation is a prescription of the full distribution of knowledge about the quantum state given the data that was actually observed. What can we do with this?

First, we can produce a single 'point' estimate of ${\boldsymbol{\theta }}$ via the posterior mean:

$\begin{eqnarray}&&{\hat{{\boldsymbol{\theta }}}}_{\mathrm{BME}}={{\mathbb{E}}}_{{\boldsymbol{\theta }}| \hat{{\boldsymbol{f}}},{\boldsymbol{X}}}[{\boldsymbol{\theta }}],\end{eqnarray} \tag{ 17 }$

where BME stands for Bayesian mean estimator. The mean estimator is not the only option, though it is optimal for certain figures of merit [39], or at least near-optimal [51]. Second, the posterior distribution naturally encodes 'error bars' by way of the posterior covariance tensor [13, 39]. Finally, the data can be processed online in the sense that new data can be incorporated into the distribution without the need to reanalyze all previous data at the same time. This lends itself naturally to adaptive tomography, discussed in the next section.

In practice, however, exactly implementing Bayesian mean estimation is quite difficult, as the expectation value in (17) may not be analytically tractable outside of important special cases. We will therefore follow the approach of Huszár and Houlsby [10] and use the particle filtering algorithm [52] to numerically implement Bayesian estimation. This approach has since been used by Ferrie [11, 12] and by Granade et al [13] to develop useful applications of Bayesian tomography, by Stenberg et al [53] to learn coherent states, and has been successfully applied outside of tomography to efficiently learn Hamiltonians using classical [54] and quantum resources [55]. For our purposes here, we are primarily interested in the property that once a datum has been incorporated into a particle filter, it may be discarded, such that we do not incur computational costs that grow faster than the amount of data. Utilizing this advantage, together with the results of Beskos et al [56] we conjecture that Bayesian tomography with particle filtering requires computational costs scaling as $O({d}^{4}{Np})$ , where $\rho \in {{\mathbb{C}}}^{d\times d}$ , N is the number of measurements, and p is the number of particles, as explained below. Naïvely, one might expect that $p\in O(\exp \,d)$ is required, but the result of Beskos et al [56] shows that p can always be chosen to be subexponential in d. Moreover, p can be chosen independently of d in cases where we use p to control the estimation accuracy rather than the problem dimension.

Particle filtering proceeds by approximating the prior and posterior distributions at each step of Bayesian inference as a weighted sum of δ functions,

$\begin{eqnarray}&&\Pr ({\boldsymbol{\theta }})\approx \displaystyle \sum _{i}{w}_{i}\delta ({\boldsymbol{\theta }}-{{\boldsymbol{\theta }}}_{i}),\end{eqnarray} \tag{ 18 }$

where $\{{w}_{i}\}$ are the weights of the particles located at $\{{{\boldsymbol{\theta }}}_{i}\}$ . Upon observing a datum ${\hat{f}}_{k}$ , the weights are then updated by calling the likelihood function for each particle,

$\begin{eqnarray}&&{w}_{i}\mapsto {w}_{i}\times \Pr ({\hat{f}}_{k}| {{\boldsymbol{\theta }}}_{i})/{ \mathcal N },\end{eqnarray} \tag{ 19 }$

where ${ \mathcal N }$ is the normalization factor in Bayes' rule (16), which can be found implicitly by demanding that ${\sum }_{i}{w}_{i}=1$ . The BME is then found by taking a sum over the particles representing the current posterior,

$\begin{eqnarray}&&{\hat{{\boldsymbol{\theta }}}}_{\mathrm{BME}}=\displaystyle \sum _{i}{w}_{i}{{\boldsymbol{\theta }}}_{i}.\end{eqnarray} \tag{ 20 }$

Numerical stability in particle filtering is provided by the use of a resampling algorithm which replaces the particles by a new set of particles that more effectively represents the same posterior. We will use the Liu and West resampling algorithm [57], which mixes the current posterior with a Gaussian distribution of the same mean and covariance. The resampling is controlled by a parameter $a\in [0,1]$ , with smaller a corresponding to 'more Gaussian' posteriors.

2. Adaptive and self-guided tomography

We have not yet addressed the issue of ${\boldsymbol{X}}$ , the matrix defining the choice of measurements. How should this choice of measurements be done? This is an open problem, with the lack of consensus mostly due to incompatible choices of criteria for optimality. In any case, the fact that some measurements are better than others suggests that improvements can be made through adaptive tomography—that is, choosing new measurement settings based on information obtained from past measurement settings.

2.1. Adaptive tomography

The first to consider adaptive state tomography was Fischer et al [17], who did so for a single qubit assumed to be in a pure state. That is, the prior was taken to be a uniform distribution on the surface of the Bloch sphere. The adaptivity consists of maximizing the entropy of the sampling distribution and expected fidelity. The estimator was chosen to be the maximum of the posterior distribution. This was later experimentally realized for a short set of measurements by pre-computing and storing the optimal experiment choices in a look-up table [22].

Adaptive state tomography has also been investigated in the context of parameterized models and Fisher information. Barndorff-Nielsen and Gill [18] showed that the quantum Fisher information for a single parameter can be obtained asymptotically by adaptively choosing the measurement settings in a two-stage procedure. The asymptotic two-step approach seems also to have been independently discovered by Řeháček et al [19] and Bagan et al [20]. An experimental demonstration has verified a quadratic improvement in accuracy [23, 58]. These approaches, however, are of more theoretical interest as they are guaranteed only asymptotically or require the total number of measurements to be specified a priori.

A generic approach using the maximum likelihood estimator and measurements minimizing the expected variance also showed an improvement over standard quantum tomography [21]. This has been made more practical through use of a recursive least-squares formula in Qi et al [25]. Below we will see that our choice of heuristic for adaptation may lead the least squares estimator to fail due to ill-conditionedness. Our results below will suggest that the better approach is the Bayesian one.

2.2. Bayesian adaptive quantum tomography

The Bayesian method also allows for a principled approach to adaptive measurements since one has a very formal definition of expected utility of a measurement. Consider (16) in the case of a hypothetical measurement ${\boldsymbol{X}}$ , which could produce data $\hat{{\boldsymbol{f}}}$ . Then, one can define the expected utility of the measurement as

$\begin{eqnarray}&&U({\boldsymbol{X}})={{\mathbb{E}}}_{\hat{{\boldsymbol{f}}},{\boldsymbol{\theta }}| {\boldsymbol{X}}}[L({\boldsymbol{\theta }})],\end{eqnarray} \tag{ 21 }$

where L is an arbitrary loss function.

Fischer et al [17] considered both the log-loss and fidelity for a single qubit. Huszár and Houlsby [10] considered the information gain, which has since been used to define an adaptive protocol in one- and two-qubit optical experiments [14, 15]. Most recently, the fidelity for arbitrary dimensions has been studied and numerics performed on one and two-qubits [59].

Calculating these utilities, however, poses a problem since one may be able to perform a great deal of non-optimized experiments before the calculation of the 'best' experiment can be completed. These intermediate experiments, while not optimized, still contain useful information about the state and may provide better accuracy when the cost of optimization is included. Hence the need for experiment design heuristics that realize the benefits of adaptivity without computing or optimizing over utility functions, providing significant improvements in efficiency.

In the context of Hamiltonian learning, for example, heuristics have been used to obtain many of the benefits of explicitly optimizing a utility, while avoiding much of the computational expense [55, 60]. Machine learning techniques have recently been applied to the design of good heuristics for quantum characterization problems [61], but we will take a different approach and instead use stochastic optimization to provide an efficient heuristic.

2.3. Self-guided quantum tomography

SGQT is an adaptive tomography scheme which avoids the linear inversion problem altogether by posing the tomography problem as one of optimization rather than estimation [30]. In particular, self-guided tomography finds a pure state $| \phi \rangle$ such that the overlap $F(\phi ,\rho )=\langle \phi | \rho | \phi \rangle$ is maximized for a true state ρ. If $\rho =| \psi \rangle \langle \psi |$ is a pure state, then $F(\phi ,\rho )$ is maximized if and only if $| \phi \rangle ={{\rm{e}}}^{{\rm{i}}\theta }| \psi \rangle$ for a phase θ, such that an optimal solution is also an accurate estimate of the true state.

An earlier work took a similar approach by testing whether the unknown qubit state was symmetric with a reference state [62], where the reference state is chosen adaptively to maximize fidelity. However, the method is defined only for a single qubit and requires a second fully characterized and controllable qubit along with an entangled measurement.

Having phrased state estimation as an optimization problem, self-guided tomography proceeds by experimentally estimating the objective function F from empirical frequencies. This results in a stochastically evaluated objective function, such that the optimization problem is amenable to attack by stochastic optimization algorithms. We will in particular rely on the simultaneous perturbative stochastic approximation (SPSA) [63].

The SGQT estimate is precisely defined as follows. We let $| \phi \rangle$ be a parameterization of pure states of a given dimension in terms of a vector ϕ of real numbers; for instance, qubit states can be parameterized by their Bloch angles. We then begin with a random state $| {\phi }_{0}\rangle$ and iteratively produce new states $| {\phi }_{k}\rangle$ which serve the dual role of specifying the current estimate of the state and next measurements to perform. At iteration k, we perform the measurements $\{{P}_{k,\pm },{\mathbb{1}}-{P}_{k,\pm }\}$ , where

$\begin{eqnarray}&&{P}_{k,\pm }=| {\phi }_{k-1}\pm {\epsilon }_{k}{{\rm{\Delta }}}_{k}\rangle \langle {\phi }_{k-1}\pm {\epsilon }_{k}{{\rm{\Delta }}}_{k}| ,\end{eqnarray} \tag{ 22 }$

and ${{\rm{\Delta }}}_{k}$ is a random vector that is constructed by setting each entry to ±1 with equal probability. Here ${\epsilon }_{k}$ is a step-size parameter chosen below. The outcomes of these measurements are denoted ${\hat{f}}_{k,\pm }$ . The gradient of the fidelity is estimated from these measurements to be

$\begin{eqnarray}&&{\hat{g}}_{k}=\displaystyle \frac{{\hat{f}}_{k,+}-{\hat{f}}_{k,-}}{2{\epsilon }_{k}}{{\rm{\Delta }}}_{k}.\end{eqnarray} \tag{ 23 }$

Using these, and an additional gain parameter ${\alpha }_{k}$ , the SPSA algorithm mimics standard gradient ascent, but along the random direction ${{\rm{\Delta }}}_{k}$ :

$\begin{eqnarray}&&| {\phi }_{k}\rangle =| {\phi }_{k-1}+{\alpha }_{k}{\hat{g}}_{k}\rangle .\end{eqnarray} \tag{ 24 }$

Convergence is guaranteed [63] given the specification of ${{\rm{\Delta }}}_{k}$ above and

$\begin{eqnarray}&&{\epsilon }_{k}=\displaystyle \frac{1}{{k}^{1/3}},\end{eqnarray} \tag{ 25a }$

$\begin{eqnarray}&&{\alpha }_{k}=\displaystyle \frac{1}{k}.\end{eqnarray} \tag{ 25b }$

Unless otherwise noted, however, we shall use the parameters suggested by Spall [63],

$\begin{eqnarray}&&{\epsilon }_{k}=0.1/{k}^{0.101}\ \mathrm{and}\ {\alpha }_{k}=10/{k}^{0.602}.\end{eqnarray} \tag{ 26 }$

SPSA has also been applied in quantum information to design high-fidelity control sequences given randomized benchmarking experiments [64, 65]. In particular, Ferrie showed that self-guided tomography can rapidly learn pure states for comparatively large quantum systems [30]. To the best of our knowledge, self-guided tomography is the only adaptive tomography technique which has gone beyond two qubits, even in simulation. SGQT has also recently been demonstrated in an optical experiment [24].

SGQT is not without its limitations, however. The aim of the current work is to mitigate the following three limitations of SGQT: (1) it is restricted to pure state tomography, (2) it does not report error bars, and (3) it can not be restricted to local measurements.

3. Practical adaptive quantum tomography

From the above discussion, we find that SGQT potentially offers many advantages for experimental practicality over traditional protocols, but at the cost that it does not accurately report mixed states, and does not certify its own errors. Happily, these are precisely the advantages of the Bayesian approach, such that we can collect data using self-guided tomography, then post-process with offline estimation.

We introduce PAQT—practical adaptive quantum tomography—an optimized numerical approach which implements the idea of merging self-guided tomography as an online experiment design heuristic into Bayesian data analysis. In principle, post-processing of self-guided tomography data could be carried out with any tomographic estimator. We define PAQT as the use of Bayesian estimation in particular on the data gathered through the course of a self-guided tomography experiment, owing to the rich statistical principles underlying Bayesian inference. In utilizing self-guided tomography, PAQT automatically selects experiments online and can be implemented with modest experimental hardware, including modern embedded controllers such as field-programmable gate arrays. The advantages of PAQT are that it provides the enhanced precision of adaptive tomography together with fast data processing and experiment design. The framework provides robust and easily interpretable error regions without additional overhead. Explicitly, PAQT uses the results of the measurements (22) specified by SGQT with the Bayesian mean estimator (17). As described in section 1.1, the frequencies ${\boldsymbol{f}}$ upon which the Bayesian estimator are conditioned are a description of the results of each measurement. Since (22) specifies which measurements are to be performed, this is a complete specification of our protocol. Although we demonstrate the algorithm for state tomography, the method is equally applicable to channel tomography and other estimation tasks and can easily accommodate other estimators.

Our results use the QInfer 1.0a1 [66], QuTiP [67] 3.2.0, NumPy [68] 1.9.2, Pandas [69] 0.16.2 and SciPy 0.15.1 [70] libraries for Python 2.7 (Enthought Canopy 1.5.4) to perform the Bayesian analysis. We performed all simulations on the University of Sydney School of Physics cluster. Full source code for our simulations, and for our implementations of self-guided and least-squares tomography, as well as data summarizing all 554, 250 trials used in our numerical results can be found online in the supplementary material [1]. These trials are split over 652 different experimental conditions, such that for most plots, each point is generated from approximately 850 trials. In all numerical experiments, true states are chosen at random for each trial from the Ginibre distribution, which includes the Hilbert–Schmidt uniform and Haar uniform distributions as special cases [71]. For brevity, we will indicate these special cases as 'mixed' and 'pure', respectively. The supplemental material also includes complete details for all figures in this paper, and can be used to reproduce all numerical results shown here.

We start by noting in figure 1 that, in the case of qubits, the states estimated by self-guided tomography are almost as indistinguishable as the pure state closest to each true state in terms of the 1-norm. This makes it clear that, although self-guided tomography should not be expected to return a useful estimate if the true state is mixed, it is still heavily dependent on the true state such that we should expect self-guided tomography to collect useful data.

$\rho =| \psi \rangle \langle \psi | $ — **Figure 1.** Distinguishability between self-guided estimated states and true states drawn from the Hilbert–Schmidt prior for a qubit, plotted versus the best achievable distinguishability for any estimator constrained to pure states $\rho =| \psi \rangle \langle \psi |$ , where the distinguishability between ρ and σ is defined as the trace distance $\tfrac{1}{2}\parallel \rho -\sigma {\parallel }_{1}$ , minimized for pure ρ and fixed σ by $(1-\sqrt{2\mathrm{Tr}({\sigma }^{2})-1})/2$ . The self-guided estimates are drawn from 10,000 iterations with either n = 5, 50 or 500 shots per measurement, such that each measurement f_k is drawn from a binomial distribution with n trials. As the number of shots per measurement increases, the self-guided estimates approach the closest states allowed by the pure state assumption, demonstrating that the self-guided procedure produces useful data even when the true state is mixed.
Download figure:
Standard image High-resolution image

Indeed, as we show in figure 2, PAQT effectively combines self-guided tomography with least-squares and Bayesian estimators for both pure and mixed states on a qubit. In particular, even though self-guided tomography has ceased to learn states when the true state is a mixed state, the data collected can be used by both the Bayesian and LSF tomographic estimators to return very good estimates of the state.

**Figure 2.** Median infidelity $r=1-F$ for self-guided tomography on single qubit (top) pure and (bottom) mixed states, both without post-processing the self-guided data (green), as well as post-processing via PAQT using Bayesian (orange) and least-squares estimators (gray and blue). In both cases, Bayesian tomography is performed with a full-rank (Hilbert–Schmidt) prior, using the particle filter summarized in section 1.4 with 4000 particles and the resampling parameter a = 0.98. The shaded regions indicate the 16% and 84% quantiles over trials. Note that, for a normal distribution, this region would coincide with the $1\sigma$ –confidence interval, but as illustrated in figure 3, the losses are far from normally distributed, such that we cannot make the normal interpretation. The self-guided procedure works very well for pure states (top), providing estimates with fidelity approximately $99.999 \%$ after 10⁷ bits of data, while the Bayes estimator uses a full-rank prior and thus underperforms on pure states due to this hedging. By contrast, for mixed states, the self-guided procedure does not learn well on its own, but post-processing the self-guided data with Bayesian or least-squares estimation produces high-fidelity estimates.
Download figure:
Standard image High-resolution image

Which estimator in particular gives the lowest error depends strongly, however, on the loss function that one uses to quantify error. In figure 3, we compare the distribution over losses for the four tomographic procedures as applied to qubit pure and mixed states, and as measured by the infidelity and quadratic loss functions. Whereas self-guided tomography directly optimizes the infidelity, we note that it performs very well according to this measure in the pure-state case. Similarly, the Bayesian mean estimator is optimal for Bregman divergences such as the quadratic loss $L{({\boldsymbol{\theta }},\hat{{\boldsymbol{\theta }}}):=({\boldsymbol{\theta }}-\hat{{\boldsymbol{\theta }}})}^{{\rm{T}}}({\boldsymbol{\theta }}-\hat{{\boldsymbol{\theta }}})$ , so that it performs very well if we choose to quantify errors accordingly.

In figure 4, we consider self-guided tomography of pure and mixed qutrit states, showing that the benefits of using PAQT to combine SGQT with Bayesian tomography persist in this case. Notably, least-squares fitting does significantly less well for self-guided datasets on pure qutrits. Reducing the resampling parameter a to 0.9 allows the Bayesian estimator to remain robust in this case, however.

We also consider the case in which the optimization procedure used by self-guided tomography is restricted to an incorrect model of the system under study. In particular, in figure 5, we collect data under the restriction that the true state is a mixed or pure product state of two qubits, then draw the true state from a Haar or Hilbert–Schmidt prior on the full four-dimensional state. In this way, the self-guided algorithm is explicitly following an incorrect model for the state. We note that, despite this, the Bayesian and least-squares estimators are both able to improve on their initial uncertainty by using data collected from the product state measurements. It is also interesting to note that the protocol with fewer measurements seems to perform better, which might be counterintuitive. However, remember that for this scenario, the model is wrong. More data will produce an estimate that is more accurate, but with respect to the wrong model. Thus, the procedure with fewer measurements performs better in this scenario through less accurate (noisy) measurements.

**Figure 5.** Median infidelity for self-guided tomography on pure (top) and mixed (bottom) states of two qubits, restricted to product measurements. In both cases, we use PAQT for post-processing with 32 000 particles for the Bayesian estimator, and the self-guided tomography data is collected with a gain of ${\alpha }_{k}=31/{k}^{0.602}$ and a step of ${\epsilon }_{k}=0.1/{k}^{0.101}$ .
Download figure:
Standard image High-resolution image

**Figure 5.** Median infidelity for self-guided tomography on pure (top) and mixed (bottom) states of two qubits, restricted to product measurements. In both cases, we use PAQT for post-processing with 32 000 particles for the Bayesian estimator, and the self-guided tomography data is collected with a gain of ${\alpha }_{k}=31/{k}^{0.602}$ and a step of ${\epsilon }_{k}=0.1/{k}^{0.101}$ .
Download figure:
Standard image High-resolution image

Finally, we note that the performance of the Bayesian estimator can be dramatically improved if we postselect on diagnostic information provided by the particle filtering algorithm. In figure 6, we show the kernel-density estimated distribution over infidelity for each of the qutrit and two-qubit cases, postselecting on the smallest effective sample size observed during a tomography run. That is, we accept a tomography trial if the particle filter weights $\{{w}_{i}\}$ satisfy

$\begin{eqnarray}&&\displaystyle \frac{1}{\displaystyle \sum _{i}{w}_{i}^{2}}\geqslant {n}_{\mathrm{th}}\end{eqnarray} \tag{ 27 }$

throughout the experiment, for some choice of threshold ${n}_{\mathrm{th}}$ . For the qutrit case, using either 32 000 or 128 000 particles, we observe that as we increase this threshold (that is, as we demand a larger effective sample size), the mean performance rapidly approaches the median performance. Thus, performing this postselection allows us to exclude the worst-case performance of the Bayesian estimator. On the other hand, when the data are not especially informative, as in the two-qubit product measurement case, the benefit of postselection is significantly less pronounced.

**Figure 6.** Performance of PAQT Bayesian post-processing when postselecting on trials during which the effective sample sizes ${n}_{\mathrm{ess}}$ remains above various thresholds during out the estimation procedure, for qutrit data and for two-qubit data restricted to product measurements. For each of the three data sets, the left-hand subfigure shows the kernel density estimate over infidelity, demonstrating that more demanding thresholds can 'shift' the distribution over infidelity, especially for the product-measurement case. The upper-right subfigures for each data set show the approach of the mean infidelity to the median fidelity as a function of the post-selection threshold, while the lower-right subfigures show the probability of the postselection succeeding. Importantly, in three of the four cases, we observe that post-selection on the diagnostics produced by Bayesian particle filtering can help eliminate trials with less accurate estimates. For the case in which both a large number of particles are used and a large amount of data is taken, the effect of post-selecting on diagnostics is much less pronounced. The data in this figure was generated using 10 000 iterations, with the number of shots per iteration and SMC particle count indicated in the subfigure titles.
Download figure:
Standard image High-resolution image

**Figure 6.** Performance of PAQT Bayesian post-processing when postselecting on trials during which the effective sample sizes ${n}_{\mathrm{ess}}$ remains above various thresholds during out the estimation procedure, for qutrit data and for two-qubit data restricted to product measurements. For each of the three data sets, the left-hand subfigure shows the kernel density estimate over infidelity, demonstrating that more demanding thresholds can 'shift' the distribution over infidelity, especially for the product-measurement case. The upper-right subfigures for each data set show the approach of the mean infidelity to the median fidelity as a function of the post-selection threshold, while the lower-right subfigures show the probability of the postselection succeeding. Importantly, in three of the four cases, we observe that post-selection on the diagnostics produced by Bayesian particle filtering can help eliminate trials with less accurate estimates. For the case in which both a large number of particles are used and a large amount of data is taken, the effect of post-selecting on diagnostics is much less pronounced. The data in this figure was generated using 10 000 iterations, with the number of shots per iteration and SMC particle count indicated in the subfigure titles.
Download figure:
Standard image High-resolution image

4. Discussion

Though the point of SGQT is to avoid solving a large system of linear equations, the data collected from the performed measurements still define a set of equations that can be inverted in one way or another. This is the approach of LSF and weighted LSF. However, we note that these approaches do not perform well in all but a few of the cases considered. The explanation for this observation is that the constructed linear system is in general ill-conditioned.

Given infinite precision data, SGQT measurements would trace out a straight path through state space from the initial guess to the true state, following the gradient of the fidelity. This set of measurements will not be informationally complete. Due to the stochasticity of the algorithm, for finite data, a sufficiently large number of SGQT iterations will be informationally complete, but most of the measurements will be linearly dependent. This frustrates the stability of attempting to solve the linear equations defined by (6). The standard approach to quantify the stability of a linear system is through the condition number

$\begin{eqnarray}&&\kappa ({\boldsymbol{X}})=\displaystyle \frac{{\sigma }_{1}({\boldsymbol{X}})}{{\sigma }_{{d}^{2}}({\boldsymbol{X}})},\end{eqnarray} \tag{ 28 }$

where ${\sigma }_{1}({\boldsymbol{X}})$ is the largest and ${\sigma }_{{d}^{2}}({\boldsymbol{X}})$ is the smallest singular value. Smaller condition numbers lead to more stable linear systems. We will argue and demonstrate that self-guided tomography leads to measurements which define a linear system with large condition number. Importantly, it is only the process by which data is gathered (rather than analyzed) that determines the condition number. We will therefore restrict our discussion of condition numbers to SGQT as a data gathering procedure and the resultant effect on the numerical stability of different estimation strategies. In particular, our use of the condition number is distinct from its use in assessing the utility of a set of measurements for tomographic estimation [72, 73], as we are concerned not with tomographic completeness but with numerical stability.

The largest singular value will be related to the total number of SGQT iterations since most of the late measurements will be nearly co-linear, clustering around the true state. The smallest singular value would be 1 in the ideal case of performing a subset of orthogonal basis measurements. However, as noted above, the system is only barely informationally complete—the matrix ${\boldsymbol{X}}$ is nearly rank-deficient (rank $\lt {d}^{2}-1$ ), in other words. The actually value of ${\sigma }_{{d}^{2}}({\boldsymbol{X}})$ , and hence, $\kappa ({\boldsymbol{X}})$ , will vary quite a bit from run to run, but the scaling with the total number of measurements, K, will be $O(\sqrt{K})$ . This is because most of the measurements will be approximately co-linear. In the exact case where ${\boldsymbol{X}}$ consists of $({d}^{2}-1)\times ({d}^{2}-1)$ orthogonal submatrix and $K-1$ repeated rows, the condition number is identically $\sqrt{K}$ .

In figure 7, we plot the empirical condition number of ${\boldsymbol{X}}$ as a function of the total number of SGQT iterations. We see the expected behavior. The condition number starts high as there are simply not enough measurements to ensure informational completeness. Then, the condition number reaches a minimum value before rising at a rate of approximately $\sqrt{K}$ due to many nearly (but not exactly) identical measurements² .

This effect identifies a fundamental tension between the benefit of measurement adaptivity and offline data analysis, which is why PAQT does well in spite of this tension. We note that in most cases using PAQT with a Bayesian mean estimator performs quite well and comes with many added benefits, as discussed above. In the cases where the Bayesian mean estimator does not perform well, we conjecture this is due to non-optimal choices of the particle filtering algorithm parameters rather than a fundamental problem of ill-conditionedness. This is not a problem to be swept under the rug, however, and a non-trivial optimization will need to be performed to find good operating points for the particle filtering algorithm.

A second comment concerns the standard claim in quantum state tomography work that all results obtained for states will immediately apply to quantum process tomography due to the isomorphism between quantum states and channels. Though this claim is broadly true, there is an important subtlety that we must consider. Under the Choi–Jamiłokowski isomorphism [74, 75], process tomography is equivalent to state tomography with a restriction on allowable priors and measurements. Thus, the product-state model of figure 5 is especially important in that it immediately shows that our adaptive state tomography protocol also provides a protocol for process tomography. Indeed, the Choi–Jamiłokowski isomorphism gives that product measurements on two copies of a quantum system are equivalent to preparing a state, evolving under an unknown map, and then measuring the output state [13, 76]. This observation has recently been utilized by Pogorelov et al [16] to perform adaptive quantum process tomography with Bayesian estimation implemented by particle filtering. With this in mind, then, our results show that self-guided state tomography is an efficient heuristic for designing quantum process tomography experiments, and may pose an interesting tradeoff for the computational cost of explicit adaptivity, in the same sense as self-guided tomography without the product measurement constraint provides a useful tradeoff for adaptive state tomography. The generalization to process tomography will be explored further in future work.

5. Conclusion

In summary, we have shown how to mitigate the drawbacks of SGQT using PAQT to provide explicit and statistically principled adaptive quantum tomographic estimates. In numerically testing PAQT, we have shown that SGQT alone is extremely efficient when the true state is pure, such that it is computationally challenging to compete with SGQT in high-dimensional problems where the pure state assumption is explicitly met. This allows us to more carefully deliniate between the advantages of each protocol, and to provide practical solutions for adaptive tomography.

However, more work needs to be done to refine the self-guided heuristic for mixed states and restricted measurement scenarios. An interesting open problem suggested by our work is to investigate if the scaling advantages of SGQT remain when using mixed states and, in the case of two or more qubits, when using product measurements. We expect that designing good heuristics for the challenging estimation problems which lie ahead for quantum technology will become an active area of research, as it has for classical machine learning problems.

Acknowledgments

This work was supported by the US Army Research Office grant numbers W911NF-14-1-0098 and W911NF-14-1-0103, and by the Australian Research Council Centre of Excellence for Engineered Quantum Systems. STF acknowledges support from an Australian Research Council Future Fellowship FT130101744. We thank Sarah Kaiser for helpful comments. We thank Okabe and Ito [77] for their suggestion of a colorblind-safe palette for figures and plots. We acknowledge Thai La Ong restaurant for forgetting about CF's tofu laksa, which forced us to spend an extra 30 minutes at the restaurant and led to a discussion between the authors on the feasibility of the result. CG thanks Jacob Bridgeman for assistance in using cluster resources.

Practical adaptive quantum tomography^{^*}

Article metrics

Author e-mails

Author affiliations

ORCID iDs

Dates

Abstract

1. The tomographic problem

1.1. Problem set up

1.2. Linear inversion tomography

1.3. Maximum likelihood tomography

1.4. Bayesian tomography

2. Adaptive and self-guided tomography

2.1. Adaptive tomography

2.2. Bayesian adaptive quantum tomography

2.3. Self-guided quantum tomography

3. Practical adaptive quantum tomography

4. Discussion

5. Conclusion

Acknowledgments

Footnotes

Practical adaptive quantum tomography*

Article metrics

Share this article

Author e-mails

Author affiliations

ORCID iDs

Dates

Abstract

1. The tomographic problem

1.1. Problem set up

1.2. Linear inversion tomography

1.3. Maximum likelihood tomography

1.4. Bayesian tomography

2. Adaptive and self-guided tomography

2.1. Adaptive tomography

2.2. Bayesian adaptive quantum tomography

2.3. Self-guided quantum tomography

3. Practical adaptive quantum tomography

4. Discussion

5. Conclusion

Acknowledgments

Footnotes

Practical adaptive quantum tomography^{^*}