Fast Track Communication · Open access

Belief propagation decoding of quantum channels by passing quantum messages

Published 27 July 2017 © 2017 IOP Publishing Ltd and Deutsche Physikalische Gesellschaft
Citation: Joseph M Renes 2017 New J. Phys. 19 072001. DOI: 10.1088/1367-2630/aa7c78



Abstract

The belief propagation (BP) algorithm is a powerful tool in a wide range of disciplines from statistical physics to machine learning to computational biology, and is ubiquitous in decoding classical error-correcting codes. The algorithm works by passing messages between nodes of the factor graph associated with the code and enables efficient decoding of the channel, in some cases even up to the Shannon capacity. Here we construct the first BP algorithm which passes quantum messages on the factor graph and is capable of decoding the classical–quantum channel with pure state outputs. This gives explicit decoding circuits whose number of gates is quadratic in the code length. We also show that this decoder can be modified to work with polar codes for the pure state channel and as part of a decoder for transmitting quantum information over the amplitude damping channel. These represent the first explicit capacity-achieving decoders for non-Pauli channels.


Original content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

1. Introduction

Graphical models are at the heart of the current revolution in machine learning and computational statistics. They provide simple representations of the correlations among large numbers of random variables and enable efficient algorithms for feature discovery and analysis. Among the most well-known of these algorithms is belief propagation (BP), whose origin can be traced to the Bethe–Peierls approximation in statistical physics [1]. BP can be used to marginalize the joint distribution of several random variables, often efficiently. For instance, in the setting of reliable communication over noisy channels via error correction, BP is used to find the most likely input for a given set of observed outputs. Indeed, in modern coding theory BP is simply indispensable [2]. The joint distribution of channel inputs and outputs can be represented by a factor graph, and BP works by passing messages between the nodes of this graph (an instance of more general message-passing algorithms). This leads to efficient decoding algorithms for high rate codes, several of which are employed in current wireless communication standards. Moreover, it was recently shown that BP decoding of a certain class of low-density parity-check (LDPC) codes can achieve the Shannon capacity [3].

Factor graphs have been adapted to the quantum-mechanical setting from several different perspectives [4–7]. Applied to quantum communication, BP and other message passing methods have been constructed for syndrome decoding of a variety of stabilizer codes subjected to Pauli noise channels [5, 8–14]. Despite their use in decoding quantum codes, these message passing algorithms are classical. Indeed, decoding any stabilizer code used for a Pauli channel or the erasure channel is essentially a classical task due to the Gottesman–Knill theorem [15]. However, stabilizer decoding is not optimal for non-Pauli channels such as the amplitude damping channel, for either the entanglement fidelity achievable by fixed-size codes or the largest achievable rates for codes with increasing blocklength. Therefore it would be of interest to extend BP decoding to more general channels. As much also holds in the setting of quantum polar codes, where the classical decoding method (ultimately a variant of BP) can only be employed without loss of rate for Pauli channels or the erasure channel [16–18].

Note that the quantum decoding problem is different from the one solved by the classical algorithm for 'quantum BP' in [5] (see footnote 1). There, one is interested in computing marginals of quantum states which have a structure given by a factor graph. For classical decoding, computing such marginals is indeed sufficient, as we will describe in more detail below. But even for bitwise decoding of a classical–quantum (CQ) channel having classical input and quantum output, it is not enough to know the relevant marginal state; we need a way to perform the optimal (Helstrom) measurement [20] or some suitable approximation. Put differently, a quantum BP decoder is a quantum algorithm, and we may expect that it will need to pass quantum messages.

In this paper we construct a quantum BP decoding algorithm for the pure state channel, a binary input CQ channel whose outputs are pure states. The algorithm for estimating a single input bit works by passing single qubits as well as classical information along the factor graph, while sequential estimation of all input bits requires passing many qubits. For codes whose factor graphs are trees, as well as for polar codes, we show how the BP decoder leads to explicit circuits for the optimal measurement that have quadratic size in the code length. To the best of our knowledge, this is the first instance of a quantum algorithm for BP.

The pure state channel arises, for instance, in binary phase-shift keying (BPSK) modulation of a pure loss Bosonic quantum channel, whose channel outputs are coherent states [21]. Thus, our result gives an explicit construction of a successive cancellation decoder for the capacity-achieving polar code described in [21], and addresses the issue of decoding CQ polar codes discussed in [17]. Moreover, the pure state channel also arises as part of the quantum polar decoder for the amplitude damping channel [16, 18], and therefore our result gives an explicit decoder for polar codes over this channel.

The remainder of the paper is structured as follows. In the next section we give a very brief overview of factor graphs and their use in classical decoding, and then rewrite the BP rules in a manner that leads to the quantum algorithm. Section 3 gives the quantum BP decoding algorithm, and applications to polar codes are given in section 4. We finish with several open questions for future research raised by our result.

2. BP decoding on factor graphs

Let us first examine BP on factor graphs directly in the coding context; for a more general treatment see [2, 22]. Consider the problem of reliable communication over a memoryless channel W using a linear code C. Fix C to be an n-bit code, i.e. a linear subspace of ${{\mathbb{Z}}}_{2}^{n}$, and suppose that the channel W maps inputs in ${ \mathcal X }={{\mathbb{Z}}}_{2}$ to some alphabet ${ \mathcal Y }$ according to the transition probabilities ${P}_{Y| X=x}=W(y| x)$. Now suppose a codeword ${x}_{1}^{n}=({x}_{1},{x}_{2},\,\ldots ,\,{x}_{n})\in C$ is picked at random and its constituent bits are each subjected to W, producing the output ${y}_{1}^{n}$. The goal of decoding is to invert this process and determine the input codeword from the channel output. This is a task of statistical inference, whose nominal solution is to output the ${x}_{1}^{n}$ which maximizes the conditional probability of inputs given outputs, ${P}_{{X}^{n}| {Y}^{n}}$. Since we assume the inputs are uniformly chosen from C, we can directly work with the joint distribution ${P}_{{X}^{n}{Y}^{n}}$ of inputs and outputs. In general, though, this task is known to be computationally intractable.

A simpler approach is to decode bitwise and find the most likely value of ${x}_{k}$ given ${y}_{1}^{n}$, for each k. Then we are interested in the marginal distribution ${P}_{{X}_{k}{Y}^{n}}$, and we need only determine which of the two values of ${x}_{k}$ maximizes ${P}_{{X}_{k}{Y}^{n}}({x}_{k},{y}_{1}^{n})$. Exact marginalization is also generally computationally intractable, since the size of the joint distribution grows exponentially in the number of variables. However, for linear codes the joint distribution can be factorized, which often greatly simplifies the marginalization task. The joint distribution ${P}_{{X}^{n}{Y}^{n}}$ can be written

Equation (1)

${P}_{{X}^{n}{Y}^{n}}({x}_{1}^{n},{y}_{1}^{n})=\tfrac{1}{| C| }\,{\mathbb{1}}[{x}_{1}^{n}\in C]\,{\prod }_{j=1}^{n}W({y}_{j}| {x}_{j}).$

Since the channel is memoryless, the channel contribution to (1) is already in factorized form. Meanwhile, code membership is enforced by a sequence of parity-check constraints associated with the code, which also leads to factorization. In the three-bit repetition code, for instance, there are two parity constraints, ${x}_{1}+{x}_{2}=0$ and ${x}_{2}+{x}_{3}=0$ (or ${x}_{1}+{x}_{3}=0$), and therefore ${\mathbb{1}}[{x}_{1}^{3}\in C]={\mathbb{1}}[{x}_{1}+{x}_{2}=0]\,{\mathbb{1}}[{x}_{2}+{x}_{3}=0]$. We can represent the joint distribution of any linear code (up to normalization) by a factor graph; figure 1 shows the factor graph of a code involving two parity checks on four bits. For an arbitrary factorizable function, the factor graph contains one (round) variable node for each variable and one (square) factor node for each factor, and factor nodes are connected to all their constituent variable nodes. This convention is violated in the figure by not including ${y}_{j}$ variable nodes; instead they are treated as part of the channel factors, since their values are fixed and in any case each is connected to only one factor node.
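As a minimal illustration of this factorization (our own sketch in Python, not from the paper), one can check exhaustively that the membership indicator of the three-bit repetition code equals the product of the two parity-check indicators:

```python
import itertools

# Three-bit repetition code: membership factorizes into the two parity-check indicators.
C = {(0, 0, 0), (1, 1, 1)}
for x1, x2, x3 in itertools.product((0, 1), repeat=3):
    member = (x1, x2, x3) in C
    factored = ((x1 + x2) % 2 == 0) and ((x2 + x3) % 2 == 0)
    assert member == factored
print("indicator factorization verified")
```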

Figure 1.

Figure 1. Factor graph for the joint probability distribution of a four-bit code with two parity checks ${x}_{1}+{x}_{3}=0$ and ${x}_{1}+{x}_{2}+{x}_{4}=0$.


For factor graphs which are trees, meaning only one path connects any two nodes as in figure 1, the BP algorithm can compute the marginal distributions exactly. In the present context of coding, it directly finds the most likely input value. Supposing we are interested in determining ${x}_{1}$, treat variable node ${x}_{1}$ as the root of the tree. BP then proceeds by passing messages between nodes, starting from the leaves (here, channel outputs) and working inward, combining all relevant information as it goes. Simplifying the general BP rules (see [2]) to the decoding problem, the initial messages from the channel factors to the variable nodes can be taken as the log-likelihood ratios ${\ell }=\mathrm{log}[W({y}_{j}| 0)/W({y}_{j}| 1)]$ of the channel given the output ${y}_{j}$ (here we suppress the dependence of ${\ell }$ on the channel output ${y}_{j}$). At variable nodes the messages simply add, so that the outgoing message ${\ell }$ is the sum of the incoming messages ${{\ell }}_{k}$. At check nodes the rule is more complicated: $\tanh \tfrac{{\ell }}{2}={\prod }_{k}\tanh \tfrac{{{\ell }}_{k}}{2}$. After all messages have arrived at the root, the algorithm produces the log-likelihood ratio for ${x}_{1}$ given all the channel outputs, and the decoder simply outputs 0 if the ratio is positive or 1 if negative.
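As a concrete sketch of these two update rules (our own code, not from the paper; the function names are ours):

```python
import numpy as np

def variable_node(*llrs):
    """Combine incoming log-likelihood ratios at a variable node: they simply add."""
    return sum(llrs)

def check_node(*llrs):
    """Combine incoming log-likelihood ratios at a check node: tanh(l/2) = prod_k tanh(l_k/2)."""
    t = np.prod([np.tanh(l / 2) for l in llrs])
    return 2 * np.arctanh(t)

# Example: two independent observations of a bit through a BSC with crossover 0.1,
# both received as 0, each carry LLR log(0.9/0.1) ~ 2.20.
l = np.log(0.9 / 0.1)
print(variable_node(l, l))  # ~4.39: the evidence adds
print(check_node(l, l))     # ~1.52: evidence about a parity partner is weaker
```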

By adopting a modified update rule it is in fact possible to compute all the marginals at once with only a modest overhead. Instead of only proceeding inward from the leaves, we send messages in both directions along each edge, starting by sending channel log-likelihoods in from the leaves. Each node sends messages on each edge once it has received messages on all its other edges. For graphs that contain loops, the algorithm is not guaranteed to converge, but one can nevertheless hope that the result is a good approximation and that the decoder outputs the correct value. This is borne out in practice for turbo codes and LDPC codes.

There is an intuitive way of understanding the BP decoding algorithm which is the basis of our quantum generalization. At every step the message can be interpreted as the log-likelihood ratio of the effective channel from that node to its descendants. This is sensible as the likelihood ratio is a sufficient statistic for estimating the (binary) input from the channel output. The rules for combining messages can then be interpreted as rules for combining channels, and the algorithm can be seen as successively simplifying the channel from the root to the leaves by utilizing the structure of the factor graph. At variable nodes, adding the log-likelihood ratios for two channels W and $W^{\prime} $ amounts to considering the convolution channel $W\circledast W^{\prime} $, with transition probabilities given by

Equation (2)

$(W\circledast W^{\prime} )(y,y^{\prime} | x)=W(y| x)\,W^{\prime} (y^{\prime} | x).$

That is, the effective channel associated with a variable node is simply the convolution ${\circledast }_{k}{W}_{k}$ of its descendant channels ${W}_{k}$. The form of the effective channel at check nodes is not as immediate, but it is not too difficult to verify that the appropriate channel convolution $W\boxplus W^{\prime} $ has transition probabilities

Equation (3)

$(W\boxplus W^{\prime} )(y,y^{\prime} | x)=\tfrac{1}{2}{\sum }_{x^{\prime} }W(y| x+x^{\prime} )\,W^{\prime} (y^{\prime} | x^{\prime} ).$

These two channel convolutions are also the fundamental building blocks of polar codes [23], at least when the input channels are symmetric. The check node convolution is the 'worse' channel in the channel splitting or channel synthesis step (see [23], equation (19)); this holds regardless of the symmetry of the channel. On the other hand, the 'better' combination of W and $W^{\prime} $ is defined by (see [23], equation (20)) $W^{\prime\prime} (y,y^{\prime} ,x| x^{\prime} )=\tfrac{1}{2}W(y| x+x^{\prime} )W^{\prime} (y^{\prime} | x^{\prime} )$. Compared to (2), the input x is uniformly random and not always zero, but it is given at the channel output. When W is symmetric in the sense that $W(y| x+u)=W({\pi }_{u}(y)| x)$ for a suitable permutation ${\pi }_{u}$ of the output alphabet depending on u, we can reversibly transform $W^{\prime\prime} $ into $W\circledast W^{\prime} $ and vice versa.
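The two convolutions can be spelled out directly on transition-probability tables; the following sketch is our own (with each channel stored as a 2 × |Y| array indexed by the input bit) and simply implements (2) and (3):

```python
import numpy as np

def var_conv(W, Wp):
    """Variable-node convolution (2): (W ⊛ W')(y, y' | x) = W(y|x) W'(y'|x)."""
    return np.einsum('xy,xz->xyz', W, Wp)

def check_conv(W, Wp):
    """Check-node convolution (3): (W ⊞ W')(y, y' | x) = 1/2 sum_x' W(y|x+x') W'(y'|x')."""
    out = np.zeros((2, W.shape[1], Wp.shape[1]))
    for x in range(2):
        for xp in range(2):
            out[x] += 0.5 * np.outer(W[(x + xp) % 2], Wp[xp])
    return out

# Binary symmetric channel with crossover probability 0.1
W = np.array([[0.9, 0.1], [0.1, 0.9]])
print(var_conv(W, W)[0])    # joint output distribution given input 0
print(check_conv(W, W)[0])  # given input 0, with the partner input x' unknown
```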

3. BP decoding of quantum outputs

The form of the check and variable convolutions also applies to channels with quantum output (see footnote 2). We need only replace the probability distributions over the output alphabet by quantum states. Abusing notation, let us denote by W(x) the quantum state of the output of W given input x. This includes the previous case by considering commuting W(x). The variable and check node convolutions are now just

Equation (4)

$(W\circledast W^{\prime} )(x)=W(x)\otimes W^{\prime} (x),$

Equation (5)

$(W\boxplus W^{\prime} )(x)=\tfrac{1}{2}{\sum }_{x^{\prime} }W(x+x^{\prime} )\otimes W^{\prime} (x^{\prime} ).$

To properly generalize the BP decoding algorithm we need a 'sufficient statistic' for the quantum channels at the various nodes. For binary-input pure state channels, it turns out that a combination of classical bits and just one qubit suffices. The channel outputs can always be represented by a qubit, so suppose that W outputs $| \pm \theta \rangle $, where $| \theta \rangle =\cos \tfrac{\theta }{2}| 0\rangle +\sin \tfrac{\theta }{2}| 1\rangle $. Note that the overlap of the two states is $\cos \theta $ and the Helstrom measurement for these two states is measurement of the ${\sigma }_{x}$ operator.
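A quick numerical check (our own sketch, not from the paper) that the ${\sigma }_{x}$ measurement attains the Helstrom error probability $\tfrac{1}{2}(1-\sin \theta )$ for equiprobable $| \pm \theta \rangle $:

```python
import numpy as np

theta = 0.8
ket = lambda t: np.array([np.cos(t / 2), np.sin(t / 2)])
plus = np.array([1, 1]) / np.sqrt(2)
# Error of the sigma_x measurement (outcome + -> guess input 0, outcome - -> guess input 1)
p_err = 0.5 * (1 - (plus @ ket(theta)) ** 2) + 0.5 * (plus @ ket(-theta)) ** 2
# Helstrom bound for two equiprobable pure states with overlap cos(theta)
helstrom = 0.5 * (1 - np.sqrt(1 - np.cos(theta) ** 2))
print(np.isclose(p_err, helstrom))
```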

The $\circledast $ convolution outputs either $| \theta \rangle \otimes | \theta ^{\prime} \rangle $ or $| -\theta \rangle \otimes | -\theta ^{\prime} \rangle $, which are again two pure states, with overlap $\cos \theta \cos \theta ^{\prime} $, i.e. an overlap angle $\theta ^{\prime\prime} $ given by $\cos \theta ^{\prime\prime} =\cos \theta \cos \theta ^{\prime} $. The following unitary transformation ${U}_{\circledast }(\theta ,\theta ^{\prime} )$ compresses the states to the first qubit, leaving the second in the state $| 0\rangle $:

Equation (6)

${U}_{\circledast }(\theta ,\theta ^{\prime} )\,| \pm \theta \rangle \otimes | \pm \theta ^{\prime} \rangle =| \pm \theta ^{\prime\prime} \rangle \otimes | 0\rangle ,$ whose matrix elements on the support of the input states are $\langle 00| {U}_{\circledast }| 00\rangle ={a}_{+}$, $\langle 00| {U}_{\circledast }| 11\rangle ={a}_{-}$, $\langle 10| {U}_{\circledast }| 01\rangle ={b}_{+}$ and $\langle 10| {U}_{\circledast }| 10\rangle ={b}_{-}$,

with ${a}_{\pm }\sqrt{1+\cos \theta \cos \theta ^{\prime} }=\tfrac{1}{\sqrt{2}}\left(\cos \left(\tfrac{\theta -\theta ^{\prime} }{2}\right)\pm \cos \left(\tfrac{\theta +\theta ^{\prime} }{2}\right)\right)$ and ${b}_{\pm }\sqrt{1-\cos \theta \cos \theta ^{\prime} }=\tfrac{1}{\sqrt{2}}\left(\sin \left(\tfrac{\theta +\theta ^{\prime} }{2}\right)\mp \sin \left(\tfrac{\theta -\theta ^{\prime} }{2}\right)\right)$. To combine more than two channels, we just perform the pairwise convolution sequentially. Thus, the convolution of pure state channels can itself be represented as a pure state channel.
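The variable-node rule is easy to verify numerically; below is our own sketch, in which the rows of the unitary not fixed by the matrix elements above are completed in one (of many equivalent) ways consistent with the coefficients ${a}_{\pm }$ and ${b}_{\pm }$:

```python
import numpy as np

def ket(theta):
    return np.array([np.cos(theta / 2), np.sin(theta / 2)])

def U_var(theta, thetap):
    """A unitary compressing |±theta>|±theta'> to |±theta''>|0>, cos(theta'') = cos(theta) cos(theta').
    The first and third rows are fixed by a_± and b_±; the others are one valid completion."""
    cpp = np.sqrt((1 + np.cos(theta) * np.cos(thetap)) / 2)  # cos(theta''/2)
    spp = np.sqrt((1 - np.cos(theta) * np.cos(thetap)) / 2)  # sin(theta''/2)
    ap = np.cos(theta / 2) * np.cos(thetap / 2) / cpp
    am = np.sin(theta / 2) * np.sin(thetap / 2) / cpp
    bp = np.cos(theta / 2) * np.sin(thetap / 2) / spp
    bm = np.sin(theta / 2) * np.cos(thetap / 2) / spp
    return np.array([[ap, 0, 0, am],
                     [0, -bm, bp, 0],
                     [0, bp, bm, 0],
                     [-am, 0, 0, ap]])

theta, thetap = 0.7, 1.1
U = U_var(theta, thetap)
tpp = np.arccos(np.cos(theta) * np.cos(thetap))
print(np.allclose(U @ U.T, np.eye(4)))                                                 # unitarity
print(np.allclose(U @ np.kron(ket(theta), ket(thetap)), np.kron(ket(tpp), ket(0.0))))
print(np.allclose(U @ np.kron(ket(-theta), ket(-thetap)), np.kron(ket(-tpp), ket(0.0))))
```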

The $\boxplus $ convolution is more complicated because the outputs are no longer pure. However, applying a suitable unitary ${U}_{\boxplus }(\theta ,\theta ^{\prime} )$ results in a CQ state of the form ${\sum }_{j\in \{0,1\}}{p}_{j}\,| {(-1)}^{x}{\theta }_{j}^{\prime\prime} \rangle \langle {(-1)}^{x}{\theta }_{j}^{\prime\prime} | \otimes | j\rangle \langle j| $. We are free to measure the second qubit, and the conditional state of the first qubit is again one of two pure states, though now the overlap depends on the measurement outcome j. In particular, ${p}_{0}=\tfrac{1}{2}(1+\cos \theta \cos \theta ^{\prime} )$, ${p}_{1}=1-{p}_{0}$, and the two overlaps are given by

Equation (7)

$\cos {\theta }_{0}^{\prime\prime} =\dfrac{\cos \theta +\cos \theta ^{\prime} }{1+\cos \theta \cos \theta ^{\prime} },$

Equation (8)

$\cos {\theta }_{1}^{\prime\prime} =\dfrac{\cos \theta -\cos \theta ^{\prime} }{1-\cos \theta \cos \theta ^{\prime} }.$

For outcome j = 0 the angle between the states has decreased, while for outcome j = 1 the angle has increased. Therefore, the $\boxplus $ convolution of pure state channels can be represented by two pure state channels, corresponding to the two measurement outcomes. As before, several channels can be combined sequentially.
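These claims admit a simple numerical consistency check (our own sketch): the two mixed outputs of the check-node convolution have eigenvalues $\{{p}_{0},{p}_{1}\}$, and their trace distance equals ${\sum }_{j}{p}_{j}\sin {\theta }_{j}^{\prime\prime} $, as the representation by two pure state channels requires.

```python
import numpy as np

def ket(theta):
    return np.array([np.cos(theta / 2), np.sin(theta / 2)])

def check_output(theta, thetap, x):
    """Mixed output (5) of the check-node convolution of two pure state channels, input bit x."""
    rho = np.zeros((4, 4))
    for xp in range(2):
        v = np.kron(ket((-1) ** (x + xp) * theta), ket((-1) ** xp * thetap))
        rho += 0.5 * np.outer(v, v)
    return rho

theta, thetap = 0.7, 1.1
p0 = (1 + np.cos(theta) * np.cos(thetap)) / 2
c0 = (np.cos(theta) + np.cos(thetap)) / (1 + np.cos(theta) * np.cos(thetap))  # eq. (7)
c1 = (np.cos(theta) - np.cos(thetap)) / (1 - np.cos(theta) * np.cos(thetap))  # eq. (8)

rho0, rho1 = check_output(theta, thetap, 0), check_output(theta, thetap, 1)
print(np.allclose(sorted(np.linalg.eigvalsh(rho0))[-2:], [1 - p0, p0]))
td = 0.5 * np.abs(np.linalg.eigvalsh(rho0 - rho1)).sum()
print(np.allclose(td, p0 * np.sqrt(1 - c0**2) + (1 - p0) * np.sqrt(1 - c1**2)))
```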

The quantum decoding algorithm now proceeds as in classical BP, taking the quantum outputs of the channels and combining them at variable and check nodes. At a variable node the algorithm combines the outputs using ${U}_{\circledast }$ and forwards the output to its parent node. At check nodes the algorithm applies ${U}_{\boxplus }$, measures the second qubit, and forwards both the qubit and the measurement result to its parent node. The classical messages are required to inform parent variable nodes how to choose the angles in subsequent unitaries. Ultimately this procedure results in one qubit at the root node such that measurement of ${\sigma }_{x}$ corresponds to the optimal Helstrom measurement for the associated bit. This then is sufficient to estimate one input bit.

For example, return to the code depicted in figure 1 for a pure state channel with angle θ, and suppose we are interested in decoding the first bit. Starting at the leaves, the outputs of all but the first channel can be immediately passed to their corresponding variable nodes, since these variable nodes do not have any other outward branches. (Formally this follows from the convolution rules by considering convolution with a trivial channel, having $\theta =0$.) The output of the first channel, meanwhile, must wait to be combined according to the $\circledast $ convolution with several other qubit messages. Next, since 2 and 4 are connected by a check node, we combine qubits 2 and 4 into one qubit (2) and one classical bit (4) by applying ${U}_{\boxplus }(\theta ,\theta )$ and measuring the 4th qubit. As qubits 1 and 3 are connected by a variable node, we can simultaneously combine these with ${U}_{\circledast }(\theta ,\theta )$. Finally, we combine qubits 1 and 2 by applying ${U}_{\circledast }({\theta }_{13},{\theta }_{24}^{(j)})$, where $\cos {\theta }_{13}={\cos }^{2}\theta $ and ${\theta }_{24}^{(j)}$ is the angle given by (7) or (8) with $\theta ^{\prime} =\theta $, depending on the value j of the earlier measurement. A quantum circuit implementing these steps is shown in figure 2.
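Tracking only the angles and the check-node outcome probabilities, this example reduces to a few lines (our own sketch; θ is the channel angle, and the final line gives the average Helstrom error probability for the first bit):

```python
import numpy as np

def var_angle(t1, t2):
    """Variable-node combination of angles: cos(theta'') = cos(t1) cos(t2)."""
    return np.arccos(np.cos(t1) * np.cos(t2))

def check_angle(t1, t2):
    """Check-node combination: (probability, conditional angle) for outcomes j = 0, 1, eqs. (7)-(8)."""
    p0 = (1 + np.cos(t1) * np.cos(t2)) / 2
    a0 = np.arccos((np.cos(t1) + np.cos(t2)) / (1 + np.cos(t1) * np.cos(t2)))
    a1 = np.arccos((np.cos(t1) - np.cos(t2)) / (1 - np.cos(t1) * np.cos(t2)))
    return [(p0, a0), (1 - p0, a1)]

theta = 1.0                               # channel angle of the pure state channel
t13 = var_angle(theta, theta)             # qubits 1 and 3, combined at the variable node
perr = 0.0
for pj, t24 in check_angle(theta, theta): # qubits 2 and 4, combined at the check node
    tfin = var_angle(t13, t24)            # effective pure state channel for bit 1, given outcome j
    perr += pj * 0.5 * (1 - np.sin(tfin)) # Helstrom error probability for that outcome
print(perr)
```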

Figure 2.

Figure 2. Circuit decoding the first bit of the code depicted in figure 1. The first $\circledast $ convolution unitary is ${U}_{\circledast }(\theta ,\theta )$; the second is ${U}_{\circledast }({\theta }_{13},{\theta }_{24}^{(0)})$ or ${U}_{\circledast }({\theta }_{13},{\theta }_{24}^{(1)})$, depending on the value j of the measurement outcome in the bottom wire. The symbol ⊣ denotes that the qubit is discarded. The final Hadamard gate and measurement implement the Helstrom measurement.


One drawback is that the above procedure implements the Helstrom measurement destructively, since once we estimate the first bit we no longer have the original channel output in order to estimate the second bit. And we cannot run the algorithm backwards to reproduce the channel output, as we have made measurements at every check node. To implement the Helstrom measurement as non-destructively as possible, we can leave the CQ output states unmeasured and instead use the classical subsystems to coherently control the variable node unitaries ${U}_{\circledast }$. In this way the steps in the algorithm can be reversed, save the final measurement. For example, in figure 2 all output qubits are kept, and the classical measurement and subsequent conditioning of the second ${U}_{\circledast }$ gate are performed by a coherent conditional gate involving three qubits.

Denoting the unitary action of the algorithm for the jth bit by Vj, the Helstrom measurement can be implemented by the projective measurement with projectors ${{\rm{\Pi }}}_{j,k}={V}_{j}^{* }| \tilde{k}\rangle \langle \tilde{k}{| }_{j}{V}_{j}$, where $| \tilde{k}\rangle \langle \tilde{k}{| }_{j}$ denotes the kth ${\sigma }_{x}$ basis projector on the jth qubit. Each Vj is composed of O(n) gates, yielding an overall circuit size of $O({n}^{2})$ to decode all bits. Supposing that the code is designed such that the jth input bit can be estimated with error no larger than ${\epsilon }_{j}$, Gao's non-commutative union bound [25] implies that the error in sequentially estimating all bits is no worse than $4{\sum }_{j}{\epsilon }_{j}$.

4. Applications to polar codes

4.1. Polar codes for the pure state channel

Polar codes for the pure state channel may also be decoded with this algorithm. Indeed, the successive cancellation decoding algorithm proposed by Arıkan in [23] proceeds precisely by combining channels using the $\circledast $ and $\boxplus $ rules, and was adapted to the case of CQ channels in [24]. The difference is that successive cancellation does not use the factor graph of the code, but a graph related to a fixed reversible encoding circuit. Importantly, the graph associated to each input of the encoding circuit is a tree. In fact, each such graph has logarithmic depth from all channel factors to each variable, and every node has degree three. Unlike the BP decoder, however, the successive cancellation decoder used by polar codes takes previously decoded bits into account. But these bits can be handled by the BP decoder since the pure state channel is symmetric in the manner described at the end of section 2. There, the value of the previous bits is incorporated into the better channel by appropriately permuting the output symbols, which is equivalent to flipping the input value. Similarly, for the pure state channel, applying ${\sigma }_{z}$ to the output is equivalent to flipping the input. Therefore, the quantum BP decoding algorithm gives a successive cancellation decoder for polar codes over the pure loss Bosonic channel using the BPSK constellation [21].
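The symmetry used here is immediate in the qubit representation above (a one-line check, our own): applying ${\sigma }_{z}$ to $| \theta \rangle $ produces $| -\theta \rangle $, i.e. the output corresponding to the flipped input.

```python
import numpy as np

def ket(theta):
    return np.array([np.cos(theta / 2), np.sin(theta / 2)])

Z = np.diag([1.0, -1.0])
theta = 0.8
print(np.allclose(Z @ ket(theta), ket(-theta)))  # sigma_z on the output flips the effective input
```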

4.2. Quantum polar codes for amplitude damping

The idea behind the quantum polar coding scheme of [16, 18] is to decompose the problem of transmitting quantum information over a channel ${{ \mathcal N }}_{A\to B}$ into transmitting classical information about two conjugate observables, 'amplitude' and 'phase', to construct polar codes for each subproblem, and then to combine the coding schemes using CSS codes at the encoder and coherent sequential decoding of amplitude and phase at the decoder. This decoding strategy is depicted in [16, figure 3] for Pauli channels and in [26, figure 1] for the general case. As detailed in [18], the two classical transmission tasks are to transmit 'amplitude' information over the CQ channel given by $z\to {\rho }_{z}={ \mathcal N }(| z\rangle \langle z| )$ and 'phase' information over the CQ channel given by $x\to {\varphi }_{x}=({Z}^{x}\otimes {\mathbb{1}})({ \mathcal I }\otimes { \mathcal N })[{\rm{\Phi }}]({Z}^{x}\otimes {\mathbb{1}})$. Here $\{| z\rangle \}$ is an arbitrary basis, which we choose to be that of ${\sigma }_{z}$ for convenience, while ${| {\rm{\Phi }}\rangle }_{A^{\prime} A}={\sum }_{z}\sqrt{{p}_{z}}| z\rangle | z\rangle $ is a bipartite pure state in this same basis with coefficients of our choosing. (See [18] for the precise relation to the conjugate observables ${\sigma }_{x}$ and ${\sigma }_{z}$.)

Let us now show how to build a decoder for the amplitude damping channel ${{ \mathcal N }}_{\gamma }$ with damping parameter $\gamma \in [0,1]$. First note that the amplitude outputs all commute due to the form of ${{ \mathcal N }}_{\gamma };$ the amplitude channel is effectively a classical Z channel in which the input 0 is always transmitted perfectly, but the input 1 may decay to 0 with probability γ. Therefore we can use the classical polar encoder and decoder for this channel [27]. Since the Z channel is not symmetric, the optimal input distribution in the capacity formula is not the uniform distribution, but one with probabilities p and $1-p$.

Now suppose that the bipartite pure state in the phase channel is the state $| {\rm{\Phi }}\rangle =\sqrt{p}| 00\rangle +\sqrt{1-p}| 11\rangle $. Abusing notation slightly and denoting the channel outputs ${\varphi }_{\pm }$, it is not difficult to verify that for $U={{\rm{cnot}}}_{A^{\prime} \to B}$,

Equation (9)

$U{\varphi }_{+}{U}^{\dagger }=(1-\gamma (1-p))\,| {\theta }_{0}\rangle \langle {\theta }_{0}| \otimes | 0\rangle \langle 0| +\gamma (1-p)\,| 1\rangle \langle 1| \otimes | 1\rangle \langle 1| ,$

Equation (10)

$U{\varphi }_{-}{U}^{\dagger }=(1-\gamma (1-p))\,| -{\theta }_{0}\rangle \langle -{\theta }_{0}| \otimes | 0\rangle \langle 0| +\gamma (1-p)\,| 1\rangle \langle 1| \otimes | 1\rangle \langle 1| ,$

where $| \pm {\theta }_{0}\rangle \propto \sqrt{p}\,| 0\rangle \pm \sqrt{(1-p)(1-\gamma )}\,| 1\rangle $.

Each of these states is a CQ state with the first qubit pure and the second qubit classical, just as in a $\boxplus $ output. Given the second qubit, the first is either in the pure state $| \pm {\theta }_{0}\rangle $, corresponding to the channel input ±, or the state $| 1\rangle $ independently of the input; the latter is equivalent to $| {\theta }_{1}=0\rangle $. Hence the decoder can begin just as at a $\boxplus $ step, measuring the second qubit to determine the angle associated to the first qubit.
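The block structure asserted in (9) and (10) can be checked numerically; the following sketch is our own (γ and p are free parameters, and the Kraus operators are the standard ones for amplitude damping):

```python
import numpy as np

gamma, p = 0.3, 0.6
A0 = np.array([[1, 0], [0, np.sqrt(1 - gamma)]])   # amplitude damping Kraus operators on B
A1 = np.array([[0, np.sqrt(gamma)], [0, 0]])
Phi = np.sqrt(p) * np.kron([1, 0], [1, 0]) + np.sqrt(1 - p) * np.kron([0, 1], [0, 1])
Z = np.diag([1.0, -1.0])
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]])  # control A', target B

def phase_output(x):
    """Phase channel output phi_x = (Z^x ⊗ 1)(I ⊗ N_gamma)[Phi](Z^x ⊗ 1)."""
    rho = sum(np.outer(np.kron(np.eye(2), K) @ Phi, np.kron(np.eye(2), K) @ Phi) for K in (A0, A1))
    S = np.kron(np.linalg.matrix_power(Z, x), np.eye(2))
    return S @ rho @ S

q = gamma * (1 - p)
for x, sign in [(0, 1), (1, -1)]:
    lhs = CNOT @ phase_output(x) @ CNOT
    ket0 = np.array([np.sqrt(p), sign * np.sqrt((1 - p) * (1 - gamma))]) / np.sqrt(1 - q)  # |±theta_0>
    rhs = (1 - q) * np.kron(np.outer(ket0, ket0), np.diag([1, 0])) \
          + q * np.kron(np.diag([0, 1]), np.diag([0, 1]))
    print(np.allclose(lhs, rhs))  # reproduces the structure of eqs. (9) and (10)
```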

The rate achievable by the quantum polar code construction is simply $R={\max }_{p\in [0,1]}(1-H{(Z| B)}_{\psi }-H{(X| {BA}^{\prime} )}_{\xi })$, where ${\psi }_{{ZB}}=p| 0\rangle \langle 0| \otimes {\rho }_{0}+(1-p)| 1\rangle \langle 1| \otimes {\rho }_{1}$ and ${\xi }_{{XBA}^{\prime} }=\tfrac{1}{2}{\sum }_{x\in \{0,1\}}| x\rangle \langle x| \otimes {\varphi }_{x}$. A cumbersome but straightforward calculation confirms that R equals the capacity of the channel, $C({{ \mathcal N }}_{\gamma })={\max }_{p\in [0,1]}({h}_{2}((1-\gamma )p)-{h}_{2}(\gamma p))$, where ${h}_{2}$ is the binary entropy [28, proposition 23.7.2]. Moreover, since the amplitude damping channel is degradable, the arguments in [16] ensure that no entanglement-assistance is required to meet the CSS constraint when constructing the quantum polar code.
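A short numerical sketch (our own) of the quoted capacity expression, maximized by a simple grid search over p:

```python
import numpy as np

def h2(x):
    """Binary entropy in bits, with the convention h2(0) = h2(1) = 0."""
    x = np.clip(x, 1e-15, 1 - 1e-15)
    return -x * np.log2(x) - (1 - x) * np.log2(1 - x)

def capacity(gamma, grid=100001):
    """max_p h2((1 - gamma) p) - h2(gamma p), the expression for C(N_gamma) quoted above."""
    p = np.linspace(0, 1, grid)
    return np.max(h2((1 - gamma) * p) - h2(gamma * p))

for gamma in (0.0, 0.1, 0.25, 0.5):
    print(gamma, float(capacity(gamma)))
```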

5. Discussion

We have presented a BP algorithm for bitwise decoding of CQ channels which operates by passing quantum messages on tree factor graphs, and shown several applications to polar codes. This invites the study of quantum message passing algorithms, and not just in the context of decoding. More generally we may look for BP and related algorithms for any task of statistical inference where the input data comes in the form of many quantum bits, for instance in quantum metrology. This work also raises many interesting questions. Most immediately in the context of decoding is whether the complexity of the algorithm can be reduced for structured factor graphs. Classical polar codes, for instance, have decoding complexity $O(n\mathrm{log}n)$. Can this also be achieved for the pure state channel? Similarly, can one find a quantum version of the max-product or Viterbi algorithm for determining the most likely ${x}_{1}^{n}$ given the channel outputs?

More generally, it would be very interesting to understand how to run the algorithm on a factor graph with loops, or how it can be modified to handle some set of non-pure output states. In the former case it may be useful to explore the characterization of loopy BP as a variational problem [1, 29]. Perhaps in the latter case one can make use of the work on quantum sufficiency (see e.g. [30, 31] and references therein) to find a suitable set of quantum messages for a given decoding problem.

Another interesting question with potentially far-reaching consequences is the relation of the BP algorithm to tensor network methods. The problem of marginalization in the commutative setting is explicitly treated as tensor network contraction in [14], and the particulars of the quantum BP decoder bear a similarity with the data gathering approach using tensor network states in [32]. Can the methods of approximating quantum states by tensor networks be used to create efficient approximate decoders?

Acknowledgments

It is a pleasure to acknowledge helpful conversations with Rüdiger Urbanke, Marco Mondelli, and David Sutter. Thanks also to Narayanan Rengaswamy for pointing out an error in a previous version of this paper. This work was supported by the Swiss National Science Foundation (SNSF) via the National Centre of Competence in Research 'QSIT', and by the European Commission via the project 'RAQUEL'.

Footnotes

1. The algorithm of [19] is also a classical algorithm.

2. This was first applied in the setting of polar codes in [24].
