Compare Papers

Paper 1

Repeat-Until-Success: Non-deterministic decomposition of single-qubit unitaries

Adam Paetznick, Krysta M. Svore

Year
2013
Journal
arXiv preprint
arXiv
1311.1074

We present a decomposition technique that uses non-deterministic circuits to approximate an arbitrary single-qubit unitary to within distance $ε$ and requires significantly fewer non-Clifford gates than existing techniques. We develop "Repeat-Until-Success" (RUS) circuits and characterize unitaries that can be exactly represented as an RUS circuit. Our RUS circuits operate by conditioning on a given measurement outcome and using only a small number of non-Clifford gates and ancilla qubits. We construct an algorithm based on RUS circuits that approximates an arbitrary single-qubit $Z$-axis rotation to within distance $ε$, where the number of $T$ gates scales as $1.26\log_2(1/ε) - 3.53$, an improvement of roughly three-fold over state-of-the-art techniques. We then extend our algorithm and show that a scaling of $2.4\log_2(1/ε) - 3.28$ can be achieved for arbitrary unitaries and a small range of $ε$, which is roughly twice as good as optimal deterministic decomposition methods.
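The sketch below makes the reported scalings concrete. It evaluates the two T-count formulas quoted above for a few target distances $ε$. It also adds a generic repeat-until-success cost model: if one attempt uses $k$ T gates and succeeds with probability $p$, the number of attempts is geometric with mean $1/p$, so the expected cost is $k/p$. This assumes a failed attempt can be undone with Clifford gates at no extra T cost. The function names and the example values $k = 4$, $p = 0.8$ are hypothetical and are not circuits or numbers from the paper. The abstract also claims the general-unitary formula only for a small range of $ε$.

```python
import math
import random

def rus_zrot_t_count(eps):
    """Reported T-count scaling for Z-axis rotations (formula from the abstract)."""
    return 1.26 * math.log2(1 / eps) - 3.53

def rus_general_t_count(eps):
    """Reported scaling for arbitrary unitaries; claimed only for a small range of eps."""
    return 2.4 * math.log2(1 / eps) - 3.28

def expected_rus_cost(t_per_attempt, p_success):
    """Generic model (assumption): geometric number of attempts with mean 1/p."""
    return t_per_attempt / p_success

def simulate_rus_cost(t_per_attempt, p_success, trials=100_000, seed=1):
    """Monte Carlo check of the model: retry until a draw falls below p_success."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        attempts = 1
        while rng.random() >= p_success:  # failed attempt: undo (assumed free) and retry
            attempts += 1
        total += attempts * t_per_attempt
    return total / trials

if __name__ == "__main__":
    for eps in (1e-4, 1e-6, 1e-10):
        print(f"eps={eps:g}: Z-rotation ~{rus_zrot_t_count(eps):.1f} T, "
              f"general ~{rus_general_t_count(eps):.1f} T")
    # Hypothetical RUS step: 4 T gates per attempt, success probability 0.8.
    print(expected_rus_cost(4, 0.8), simulate_rus_cost(4, 0.8))
```

The closed form $k/p$ and the simulation should agree closely. This is why a circuit with a modest success probability can still have a low *expected* T count.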

Paper 2

A Scalable FPGA Architecture for Real-Time Decoding of Quantum LDPC Codes Using GARI

Daniel Báscones, Arshpreet Singh Maan, Valentin Savin, Francisco Garcia-Herrero

Year
2026
Journal
arXiv preprint
arXiv
2605.01035

In this work, we introduce a new hardware architecture for decoding correlated errors in quantum LDPC codes. The decoder is based on message passing and exploits the structure of the detector error model obtained through the recently introduced Graph Augmentation and Rewiring for Inference (GARI) method. The proposed architecture scales flexibly and can, in principle, be adapted to any quantum LDPC code through the GARI framework. It reuses hardware resources while maintaining a modest degree of parallelism, which reduces power consumption and area requirements and still preserves low decoding latency. As a case study, the architecture was implemented on a VCU19P FPGA as an ensemble of three decoder cores targeting the [[144,12,12]] bivariate bicycle code, achieving an average latency of 596 ns per decoding round. This implementation uses one sixth of the resources of the previous GARI-based proposal and is the first reported implementation of multiple decoder cores for correlated errors on a single FPGA device. This enables more energy-conscious scaling of the quantum error correction layer on the classical side, reducing overall power consumption while meeting real-time constraints without compromising decoding accuracy under correlated errors.
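The abstract does not describe GARI's graph construction or the hardware message schedule. The sketch below therefore illustrates only the general idea of syndrome-based message passing. It is a generic min-sum belief-propagation decoder, a common software baseline, and not the authors' architecture. The function name `min_sum_decode`, the independent error probability `p`, and the toy 3-bit repetition-code matrix are hypothetical choices for illustration.

```python
import math

def min_sum_decode(H, syndrome, p, max_iters=20):
    """Generic syndrome-based min-sum BP (illustrative sketch, not GARI).

    H: list of check rows with 0/1 entries; syndrome: 0/1 per check;
    p: assumed independent error probability per variable.
    Returns (estimated error vector, True if it reproduces the syndrome).
    """
    m, n = len(H), len(H[0])
    prior = math.log((1 - p) / p)  # positive LLR favours "no error"
    vars_of = [[v for v in range(n) if H[c][v]] for c in range(m)]
    checks_of = [[c for c in range(m) if H[c][v]] for v in range(n)]
    v2c = {(c, v): prior for c in range(m) for v in vars_of[c]}
    c2v = {(c, v): 0.0 for c in range(m) for v in vars_of[c]}
    estimate = [0] * n
    for _ in range(max_iters):
        # Check-node update: sign product of the other incoming messages
        # (flipped when the syndrome bit is 1) times their minimum magnitude.
        for c in range(m):
            for v in vars_of[c]:
                others = [v2c[(c, u)] for u in vars_of[c] if u != v]
                sign = -1.0 if syndrome[c] else 1.0
                for x in others:
                    if x < 0:
                        sign = -sign
                magnitude = min((abs(x) for x in others), default=1e6)
                c2v[(c, v)] = sign * magnitude
        # Variable-node update: posterior = prior + all incoming messages;
        # each outgoing message excludes the receiving check's contribution.
        posterior = []
        for v in range(n):
            total = prior + sum(c2v[(c, v)] for c in checks_of[v])
            posterior.append(total)
            for c in checks_of[v]:
                v2c[(c, v)] = total - c2v[(c, v)]
        # Hard decision; stop early once the estimate explains the syndrome.
        estimate = [1 if llr < 0 else 0 for llr in posterior]
        if all(sum(estimate[v] for v in vars_of[c]) % 2 == syndrome[c]
               for c in range(m)):
            return estimate, True
    return estimate, False

if __name__ == "__main__":
    # Hypothetical toy code: 3-bit repetition code with two parity checks.
    H = [[1, 1, 0],
         [0, 1, 1]]
    print(min_sum_decode(H, [1, 0], 0.1))  # ([1, 0, 0], True)
```

On real quantum LDPC codes, plain BP of this kind often fails to converge because of degeneracy and short cycles in the Tanner graph. For that reason it is usually paired with additional techniques rather than used alone.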