Introduction

Overview 

Quantum computing has the potential to solve problems beyond the reach of classical computers. Quantum hardware is still in its infancy, a stage known as the Noisy Intermediate-Scale Quantum (NISQ) era. Current devices have two main limitations:

They support only a limited number of qubits.
Their gate operations can be imprecise (i.e., “noisy”).

Given these constraints, automated methods for designing circuits are increasingly important. Traditional “hand-crafted” circuits are often inefficient and do not scale well to more advanced or larger quantum algorithms.

In this module, we introduce how reinforcement learning (RL) can help automate quantum circuit design:

What is the task? We want to find the best sequence of gates that produces a desired quantum state or operation.
Why use RL? An RL agent explores possible circuits step by step: it applies a gate, observes the outcome, and adjusts its strategy over many trials.
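The interaction loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of the paper: it uses the two-qubit Bell-state task and the gate set \(\{H, T, \text{CNOT}\}\) studied later in this module, observes the state vector directly, gives a sparse reward of 1 only when the target is reached, and uses a uniformly random policy as a stand-in for a learned one. All names (`ACTIONS`, `run_episode`, `random_policy`) are hypothetical.

```python
import numpy as np

# Gate matrices in the basis |q1 q0> = |00>, |01>, |10>, |11>.
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
T = np.array([[1, 0], [0, np.exp(1j * np.pi / 4)]], dtype=complex)
I2 = np.eye(2, dtype=complex)

# Action set (an assumption): each gate applied to each possible qubit.
ACTIONS = {
    "H_q1": np.kron(H, I2),
    "H_q0": np.kron(I2, H),
    "T_q1": np.kron(T, I2),
    "T_q0": np.kron(I2, T),
    # Control q1, target q0: swaps |10> and |11>.
    "CNOT_ctrl_q1": np.array([[1, 0, 0, 0], [0, 1, 0, 0],
                              [0, 0, 0, 1], [0, 0, 1, 0]], dtype=complex),
    # Control q0, target q1: swaps |01> and |11>.
    "CNOT_ctrl_q0": np.array([[1, 0, 0, 0], [0, 0, 0, 1],
                              [0, 0, 1, 0], [0, 1, 0, 0]], dtype=complex),
}

# Target Bell state (|00> + |11>) / sqrt(2).
TARGET = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)


def fidelity(state):
    """Overlap |<target|state>|^2, insensitive to global phase."""
    return float(abs(np.vdot(TARGET, state)) ** 2)


def run_episode(policy, max_steps=10, tol=1e-9):
    """Apply gates chosen by `policy` until the target is hit or steps run out."""
    state = np.array([1, 0, 0, 0], dtype=complex)  # start in |00>
    circuit = []
    for _ in range(max_steps):
        name = policy(state)              # agent picks a gate
        state = ACTIONS[name] @ state     # environment applies it
        circuit.append(name)
        if fidelity(state) > 1 - tol:     # sparse reward (an assumption)
            return circuit, 1.0
    return circuit, 0.0


rng = np.random.default_rng(0)
names = list(ACTIONS)


def random_policy(state):
    """Placeholder for a learned policy: ignores the state, picks uniformly."""
    return names[rng.integers(len(names))]


episodes = [run_episode(random_policy) for _ in range(200)]
successes = [circuit for circuit, reward in episodes if reward == 1.0]
print(len(successes), "of 200 random episodes reached the target")
```

A learning agent would replace `random_policy` with a policy that is updated from the collected rewards, so that successful gate sequences become more likely over time.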

We explore reinforcement learning methods to automate quantum circuit search. Our contributions can be summarized as follows:

We present three generic Markov Decision Process (MDP) formulations of the quantum circuit design task.
We study \(10\) quantum circuit design tasks: \(4\) Bell states, SWAP gate, iSWAP gate, CZ gate, GHZ gate, Z gate, and Toffoli gate, given a universal gate set \(\{H, T, \text{CNOT}\}\).

Problem Formulation 

To make this concrete, let’s consider a simple task: designing a circuit that creates the Bell state \(\ket{\Phi^+}\).

Fig. 1: A quantum circuit to generate the Bell state \(\ket{\Phi^+}\).

Task: Quantum Circuit Design 

Given two qubits with initial state \(\ket{q_1q_0} = \ket{00}\) and a universal gate set \(G = \{H, T, \text{CNOT}\}\), the goal is to find a quantum circuit that generates the Bell state \(\ket{\Phi^+}\):

\[\ket{\Phi^+} = \frac{1}{\sqrt{2}} \left( \ket{00} + \ket{11} \right) \tag{1}\]

The target quantum circuit that generates \(\ket{\Phi^+}\) applies \(H\) to \(q_1\) and then a CNOT. Its matrix representation is:

\[\begin{split}U &= \text{CNOT}_{01} \cdot (H \otimes I) \\ &= \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix} \cdot \left( \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1\\ 1 & -1 \end{pmatrix} \otimes \begin{pmatrix} 1 & 0\\ 0 & 1\\ \end{pmatrix} \right)\\ &= \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 1 & 0 & -1 \\ 1 & 0 & -1 & 0 \end{pmatrix}\end{split} \tag{2}\]

Note that \(\ket{\Phi^+} = U~\ket{00}\).
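The matrix product and the identity \(\ket{\Phi^+} = U~\ket{00}\) can be checked numerically. The following is a small sketch using NumPy; the variable names are illustrative, and the basis ordering \(\ket{00}, \ket{01}, \ket{10}, \ket{11}\) for \(\ket{q_1q_0}\) is assumed.

```python
import numpy as np

# Hadamard gate and single-qubit identity.
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
I2 = np.eye(2)

# CNOT as written in the matrix product: control on the left qubit (q1).
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])

# U = CNOT . (H tensor I): H acts on q1 first, then the CNOT is applied.
U = CNOT @ np.kron(H, I2)

ket00 = np.array([1, 0, 0, 0])
bell = np.array([1, 0, 0, 1]) / np.sqrt(2)

# Applying U to |00> picks out the first column of U.
state = U @ ket00
print(np.allclose(state, bell))  # True
```

Because \(\ket{00}\) is the first basis vector, \(U~\ket{00}\) is simply the first column of \(U\), which is \(\frac{1}{\sqrt{2}}(1, 0, 0, 1)^T\).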

[Paper]

Wang, Z.; Feng, C.; Poon, C.; Huang, L.; Zhao, X.; Ma, Y.; Fu, T.; and Liu, X.-Y. 2025. Reinforcement learning for quantum circuit design: Using matrix representations. In arXiv, 2501.16509. https://arxiv.org/abs/2501.16509.
