Results
We evaluate the three MDP formulations above on 10 well-known quantum circuits: the circuits that generate the 4 Bell states, and the SWAP, iSWAP, CZ, GHZ, Z, and Toffoli gates. For the Matrix and Reverse Matrix Representations, we apply both Q-learning and DQN; for the TN Representation, we apply only Q-learning. Our code is available at:
https://github.com/YangletLiu/quantum-education-modules/tree/main/Quantum_Circuit_Design
Toffoli Gate: We used an action set with gates shown in Figure 1:
where the \(\text{CP}\) gate refers to a controlled-phase gate with a phase shift of \(\frac{\pi}{2}\), and \(\text{CP}^{-1}\) with a phase shift of \(-\frac{\pi}{2}\).
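To make the two controlled-phase gates concrete, the NumPy sketch below builds them as \(8 \times 8\) matrices and checks a textbook seven-gate construction of the Toffoli gate from \(H\), \(\text{CNOT}\), \(\text{CP}\), and \(\text{CP}^{-1}\). The qubit labels `A`, `B`, `C`, the helper names, and the gate sequence itself are assumptions made for illustration: the sequence is a standard decomposition and is not claimed to be the action set of Figure 1 or the expert trajectory used in the experiments.

```python
import numpy as np

N = 3              # three qubits; qubit 0 is the most significant bit
A, B, C = 0, 1, 2  # hypothetical labels: A and B are controls, C is the target

def bit(index, q):
    """Value of qubit q in the computational basis state `index`."""
    return (index >> (N - 1 - q)) & 1

def hadamard(q):
    """Hadamard on qubit q, identity on the other qubits."""
    h = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    out = np.eye(1)
    for k in range(N):
        out = np.kron(out, h if k == q else np.eye(2))
    return out

def cnot(control, target):
    """CNOT as a permutation matrix on the 8 basis states."""
    U = np.zeros((2**N, 2**N))
    for i in range(2**N):
        j = i ^ (1 << (N - 1 - target)) if bit(i, control) else i
        U[j, i] = 1
    return U

def cp(q1, q2, theta):
    """Controlled-phase: multiply by exp(i*theta) when both qubits are 1."""
    phases = [np.exp(1j * theta) if bit(i, q1) and bit(i, q2) else 1.0
              for i in range(2**N)]
    return np.diag(phases)

# Assumed textbook sequence, listed in the order the gates are applied.
gates = [
    hadamard(C),
    cp(A, C, np.pi / 2),    # CP
    cnot(A, B),
    cp(B, C, -np.pi / 2),   # CP^{-1}
    cnot(A, B),
    cp(B, C, np.pi / 2),    # CP
    hadamard(C),
]

U = np.eye(2**N)
for g in gates:
    U = g @ U               # later gates multiply from the left

toffoli = np.eye(2**N)
toffoli[[6, 7]] = toffoli[[7, 6]]  # swap |110> and |111>

print("CP and CP^-1 are inverses:",
      np.allclose(cp(A, C, np.pi / 2) @ cp(A, C, -np.pi / 2), np.eye(2**N)))
print("sequence equals Toffoli:", np.allclose(U, toffoli))
```

The sequence has seven gates, which agrees with the task length listed for the Toffoli gate in Table 1, but that agreement alone does not establish that it is the trajectory stored in the replay buffer.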
An expert trajectory was stored in the replay buffer to improve learning efficiency. For the Matrix Representation, the expert trajectory is:
For the Reverse Matrix Representation:
and the expert trajectory becomes:
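Each state branches into \(c\) successor states, where \(c\) is the number of actions, and the resulting tree has \(b + 1\) levels, where \(b\) is the task length. As shown in Figure 2, the size of the state space is given by a geometric series:
\[\sum_{i=0}^{b} c^{i} = \frac{c^{b+1} - 1}{c - 1}.\]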
The complexity of each task is measured by the size of its state space, as shown in Table 1. To evaluate the effectiveness of Q-learning and DQN, we conducted 100 rounds. In each round, the agent was trained for 100 episodes and then tested, and we report the success ratio, i.e., the percentage of the 100 rounds in which the test result was correct. The results are summarized in Table 2.
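The space sizes reported in Table 1 can be reproduced by summing the geometric series over the levels of the tree. The short sketch below does this in plain Python; the function name `state_space_size` and the dictionary layout are illustrative choices, while the numbers are copied from Table 1.

```python
def state_space_size(num_actions: int, length: int) -> int:
    """Number of nodes in a tree with branching factor c and b + 1 levels."""
    return sum(num_actions**level for level in range(length + 1))

# (actions c, length b, space size reported in Table 1)
TABLE_1 = {
    "Bell state Phi+": (6, 2, 43),
    "Bell state Phi-": (6, 3, 259),
    "Bell state Psi+": (6, 3, 259),
    "Bell state Psi-": (8, 5, 37449),
    "SWAP gate": (6, 3, 259),
    "iSWAP gate": (6, 5, 9331),
    "CZ gate": (6, 3, 259),
    "GHZ gate": (8, 3, 585),
    "Z gate": (10, 2, 111),
    "Toffoli gate": (5, 7, 97656),
}

for task, (c, b, reported) in TABLE_1.items():
    computed = state_space_size(c, b)
    print(f"{task:16s} c={c:2d} b={b} computed={computed:6d} reported={reported:6d}")
```

For example, the Bell state \(|\Phi^+\rangle\) has \(c = 6\) and \(b = 2\), giving \(1 + 6 + 36 = 43\).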
From Table 2, we observe that both Q-learning and DQN perform well on simpler tasks, such as generating the Bell state \(|\Phi^+\rangle\). However, as task complexity increases, performance degrades sharply: on the iSWAP gate task, whose state space has 9331 states (Table 1), the success ratio of both algorithms drops to 5% or below, indicating the difficulty of learning in large state spaces.
| Task Name | Qubits | Actions | Length | Space Size | Universal Gate Set |
|---|---|---|---|---|---|
| Bell state \(\lvert\Phi^+\rangle\) | 2 | 6 | 2 | 43 | \(\{H, \text{CNOT}, T\}\) |
| Bell state \(\lvert\Phi^-\rangle\) | 2 | 6 | 3 | 259 | \(\{H, \text{CNOT}, T, X\}\) |
| Bell state \(\lvert\Psi^+\rangle\) | 2 | 6 | 3 | 259 | \(\{H, \text{CNOT}, T, X\}\) |
| Bell state \(\lvert\Psi^-\rangle\) | 2 | 8 | 5 | 37449 | \(\{H, \text{CNOT}, T, X, Z\}\) |
| SWAP gate | 2 | 6 | 3 | 259 | \(\{H, \text{CNOT}, T\}\) |
| iSWAP gate | 2 | 6 | 5 | 9331 | \(\{H, \text{CNOT}, T\}\) |
| CZ gate | 2 | 6 | 3 | 259 | \(\{H, \text{CNOT}, T\}\) |
| GHZ gate | 3 | 8 | 3 | 585 | \(\{H, \text{CNOT}, T\}\) |
| Z gate | 3 | 10 | 2 | 111 | \(\{H, \text{CNOT}, T, S\}\) |
| Toffoli gate | 3 | 5 | 7 | 97656 | Special Case |
| Gates | Q-Learning | Q-Learning (Reverse) | DQN | DQN (Reverse) | Q-Learning (TN) |
|---|---|---|---|---|---|
| Bell state \(\lvert\Phi^+\rangle\) | 86% | 85% | 33% | 39% | 100% |
| Bell state \(\lvert\Phi^-\rangle\) | 41% | 25% | 18% | 20% | 94% |
| Bell state \(\lvert\Psi^+\rangle\) | 55% | 53% | 21% | 17% | 95% |
| Bell state \(\lvert\Psi^-\rangle\) | 5% | 4% | 6% | 4% | 15% |
| SWAP gate | 10% | 15% | 21% | 27% | 3% |
| iSWAP gate | 2% | 1% | 2% | 5% | 2% |
| CZ gate | 69% | 77% | 16% | 17% | 19% |
| GHZ gate | 34% | 17% | 13% | 20% | 45% |
| Z gate | 50% | 38% | 17% | 19% | 13% |
| Toffoli gate | 87% | 91% | 1% | 3% | |
Wang, Z.; Feng, C.; Poon, C.; Huang, L.; Zhao, X.; Ma, Y.; Fu, T.; and Liu, X.-Y. 2025. Reinforcement learning for quantum circuit design: Using matrix representations. In arXiv, 2501.16509. https://arxiv.org/abs/2501.16509.