
VQC-based reinforcement learning with data re-uploading: performance and trainability

Rodrigo Coelho, A. Sequeira, Luís Paulo Santos · January 21, 2024 · DOI: 10.1007/s42484-024-00190-z
Computer Science · Physics


Abstract

Reinforcement learning (RL) consists of designing agents that make intelligent decisions without human supervision. When used alongside function approximators such as Neural Networks (NNs), RL is capable of solving extremely complex problems. Deep Q-Learning, an RL algorithm that uses Deep NNs, has been shown to achieve super-human performance in game-related tasks. Nonetheless, it is also possible to use Variational Quantum Circuits (VQCs) as function approximators in RL algorithms. This work empirically studies the performance and trainability of such VQC-based Deep Q-Learning models in classic control benchmark environments. More specifically, we investigate how data re-uploading affects both of these metrics. We show that the magnitude and the variance of the model’s gradients remain substantial throughout training even as the number of qubits increases. In fact, both increase considerably in the training’s early stages, when the agent needs to learn the most. They decrease later in the training, when the agent should have done most of the learning and started converging to a policy. Thus, even if the probability of being initialized in a Barren Plateau increases exponentially with system size for Hardware-Efficient ansatzes, these results indicate that the VQC-based Deep Q-Learning models may still be able to find large gradients throughout training, allowing for learning.
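The data re-uploading scheme the abstract refers to can be illustrated with a minimal single-qubit sketch. This is not the paper's actual multi-qubit ansatz: the function names (`reuploading_expectation`, `param_shift_grad`) and the choice of Ry/Rz rotations are illustrative assumptions, and the circuit is simulated directly as a NumPy statevector. The key idea shown is that the data-encoding gate is interleaved with trainable gates in every layer, rather than applied once at the start, and that gradients of the expectation value can be obtained exactly with the parameter-shift rule:

```python
import numpy as np

def ry(a):
    # Single-qubit rotation about Y: exp(-i * a * Y / 2)
    return np.array([[np.cos(a / 2), -np.sin(a / 2)],
                     [np.sin(a / 2),  np.cos(a / 2)]], dtype=complex)

def rz(a):
    # Single-qubit rotation about Z: exp(-i * a * Z / 2)
    return np.array([[np.exp(-1j * a / 2), 0],
                     [0, np.exp(1j * a / 2)]], dtype=complex)

def reuploading_expectation(x, params):
    """<Z> of a single-qubit data re-uploading circuit.

    params has shape (layers, 2); each layer re-encodes the input x
    with Ry(x) and then applies trainable Ry/Rz rotations.
    """
    state = np.array([1.0, 0.0], dtype=complex)  # |0>
    for theta, phi in params:
        state = ry(x) @ state       # re-upload the classical input
        state = ry(theta) @ state   # trainable rotation
        state = rz(phi) @ state     # trainable rotation
    z = np.array([[1, 0], [0, -1]], dtype=complex)
    return float(np.real(state.conj() @ z @ state))

def param_shift_grad(x, params, l, j):
    """Exact gradient of <Z> w.r.t. params[l, j] via the parameter-shift
    rule, valid because Ry/Rz are generated by Pauli operators."""
    plus, minus = params.copy(), params.copy()
    plus[l, j] += np.pi / 2
    minus[l, j] -= np.pi / 2
    return 0.5 * (reuploading_expectation(x, plus)
                  - reuploading_expectation(x, minus))
```

In a Deep Q-Learning setting, an expectation value like this would serve as the (scaled) Q-value estimate for a state-action pair, and the magnitude and variance of gradients such as `param_shift_grad` returns are the quantities whose evolution during training the paper tracks.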
