Encoding Numerical Data for Generative Quantum Machine Learning
Abstract
Generative quantum machine learning models are trained to infer the probability distribution underlying a given dataset and to produce new, synthetic samples from it. The majority of such models proposed in the literature, such as the Quantum Circuit Born Machine (QCBM), fundamentally operate at the binary level. Real-world data, however, is often numeric, requiring the models to translate between binary and continuous representations. We analyze how this transition influences the performance of quantum models and show that it requires the models to learn correlations that are solely an artifact of the way the data is encoded, and not related to the data itself. At the same time, the structure of the original data can be obscured in the binary representation, hindering generalization. To mitigate these effects, we propose a strategy based on Gray codes that can be implemented with essentially no overhead, preserves structure in the data, and avoids artificial correlations in situations in which the standard approach creates them. Considering datasets drawn from various one-dimensional probability distributions, we verify that, in most cases, QCBMs using the reflected Gray code learn faster and more accurately than those using the standard binary code.
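The key property exploited by the proposed encoding can be illustrated in a few lines of Python (a minimal sketch, not taken from the paper): under the reflected Gray code, adjacent integers map to bitstrings that differ in exactly one bit, whereas standard binary can flip every bit at once (e.g., 7 → 8 is 0111 → 1000), which is the kind of encoding artifact the abstract describes.

```python
# Minimal sketch (illustrative, not the paper's code): compare how many bits
# flip between consecutive integers under standard binary vs. reflected Gray
# code. Gray-coded neighbors always differ in exactly one bit, so small
# changes in a numeric value stay small in the bitstring representation.

def to_binary(n: int, width: int) -> str:
    """Standard binary representation of n, zero-padded to `width` bits."""
    return format(n, f"0{width}b")

def to_gray(n: int, width: int) -> str:
    """Reflected Gray code of n: XOR n with itself shifted right by one bit."""
    return format(n ^ (n >> 1), f"0{width}b")

def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length bitstrings differ."""
    return sum(x != y for x, y in zip(a, b))

if __name__ == "__main__":
    width = 4
    for n in range(2 ** width - 1):
        b0, b1 = to_binary(n, width), to_binary(n + 1, width)
        g0, g1 = to_gray(n, width), to_gray(n + 1, width)
        print(f"{n:2d}->{n + 1:2d}  binary {b0}->{b1} (flips {hamming(b0, b1)})"
              f"  gray {g0}->{g1} (flips {hamming(g0, g1)})")
    # The step 7 -> 8 flips all 4 bits in standard binary (0111 -> 1000),
    # but only 1 bit in the reflected Gray code (0100 -> 1100).
```

Because neighboring numeric values remain neighbors in Hamming distance, a model sampling bitstrings does not need to learn long-range bit correlations just to represent a smooth distribution, which is consistent with the abstract's claim that the Gray code preserves structure in the data.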