Pangenome-guided sequence assembly via binary optimisation
Abstract
De novo genome assembly is challenging in highly repetitive regions, while reference-guided assemblers often suffer from bias. We propose a framework for pangenome-guided sequence assembly, which can resolve short-read data in complex regions without bias towards a single reference genome. Our primary contribution is to frame the assembly as a graph traversal optimisation problem, which can be implemented classically or on a quantum computer. The workflow involves first annotating pangenome graphs with an estimated copy number for each node, then finding a path on the graph that best explains those copy numbers. On simulated data, our approach significantly reduces the number of contigs compared to de novo assemblers. Although our optimisation-based methods introduce a small increase in inaccuracies, such as false joins, they are competitive with current exhaustive search techniques. They are also designed to scale more efficiently than exhaustive search as the problem size grows and will run effectively on future quantum computers; a small experiment on a real quantum device showcases this behaviour. Moreover, they are more resilient to the noise in copy number estimation that is inherent in short-read-based assembly. We also develop novel tools for creating realistic synthetic pangenomes, aligning reads to pangenomes, and evaluating assembly quality.
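To make the two-step workflow concrete, the following is a minimal sketch of how "finding a path that best explains the copy numbers" might be written as a binary optimisation problem. It is an illustrative assumption, not the formulation used in the paper: the one-hot encoding (one binary variable per step and node), the `penalty` weight, the toy three-node graph, and the function names `energy` and `best_path` are all hypothetical. The brute-force search merely stands in for a classical or quantum optimiser and is only feasible for very small graphs.

```python
from itertools import product


def energy(x, n_nodes, n_steps, edges, copy_numbers, penalty=10):
    """Cost of a binary assignment x (hypothetical encoding).

    x is a flat tuple of 0/1 values; x[t * n_nodes + v] == 1 means
    that step t of the walk visits node v.
    """
    def bit(t, v):
        return x[t * n_nodes + v]

    # Constraint: every step selects exactly one node.
    one_hot = sum(
        (sum(bit(t, v) for v in range(n_nodes)) - 1) ** 2
        for t in range(n_steps)
    )
    # Constraint: consecutive steps must follow an edge of the graph.
    bad_moves = sum(
        bit(t, u) * bit(t + 1, v)
        for t in range(n_steps - 1)
        for u in range(n_nodes)
        for v in range(n_nodes)
        if (u, v) not in edges
    )
    # Objective: visit counts should match the estimated copy numbers.
    mismatch = sum(
        (sum(bit(t, v) for t in range(n_steps)) - copy_numbers[v]) ** 2
        for v in range(n_nodes)
    )
    return penalty * (one_hot + bad_moves) + mismatch


def best_path(n_nodes, edges, copy_numbers, penalty=10):
    """Exhaustively minimise the energy; a stand-in for a real optimiser."""
    n_steps = sum(copy_numbers)  # assumed walk length: total copy number
    best = min(
        product((0, 1), repeat=n_nodes * n_steps),
        key=lambda x: energy(x, n_nodes, n_steps, edges, copy_numbers, penalty),
    )
    path = [v for t in range(n_steps) for v in range(n_nodes) if best[t * n_nodes + v]]
    return path, energy(best, n_nodes, n_steps, edges, copy_numbers, penalty)


# Toy graph: node 0 -> node 1 (a repeat with a self-loop) -> node 2.
toy_edges = {(0, 1), (1, 1), (1, 2)}
toy_copy_numbers = [1, 2, 1]  # the repeat node is estimated to occur twice
path, cost = best_path(3, toy_edges, toy_copy_numbers)
print(path, cost)
```

On this toy instance the only zero-cost assignment is the walk 0, 1, 1, 2, which traverses the repeat node twice as its copy number suggests. Because the cost is a quadratic function of binary variables, an objective of this shape could in principle be handed to a QUBO solver or a quantum annealer instead of being enumerated.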