value
| - A global cross-discipline effort is ongoing to characterize the evolution of the SARS-CoV-2 virus and to generate reliable epidemiological models of its diffusion. To this end, phylogenomic approaches leverage accumulating genomic mutations as barcodes to track the evolutionary history of the virus, and can benefit from the surge of sequences deposited in public databases. Yet such methods typically rely on consensus sequences representing the dominant virus lineage, whereas a complex sublineage architecture is often observed within single hosts. Furthermore, most approaches do not account for variant accumulation processes and might produce inaccurate results under conditions of limited sampling, as witnessed in most countries affected by the epidemic. We introduce VERSO (Viral Evolution ReconStructiOn), a new comprehensive framework for the characterization of viral evolution and transmission from sequencing data of viral genomes. Our approach accounts for the accumulation of clonal mutations and for uncertainty in the data, by taking advantage of the achievements of research in cancer evolution, to deliver robust phylogenomic lineage models. It also exploits intra-host variant frequency profiles to characterize the sublineage similarity among samples, which may derive from uncovered infection events. The application of our approach to RNA-sequencing data of 162 SARS-CoV-2 samples generates a high-resolution model of evolution and spread, which improves on recent findings on viral types and highlights the existence of patterns of co-occurrence of minor variants, revealing likely infection paths among hosts harboring the same viral lineage. The in-depth analysis of the mutational landscape of SARS-CoV-2 confirms a statistically significant increase of genomic diversity over time and identifies a number of variants that are transitioning from a minor to a clonal state in the population.
We also show that standard phylogenetic methods can produce unreliable results when handling datasets with noise and sampling limitations, as demonstrated by the further application of VERSO to 12419 consensus sequences included in the GISAID database. Notably, VERSO makes it possible to pinpoint minor variants that might be positively or negatively selected across distinct lineages, thereby guiding the design of treatments and vaccines. In particular, the minor variant g.29039A>U, detected in multiple viral lineages and validated on independent samples, shows that SARS-CoV-2 can lose its main Nucleocapsid immunogenic epitopes, raising concerns about the effectiveness of vaccines targeting the C-terminus of this protein. Finally, we release the likely SARS-CoV-2 ancestral genome, obtained by using the Pangolin-CoV and Bat-CoV-RaTG13 genomes to resolve the ambiguous SNPs that distinguish two widely used reference genomes from human samples. Our results show that the joint application of our framework and data-driven epidemiological models might deliver a high-resolution platform for pathogen surveillance and analysis.
|