Introduction

Extensive genetic diversity in viral populations within infected hosts and the divergence of variants from existing reference genomes impede the analysis of deep viral sequencing data. A de novo population consensus assembly is valuable both as a single linear representation of the population and as a backbone on which intra-host variants can be accurately mapped. The availability of consensus assemblies and robustly mapped variants are crucial to the genetic study of viral disease progression, transmission dynamics, and viral evolution. Existing de novo assembly techniques fail to robustly assemble ultra-deep sequence data from genetically heterogeneous populations such as viruses into full-length genomes due to the presence of extensive genetic variability, contaminants, and variable sequence coverage.We present VICUNA, a de novo assembly algorithm suitable for generating consensus assemblies from genetically heterogeneous populations. We demonstrate its effectiveness on Dengue, Human Immunodeficiency and West Nile viral populations, representing a range of intra-host diversity. Compared to state-of-the-art assemblers designed for haploid or diploid systems, VICUNA recovers full-length consensus and captures insertion/deletion polymorphisms in diverse samples. Final assemblies maintain a high base calling accuracy. VICUNA program is publicly available at: http://www.broadinstitute.org/scientific-community/science/projects/viral-genomics/ viral-genomics-analysis-software.We developed VICUNA, a publicly available software tool, that enables consensus assembly of ultra-deep sequence derived from diverse viral populations. While VICUNA was developed for the analysis of viral populations, its application to other heterogeneous sequence data sets such as metagenomic or tumor cell population samples may prove beneficial in these fields of research.

Publications

  1. De novo assembly of highly diverse viral populations.
    Cite this
    Yang X, Charlebois P, Gnerre S, Coole MG, Lennon NJ, Levin JZ, Qu J, Ryan EM, Zody MC, Henn MR, 2012-09-01 - BMC genomics

Credits

  1. Xiao Yang
    Developer

    -, Broad Institute of MIT and Harvard, United States of America

  2. Patrick Charlebois
    Developer

  3. Sante Gnerre
    Developer

  4. Matthew G Coole
    Developer

  5. Niall J Lennon
    Developer

  6. Joshua Z Levin
    Developer

  7. James Qu
    Developer

  8. Elizabeth M Ryan
    Developer

  9. Michael C Zody
    Developer

  10. Matthew R Henn
    Investigator

Community Ratings

UsabilityEfficiencyReliabilityRated By
0 user
Sign in to rate
Summary
AccessionBT005774
Tool TypeApplication
Category
PlatformsLinux/Unix
TechnologiesC++
User InterfaceTerminal Command Line
Download Count0
Submitted ByMatthew R Henn