Introduction

By examining the genotype calls generated by the 1000 Genomes Project, we discovered that the human reference genome GRCh37 contains almost 20,000 loci in which the reference allele has never been observed in healthy individuals and around 70,000 loci in which it has been observed only in the heterozygous state. We show that a large fraction of these rare reference allele (RRA) loci belong to coding, functional and regulatory elements of the genome and could be linked to rare Mendelian disorders as well as cancer. We also demonstrate that classical germline and somatic variant calling tools are unable to recognize the rare allele when it is present at these loci. To overcome these limitations, we developed a novel tool, named RAREVATOR, that identifies and calls the rare allele at these genomic positions. Using a small cancer dataset, we compared our tool with two state-of-the-art callers and found that RAREVATOR identified more than 1,500 germline and 22 somatic RRA variants that were missed by both methods and that belong to significantly mutated pathways. These results show that, to date, around 100,000 loci of the human genome have gone uninvestigated in resequencing experiments based on the GRCh37 assembly, and that our tool can fill the gap left by other methods. Moreover, our investigation of the latest version of the human reference genome, GRCh38, showed that although the GRC corrected almost all insertions and a small fraction of SNVs and deletions, a large number of functionally relevant RRAs remain unchanged. For this reason, future resequencing experiments based on GRCh38 will also benefit from RAREVATOR analysis. RAREVATOR is freely available at http://sourceforge.net/projects/rarevator .
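
The two RRA classes described above (reference allele never observed, or observed only in heterozygous genotypes) can be illustrated with a minimal sketch. This is not RAREVATOR's implementation, which is written in Perl; the function name `classify_locus`, the returned labels, and the assumption that genotypes arrive as VCF-style GT strings (where `0` denotes the reference allele) are all hypothetical choices made for illustration.

```python
from collections import Counter


def classify_locus(genotypes):
    """Classify one locus from a panel of VCF-style GT strings.

    Hypothetical helper for illustration only. '0' is taken to be the
    reference allele; genotypes with a missing allele ('.') are skipped.
    """
    counts = Counter()
    for gt in genotypes:
        # Treat phased ('|') and unphased ('/') genotypes the same way.
        alleles = gt.replace("|", "/").split("/")
        if "." in alleles:
            continue  # missing call, carries no information
        n_ref = alleles.count("0")
        if n_ref == len(alleles):
            counts["hom_ref"] += 1   # only reference alleles
        elif n_ref == 0:
            counts["no_ref"] += 1    # no reference allele at all
        else:
            counts["het_ref"] += 1   # reference plus a non-reference allele

    if sum(counts.values()) == 0:
        return "no_data"
    if counts["hom_ref"] > 0:
        return "not_rra"             # reference seen in the homozygous state
    if counts["het_ref"] > 0:
        return "rra_het_only"        # reference seen only in heterozygotes
    return "rra_never_observed"      # reference allele never seen


if __name__ == "__main__":
    panel = {
        "locus_A": ["1/1", "1|1", "1/1"],
        "locus_B": ["1/1", "0|1", "1/1"],
        "locus_C": ["0/0", "0/1", "1/1"],
    }
    for name, gts in panel.items():
        print(name, classify_locus(gts))
```

In this sketch a single homozygous-reference genotype is enough to exclude a locus from both RRA classes; a real analysis would also need to account for genotype quality and population sample size, which the sketch ignores.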

Publications

  1. Characterization and identification of hidden rare variants in the human genome.
    Magi A, D'Aurizio R, Palombo F, Cifola I, Tattini L, Semeraro R, Pippucci T, Giusti B, Romeo G, Abbate R, Gensini GF. BMC Genomics, 2015-01-01.

Credits

  1. Alberto Magi
    Developer

    Department of Experimental and Clinical Medicine, University of Florence

  2. Romina D'Aurizio
    Developer

    Laboratory of Integrative Systems Medicine (LISM), Institute of Informatics and Telematics and Institute of Clinical Physiology

  3. Flavia Palombo
    Developer

    Medical Genetics Unit, Department of Medical and Surgical Sciences

  4. Ingrid Cifola
    Developer

    Institute for Biomedical Technologies, National Research Council, Italy

  5. Lorenzo Tattini
    Developer

    Department of Neuroscience, Pharmacology and Child Health

  6. Roberto Semeraro
    Developer

    Department of Experimental and Clinical Medicine, University of Florence

  7. Tommaso Pippucci
    Developer

    Medical Genetics Unit, Department of Medical and Surgical Sciences

  8. Betti Giusti
    Developer

    Department of Experimental and Clinical Medicine, University of Florence

  9. Giovanni Romeo
    Developer

    Medical Genetics Unit, Department of Medical and Surgical Sciences

  10. Rosanna Abbate
    Developer

    Department of Experimental and Clinical Medicine, University of Florence

  11. Gian Franco Gensini
    Investigator

    Department of Experimental and Clinical Medicine, University of Florence

Summary

Accession: BT001237
Tool Type: Application
Category:
Platforms: Linux/Unix
Technologies: Perl
User Interface: Terminal Command Line
Download Count: 0
Submitted By: Gian Franco Gensini