Javier Lopez, Jacobo Coll, Matthias Haimel, Swaathi Kandasamy, Joaquin Tarraga, Pedro Furio-Tari, Wasim Bari, Marta Bleda, Antonio Rueda, Stefan Gräf, Augusto Rendon, Joaquin Dopazo, Ignacio Medina
Author Information
Javier Lopez: Genomics England, Charterhouse Square, London EC1M 6BQ, UK.
Jacobo Coll: Genomics England, Charterhouse Square, London EC1M 6BQ, UK.
Matthias Haimel: Department of Haematology, University of Cambridge, Cambridge CB2 0PT, UK.
Swaathi Kandasamy: Department of Haematology, University of Cambridge, Cambridge CB2 0PT, UK.
Joaquin Tarraga: HPC Service, UIS, University of Cambridge, Cambridge CB3 0FB, UK.
Pedro Furio-Tari: Genomics England, Charterhouse Square, London EC1M 6BQ, UK.
Wasim Bari: Genomics England, Charterhouse Square, London EC1M 6BQ, UK.
Marta Bleda: Department of Haematology, University of Cambridge, Cambridge CB2 0PT, UK.
Antonio Rueda: Genomics England, Charterhouse Square, London EC1M 6BQ, UK.
Stefan Gräf: Department of Haematology, University of Cambridge, Cambridge CB2 0PT, UK.
Augusto Rendon: Genomics England, Charterhouse Square, London EC1M 6BQ, UK.
Joaquin Dopazo: Clinical Bioinformatics Area, Fundación Progreso y Salud (FPS), Hospital Virgen del Rocio, Sevilla 41013, Spain.
Ignacio Medina: HPC Service, UIS, University of Cambridge, Cambridge CB3 0FB, UK.
High-profile genomic variation projects such as the 1000 Genomes Project and the Exome Aggregation Consortium are generating a wealth of knowledge about human genomic variation that serves as an essential reference for identifying disease-causing genotypes. However, accessing these data, comparing the various studies and integrating the data into downstream analyses remain cumbersome. The Human Genome Variation Archive (HGVA) tackles these challenges and facilitates access to genomic data from key reference projects in a clean, fast and integrated fashion. HGVA provides an efficient and intuitive web interface for easy data mining, a comprehensive RESTful API, and client libraries in Python, Java and JavaScript for fast programmatic access to its knowledge base. HGVA calculates population frequencies for these projects and enriches their data with variant annotation provided by CellBase, a rich and fast annotation solution. HGVA serves as a proof of concept of the genome analysis developments being carried out by the University of Cambridge together with the UK's 100 000 Genomes Project and the National Institute for Health Research BioResource Rare-Diseases, in particular the deployment of the open-source for Computational Biology (OpenCB) software platform for storing and analyzing massive genomic datasets.