Population-specific genetic variation in large sequencing data sets: why more data is still better.

Jeroen G J van Rooij, Mila Jhamai, Pascal P Arp, Stephan C A Nouwens, Marijn Verkerk, Albert Hofman, M Arfan Ikram, Annemieke J Verkerk, Joyce B J van Meurs, Fernando Rivadeneira, André G Uitterlinden, Robert Kraaij
Author Information
  1. Jeroen G J van Rooij: Department of Internal Medicine, Erasmus MC, Rotterdam, Netherlands.
  2. Mila Jhamai: Department of Internal Medicine, Erasmus MC, Rotterdam, Netherlands.
  3. Pascal P Arp: Department of Internal Medicine, Erasmus MC, Rotterdam, Netherlands.
  4. Stephan C A Nouwens: Department of Internal Medicine, Erasmus MC, Rotterdam, Netherlands.
  5. Marijn Verkerk: Department of Internal Medicine, Erasmus MC, Rotterdam, Netherlands.
  6. Albert Hofman: Department of Epidemiology, Erasmus MC, Rotterdam, Netherlands.
  7. M Arfan Ikram: Department of Neurology, Erasmus MC, Rotterdam, Netherlands.
  8. Annemieke J Verkerk: Department of Internal Medicine, Erasmus MC, Rotterdam, Netherlands.
  9. Joyce B J van Meurs: Department of Internal Medicine, Erasmus MC, Rotterdam, Netherlands.
  10. Fernando Rivadeneira: Department of Internal Medicine, Erasmus MC, Rotterdam, Netherlands.
  11. André G Uitterlinden: Department of Internal Medicine, Erasmus MC, Rotterdam, Netherlands.
  12. Robert Kraaij: Department of Internal Medicine, Erasmus MC, Rotterdam, Netherlands.

Abstract

We generated a next-generation whole-exome sequencing data set of 2628 participants of the population-based Rotterdam Study cohort, comprising 669 737 single-nucleotide variants and 24 019 short insertions and deletions. Because the Rotterdam Study has broad and deep longitudinal phenotyping, this data set permits extensive interpretation of genetic variants in relation to a range of clinically relevant outcomes, and it is accessible as a control data set. We show that next-generation sequencing data sets yield a large number of population-specific variants that are not captured by other available large sequencing efforts, namely ExAC, ESP, 1000G, UK10K, GoNL and DECODE.

MeSH Terms

Datasets as Topic
Genetic Predisposition to Disease
Genome-Wide Association Study
High-Throughput Nucleotide Sequencing
Humans
Polymorphism, Genetic
Sequence Analysis, DNA
