Population-specific genetic variation in large sequencing data sets: why more data is still better.
Jeroen G J van Rooij, Mila Jhamai, Pascal P Arp, Stephan C A Nouwens, Marijn Verkerk, Albert Hofman, M Arfan Ikram, Annemieke J Verkerk, Joyce B J van Meurs, Fernando Rivadeneira, André G Uitterlinden, Robert Kraaij
Author Information
Jeroen G J van Rooij: Department of Internal Medicine, Erasmus MC, Rotterdam, Netherlands.
Mila Jhamai: Department of Internal Medicine, Erasmus MC, Rotterdam, Netherlands.
Pascal P Arp: Department of Internal Medicine, Erasmus MC, Rotterdam, Netherlands.
Stephan C A Nouwens: Department of Internal Medicine, Erasmus MC, Rotterdam, Netherlands.
Marijn Verkerk: Department of Internal Medicine, Erasmus MC, Rotterdam, Netherlands.
Albert Hofman: Department of Epidemiology, Erasmus MC, Rotterdam, Netherlands.
M Arfan Ikram: Department of Neurology, Erasmus MC, Rotterdam, Netherlands.
Annemieke J Verkerk: Department of Internal Medicine, Erasmus MC, Rotterdam, Netherlands.
Joyce B J van Meurs: Department of Internal Medicine, Erasmus MC, Rotterdam, Netherlands.
Fernando Rivadeneira: Department of Internal Medicine, Erasmus MC, Rotterdam, Netherlands.
André G Uitterlinden: Department of Internal Medicine, Erasmus MC, Rotterdam, Netherlands.
Robert Kraaij: Department of Internal Medicine, Erasmus MC, Rotterdam, Netherlands.
We have generated a next-generation whole-exome sequencing data set of 2628 participants of the population-based Rotterdam Study cohort, comprising 669 737 single-nucleotide variants and 24 019 short insertions and deletions. Because the Rotterdam Study has broad and deep longitudinal phenotyping, this data set permits extensive interpretation of genetic variants across a range of clinically relevant outcomes, and it is available for use as a control data set. We show that next-generation sequencing data sets yield a large number of population-specific variants that are not captured by other available large sequencing efforts, namely ExAC, ESP, 1000G, UK10K, GoNL and DECODE.
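To make the notion of "not captured by other sequencing efforts" concrete, the following is a minimal sketch, not the authors' pipeline, of one way to flag cohort variants that are absent from a set of reference catalogues. It assumes that every variant has already been normalized (for example, left-aligned indels with consistent chromosome naming) so that an exact (chrom, pos, ref, alt) match is meaningful. The `Variant` type, the `population_specific` function and the toy variants are hypothetical illustrations, not real data from ExAC, GoNL or the Rotterdam Study.

```python
from typing import Dict, Iterable, NamedTuple, Set


class Variant(NamedTuple):
    """Hypothetical variant key; assumes normalized, left-aligned alleles."""
    chrom: str
    pos: int
    ref: str
    alt: str


def population_specific(
    cohort: Iterable[Variant], references: Dict[str, Set[Variant]]
) -> Set[Variant]:
    """Return cohort variants absent from every reference catalogue."""
    # Union of all reference catalogues; an empty dict yields an empty set.
    catalogued: Set[Variant] = set().union(*references.values())
    # Set difference keeps only variants no reference data set contains.
    return set(cohort) - catalogued


# Toy, made-up example: two reference catalogues, three cohort variants.
cohort = [
    Variant("1", 1000, "A", "G"),
    Variant("1", 2000, "C", "T"),
    Variant("2", 500, "G", "GA"),  # short insertion
]
references = {
    "ExAC": {Variant("1", 1000, "A", "G")},
    "GoNL": {Variant("2", 500, "G", "GA")},
}

novel = population_specific(cohort, references)
fraction_novel = len(novel) / len(set(cohort))
```

Here only the variant at chromosome 1, position 2000 is missing from both toy catalogues, so one of three cohort variants is flagged as population-specific. A plain set difference keeps the sketch simple; real comparisons would also have to handle multi-allelic sites, differing genome builds and coverage differences between exome and genome data sets.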