Detection and removal of PCR duplicates in population genomic ddRAD studies by addition of a degenerate base region (DBR) in sequencing adapters.

Hannah Schweyen, Andrey Rozenberg, Florian Leese
Author Information
  1. Hannah Schweyen: Ruhr University Bochum, Department of Animal Ecology, Evolution and Biodiversity, Universitaetsstrasse 150, D-44801 Bochum, Germany.
  2. Andrey Rozenberg: Ruhr University Bochum, Department of Animal Ecology, Evolution and Biodiversity, Universitaetsstrasse 150, D-44801 Bochum, Germany.
  3. Florian Leese: Ruhr University Bochum, Department of Animal Ecology, Evolution and Biodiversity, Universitaetsstrasse 150, D-44801 Bochum, Germany. Email: florian.leese@rub.de

Abstract

Restriction-site associated DNA sequencing (RAD) has emerged as a powerful marker system for studying genome-wide DNA polymorphisms using next-generation sequencing. A recent technical facilitation of RAD is double-digest RAD (ddRAD), which utilizes two restriction enzymes for library preparation. The more flexible and balanced ddRAD allows analysis of genomic loci in hundreds of individuals. However, in contrast to paired-end sequencing of traditional RAD libraries, PCR duplicates cannot be detected with ddRAD. This is a concern because duplicates can contribute substantially to read coverage data and erroneously inflate the proportion of homozygous loci (allele dropout). Allele dropout can bias population genetic parameter inference and complicate the detection of outlier loci under selection. Here we outline a simple and straightforward approach to detecting PCR duplicates from ddRAD libraries. Our approach introduces a degenerate base region (DBR, 12,288 unique combinations) in the sequencing adapter. We demonstrate the high efficiency and low rate of false positives in simulations. In addition, a pilot study was performed to test this approach on six aquatic invertebrates, sequenced on a HiSeq 2500 sequencer. The reads of the ddRAD libraries consisted of 33.48% PCR duplicates distributed on 19.40% of the loci. A disproportionate number of PCR duplicates were detected in only 4.66% of the loci. While this should not be a concern for general parameter inference, outlier loci detection in particular would be improved by the DBR technique. Given the easy and straightforward application of the technique in other RAD protocols as well, we suggest that DBR regions should generally be included in PCR-based RAD studies.
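The idea behind the DBR can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the read representation as (locus, DBR, sequence) tuples, and the rule "same locus plus same DBR means PCR duplicate" are assumptions made for illustration. The second function estimates how often independent fragments at one locus would share a DBR purely by chance, assuming all 12,288 combinations are equally likely, which is one source of false positives.

```python
# Illustrative sketch only; names and data layout are hypothetical.

DBR_COMBINATIONS = 12288  # number of unique DBR combinations stated in the abstract


def remove_pcr_duplicates(reads):
    """Keep the first read seen for every (locus, DBR) pair.

    `reads` is an iterable of (locus_id, dbr, sequence) tuples.
    Assumption: reads from the same locus carrying the same DBR are
    copies of one original fragment, i.e. PCR duplicates.
    Returns (unique_reads, number_of_duplicates_removed).
    """
    seen = set()
    unique_reads = []
    duplicates = 0
    for locus_id, dbr, sequence in reads:
        key = (locus_id, dbr)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
            unique_reads.append((locus_id, dbr, sequence))
    return unique_reads, duplicates


def chance_collision_probability(n_fragments, combinations=DBR_COMBINATIONS):
    """Probability that at least two of `n_fragments` independent fragments
    share a DBR by chance (birthday problem, assuming uniform DBR usage)."""
    prob_all_distinct = 1.0
    for i in range(n_fragments):
        prob_all_distinct *= max(0.0, 1.0 - i / combinations)
    return 1.0 - prob_all_distinct


if __name__ == "__main__":
    example_reads = [
        ("locus1", "ACGTAC", "TTGACCA"),
        ("locus1", "ACGTAC", "TTGACCA"),  # same locus and DBR: duplicate
        ("locus1", "GGTACA", "TTGACCA"),  # same locus, different DBR: kept
        ("locus2", "ACGTAC", "CCATGGA"),  # different locus: kept
        ("locus2", "ACGTAC", "CCATGGA"),  # duplicate
    ]
    unique, removed = remove_pcr_duplicates(example_reads)
    print(len(unique), "unique reads,", removed, "duplicates removed")
    print(chance_collision_probability(20))
```

Under the uniformity assumption, the chance of a spurious DBR match grows with the number of independent fragments sequenced per locus, so the false-positive estimate depends on coverage as well as on the size of the DBR space.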

MeSH Terms

Animals
DNA Restriction Enzymes
Invertebrates
Metagenomics
Polymerase Chain Reaction
Polymorphism, Genetic
Sequence Analysis, DNA

Chemicals

DNA Restriction Enzymes
