A simple method for estimating genetic diversity in large populations from finite sample sizes.

Stanislav Bashalkhanov, Madhav Pandey, Om P Rajora
Author Information
  1. Stanislav Bashalkhanov: Canadian Genomics and Conservation Genetics Institute, University of New Brunswick, Faculty of Forestry and Environmental Management, Fredericton, NB, E3B 6C2, Canada. stanislav.bashalkhanov@unb.ca

Abstract

BACKGROUND: Sample size is a critical factor affecting the accuracy with which population genetic diversity parameters are estimated. Small sample sizes often lead to significant errors in estimating allelic richness, one of the most important and commonly used measures of genetic diversity in populations. Correct estimation of allelic richness in natural populations is challenging because such populations often do not conform to model assumptions. Here, we introduce a simple and robust approach to estimate genetic diversity in large natural populations from empirical data for finite sample sizes.
RESULTS: We developed a non-linear regression model to infer genetic diversity estimates in large natural populations from finite sample sizes. The allelic richness values predicted by our model were in good agreement with those observed in the simulated data sets and with the true allelic richness observed in the source populations. The model was validated using simulated population genetic data sets representing different evolutionary scenarios, as well as large experimental microsatellite and allozyme data sets for four conifer species with contrasting patterns of inherent genetic diversity and mating systems. Our model was a better predictor of allelic richness in natural populations than the widely used Ewens sampling formula, coalescent approach, and rarefaction algorithm.
CONCLUSIONS: Our regression model was capable of accurately estimating allelic richness in natural populations regardless of the species and marker system. This regression modeling approach is free from assumptions and can be widely used for population genetic and conservation applications.
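
The abstract does not state the functional form of the regression model, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a Michaelis-Menten-type saturating curve, A(n) = A_max * n / (k + n), fitted by least squares to allelic richness values at several sample sizes, with A_max read as the estimated richness of the large source population. The rarefaction function uses the standard hypergeometric formula for the expected number of alleles in a subsample. The function names, the grid-search fitting procedure, and the allele counts are all hypothetical.

```python
from math import comb


def rarefied_richness(allele_counts, n):
    """Expected number of distinct alleles in a random subsample of n gene
    copies drawn without replacement (standard rarefaction formula)."""
    total = sum(allele_counts)
    if not 0 < n <= total:
        raise ValueError("n must be between 1 and the number of sampled gene copies")
    denom = comb(total, n)
    # 1 - P(allele absent from the subsample), summed over alleles.
    return sum(1 - comb(total - c, n) / denom for c in allele_counts)


def fit_saturating_curve(sizes, richness, k_max=1000.0, steps=20000):
    """Least-squares fit of A(n) = a_max * n / (k + n).

    Assumed model form, not taken from the paper. For each candidate k on a
    grid, the best a_max has a closed form, so only k needs to be searched.
    """
    best = None
    for i in range(1, steps + 1):
        k = k_max * i / steps
        f = [n / (k + n) for n in sizes]
        a_max = sum(a * x for a, x in zip(richness, f)) / sum(x * x for x in f)
        sse = sum((a - a_max * x) ** 2 for a, x in zip(richness, f))
        if best is None or sse < best[0]:
            best = (sse, a_max, k)
    return best[1], best[2]


# Hypothetical allele counts at one locus (40 gene copies, 6 alleles).
counts = [18, 9, 6, 4, 2, 1]
sizes = [5, 10, 20, 30, 40]
curve = [rarefied_richness(counts, n) for n in sizes]
a_max, k = fit_saturating_curve(sizes, curve)
print(f"estimated asymptotic richness: {a_max:.2f} (half-saturation k = {k:.2f})")
```

In this sketch the rarefied values only supply the points of an allele accumulation curve; with real data, the richness at each sample size would more plausibly come from repeated subsampling of genotyped individuals, and a dedicated non-linear least-squares routine could replace the grid search.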

MeSH Term

Alleles
Finite Element Analysis
Genetic Techniques
Genetic Variation
Models, Genetic
Nonlinear Dynamics
Pinaceae
Regression Analysis
Sample Size
