Why do more divergent sequences produce smaller nonsynonymous/synonymous rate ratios in pairwise sequence comparisons?

Mario Dos Reis, Ziheng Yang
Author Information
  1. Mario Dos Reis: Department of Genetics, Evolution, and Environment, University College London, London, WC1E 6BT, United Kingdom.

Abstract

Several studies have reported a negative correlation between estimates of the nonsynonymous to synonymous rate ratio (ω = dN/dS) and the sequence distance d in pairwise comparisons of the same gene from different species; that is, more divergent sequences produce smaller estimates of ω. Proposed explanations for this negative correlation include segregating nonsynonymous polymorphisms in closely related species and the nonlinear behavior of the ratio of two random variables. Here we study the statistical properties of the maximum-likelihood estimates of ω and d in pairwise alignments and explore whether the negative correlation can be explained entirely by those properties. We show that the estimate of ω is positively biased for small d and that the bias decreases as d increases. We also show that the estimates of ω and d are negatively correlated when ω < 1 and positively correlated when ω > 1. However, the bias in the ω estimates and the correlation between the estimates of ω and d are not sufficient to explain the much stronger correlation observed in real data sets. We then explore the behavior of the estimates when the model is misspecified and suggest that the observed correlation may be due to protein-level selection that favors very different amino acids in different domains of the protein. Widely used models fail to account for such among-site heterogeneity and consequently underestimate the nonsynonymous rate and ω, with the bias being much stronger for distant sequences. Finally, we point out that tests of positive selection based on the ω ratio are invariant to the parameterization of the model and are thus unaffected by bias in the ω estimates or by the correlation between the estimates of ω and d.
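The ratio effect behind the small-d bias can be illustrated with a minimal simulation sketch. It assumes a simplified counting estimator rather than the maximum-likelihood codon model studied in the paper: synonymous and nonsynonymous differences are drawn as independent binomial counts over hypothetical site numbers (300 synonymous, 900 nonsynonymous sites), each observed proportion is corrected with the Jukes-Cantor formula, and ω is estimated as the ratio of the two corrected distances. Replicates with no synonymous differences, or with a saturated proportion, are discarded because the ratio is undefined there. The function names, site counts, and parameter values are illustrative assumptions, not part of the original work.

```python
import numpy as np


def expected_p(d):
    """Expected proportion of differing sites under Jukes-Cantor for distance d."""
    return 0.75 * (1.0 - np.exp(-4.0 * d / 3.0))


def jc_distance(p):
    """Jukes-Cantor corrected distance; NaN where p >= 0.75 (saturation)."""
    p = np.asarray(p, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        d = -0.75 * np.log(1.0 - 4.0 * p / 3.0)
    return np.where(p < 0.75, d, np.nan)


def mean_omega_hat(d_s, omega, n_syn=300, n_nonsyn=900, reps=20000, seed=1):
    """Mean of omega_hat = dN_hat / dS_hat over simulated pairwise comparisons.

    Hypothetical setup (an assumption, not the paper's ML model): d_s is the true
    synonymous distance per synonymous site, the true nonsynonymous distance is
    omega * d_s, and all sites evolve independently.
    Returns (mean omega_hat over usable replicates, fraction of usable replicates).
    """
    rng = np.random.default_rng(seed)
    x_s = rng.binomial(n_syn, expected_p(d_s), size=reps)
    x_n = rng.binomial(n_nonsyn, expected_p(omega * d_s), size=reps)
    ds_hat = jc_distance(x_s / n_syn)
    dn_hat = jc_distance(x_n / n_nonsyn)
    # omega_hat is undefined when dS_hat is 0 or either proportion is saturated
    usable = (x_s > 0) & np.isfinite(ds_hat) & np.isfinite(dn_hat)
    omega_hat = dn_hat[usable] / ds_hat[usable]
    return omega_hat.mean(), usable.mean()


true_omega = 0.2
for d_s in (0.02, 0.1, 1.0):
    m, frac = mean_omega_hat(d_s, true_omega)
    print(f"dS = {d_s}: mean omega_hat = {m:.3f} "
          f"(relative bias {m / true_omega - 1:+.1%}, usable replicates {frac:.3f})")
```

Because the expected reciprocal of a noisy dS estimate exceeds 1/dS when synonymous differences are few, the mean of ω̂ in this sketch should sit above the true ω at small dS and approach it as dS grows. The sketch reproduces only this ratio effect; it does not model the among-site heterogeneity that the abstract proposes as the main cause of the much stronger correlation seen in real data.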

MeSH Terms

Animals
Bacteria
Base Sequence
DNA, Bacterial
DNA, Mitochondrial
Evolution, Molecular
Gene Frequency
Models, Genetic
Selection, Genetic
Sequence Alignment

Chemicals

DNA, Bacterial
DNA, Mitochondrial
