Detection of Regional Variation in Selection Intensity within Protein-Coding Genes Using DNA Sequence Polymorphism and Divergence.

Zi-Ming Zhao, Michael C Campbell, Ning Li, Daniel S W Lee, Zhang Zhang, Jeffrey P Townsend

Author Information

Zi-Ming Zhao: Department of Biostatistics, Yale University, New Haven, CT.
Michael C Campbell: Department of Biostatistics, Yale University, New Haven, CT.
Ning Li: Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT.
Daniel S W Lee: Department of Biostatistics, Yale University, New Haven, CT.
Zhang Zhang: CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China.
Jeffrey P Townsend: Department of Biostatistics, Yale University, New Haven, CT.

PMID: 28962009 DOI: 10.1093/molbev/msx213

Numerous approaches have been developed to infer natural selection based on the comparison of polymorphism within species and divergence between species. These methods are especially powerful for the detection of uniform selection operating across a gene. However, empirical analyses have demonstrated that regions of protein-coding genes exhibiting clusters of amino acid substitutions are subject to different levels of selection relative to other regions of the same gene. To quantify this heterogeneity of selection within coding sequences, we developed Model Averaged Site Selection via Poisson Random Field (MASS-PRF). MASS-PRF identifies an ensemble of intragenic clustering models for polymorphic and divergent sites. This ensemble of models is used within the Poisson Random Field framework to estimate selection intensity on a site-by-site basis. Using simulations, we demonstrate that MASS-PRF has high power to detect clusters of amino acid variants in small genic regions, can reliably estimate the probability of a variant occurring at each nucleotide site in sequence data, and is robust to historical demographic trends and recombination. We applied MASS-PRF to human gene polymorphism derived from the 1000 Genomes Project and divergence data from the common chimpanzee. On the basis of this analysis, we discovered striking regional variation in selection intensity, indicative of positive or negative selection, in well-defined domains of genes that have previously been associated with neurological processing, immunity, and reproduction. We suggest that amino acid-altering substitutions within these regions likely are or have been selectively advantageous in the human lineage, playing important roles in protein function.
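The model-averaging step described above can be illustrated with a minimal sketch. It is not the MASS-PRF implementation: it assumes each candidate clustering model is scored by AIC and combined using Akaike weights, which the abstract does not specify, and the names `akaike_weights`, `model_averaged_site_rates`, and the `(start, end, rate)` segment layout are hypothetical. Each model partitions the gene into contiguous segments with one variant probability per segment, and the per-site estimate is the weighted mean across models.

```python
import math


def akaike_weights(aic_scores):
    """Convert AIC scores into normalized model weights (assumed weighting scheme)."""
    best = min(aic_scores)
    relative = [math.exp(-0.5 * (aic - best)) for aic in aic_scores]
    total = sum(relative)
    return [r / total for r in relative]


def model_averaged_site_rates(models, n_sites):
    """Average per-site variant probabilities over an ensemble of clustering models.

    models: list of (aic, segments) pairs, where segments is a list of
            (start, end, rate) tuples with end exclusive, covering 0..n_sites.
    """
    weights = akaike_weights([aic for aic, _ in models])
    averaged = [0.0] * n_sites
    for weight, (_, segments) in zip(weights, models):
        for start, end, rate in segments:
            for site in range(start, end):
                averaged[site] += weight * rate
    return averaged


# Hypothetical toy ensemble for a 4-site region: a uniform model and a
# two-cluster model with identical AIC, so each receives weight 0.5.
toy_models = [
    (10.0, [(0, 4, 0.1)]),
    (10.0, [(0, 2, 0.0), (2, 4, 0.4)]),
]
site_rates = model_averaged_site_rates(toy_models, 4)
```

In this toy case the averaged profile is higher in the second half of the region, which is the kind of regional signal that a site-by-site estimate of selection intensity would then be computed from.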

Keywords: Poisson Random Field; divergence; human evolution; model averaged site selection; natural selection; polymorphism

S10 RR029676/NCRR NIH HHS

Algorithms

Amino Acid Substitution

Animals

Cluster Analysis

Evolution, Molecular

Exons

Genetic Variation

Humans

Models, Genetic

Open Reading Frames

Polymorphism, Genetic

Polymorphism, Single Nucleotide

Selection, Genetic

Sequence Analysis, DNA

Journal Article
