Modeling compositional dynamics based on GC and purine contents of protein-coding sequences.

Zhang Zhang, Jun Yu

Author Information

Zhang Zhang: Plant Stress Genomics Research Center, Division of Chemical and Life Sciences and Engineering, King Abdullah University of Science and Technology, Thuwal 23955-6900, Kingdom of Saudi Arabia.

PMID: 21059261 DOI: 10.1186/1745-6150-5-63

BACKGROUND: Understanding the compositional dynamics of genomes and their coding sequences is of great significance for gaining clues into molecular evolution, and the large number of publicly available genome sequences has allowed us to quantitatively predict deviations of empirical data from their theoretical counterparts. However, quantifying theoretical compositional variations across a wide diversity of genomes remains a major challenge.
RESULTS: To model the compositional dynamics of protein-coding sequences, we propose two simple models that take into account both mutation and selection effects, which act differently at the three codon positions, and use both GC and purine contents as compositional parameters. The two models concern the theoretical composition of nucleotides, codons, and amino acids, with no prerequisite of homologous sequences or their alignments. We evaluated the two models by quantifying theoretical compositions of a large collection of protein-coding sequences (including 46 of Archaea, 686 of Bacteria, and 826 of Eukarya), yielding consistent theoretical compositions across all the collected sequences.
CONCLUSIONS: We show that the compositions of nucleotides, codons, and amino acids are largely determined by both GC and purine contents and suggest that deviations of the observed from the expected compositions may reflect compositional signatures that arise from a complex interplay between mutation and selection via DNA replication and repair mechanisms.
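
The idea of deriving nucleotide, codon, and amino acid compositions from position-specific GC and purine contents can be illustrated with a minimal sketch. It is not the authors' published formulation: it assumes that GC status and purine status are independent at each codon position, that the three positions are independent of one another, and that stop codons are excluded under the standard genetic code. The function names and the input values in the usage note are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G
# at each position, third position varying fastest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def nucleotide_freqs(gc, purine):
    """Expected base frequencies at one codon position.

    Assumption of this sketch: GC status (G or C) and purine status
    (A or G) are statistically independent.
    """
    return {
        "G": gc * purine,
        "C": gc * (1 - purine),
        "A": (1 - gc) * purine,
        "T": (1 - gc) * (1 - purine),
    }


def codon_freqs(gc_by_pos, purine_by_pos):
    """Expected sense-codon frequencies from per-position GC and purine contents.

    Assumption of this sketch: the three codon positions are independent,
    and frequencies are renormalized after dropping stop codons.
    """
    pos = [nucleotide_freqs(g, r) for g, r in zip(gc_by_pos, purine_by_pos)]
    raw = {
        codon: pos[0][codon[0]] * pos[1][codon[1]] * pos[2][codon[2]]
        for codon, aa in GENETIC_CODE.items()
        if aa != "*"
    }
    total = sum(raw.values())
    return {codon: f / total for codon, f in raw.items()}


def amino_acid_freqs(codon_f):
    """Sum expected codon frequencies over synonymous codons."""
    result = {}
    for codon, f in codon_f.items():
        aa = GENETIC_CODE[codon]
        result[aa] = result.get(aa, 0.0) + f
    return result
```

For example, `amino_acid_freqs(codon_freqs([0.6, 0.4, 0.7], [0.6, 0.5, 0.45]))` returns expected amino acid frequencies for a hypothetical sequence set; comparing such expectations with observed frequencies is one way to look at the deviations the abstract refers to.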

Amino Acids

Animals

Base Composition

Codon

Humans

Open Reading Frames

Purines

purine

Journal Article; Research Support, Non-U.S. Gov't
