What Signatures Dominantly Associate with Gene Age?

Hongyan Yin, Guangyu Wang, Lina Ma, Soojin V Yi, Zhang Zhang
Author Information
  1. Hongyan Yin: CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics (BIG), Chinese Academy of Sciences, Beijing, China; BIG Data Center, Beijing Institute of Genomics (BIG), Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China.
  2. Guangyu Wang: CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics (BIG), Chinese Academy of Sciences, Beijing, China; BIG Data Center, Beijing Institute of Genomics (BIG), Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China.
  3. Lina Ma: CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics (BIG), Chinese Academy of Sciences, Beijing, China; BIG Data Center, Beijing Institute of Genomics (BIG), Chinese Academy of Sciences, Beijing, China.
  4. Soojin V Yi: School of Biology, Georgia Institute of Technology, Atlanta.
  5. Zhang Zhang: CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics (BIG), Chinese Academy of Sciences, Beijing, China; BIG Data Center, Beijing Institute of Genomics (BIG), Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China. Email: zhangzhang@big.ac.cn.

Abstract

As genes originate at different evolutionary times, they harbor distinctive genomic signatures of their evolutionary age. Although previous studies have investigated various age-related signatures, which signatures are dominantly associated with gene age remains unresolved. Here we address this question by combining comprehensive gene age assignment, gene family identification, and multivariate analyses. We first provide a comprehensive and improved gene age assignment by combining homolog clustering with phylogeny inference, and categorize human genes into 26 age classes spanning the whole tree of life. We then explore the dominant age-related signatures among a collection of 10 potential signatures (including gene composition, gene length, selection pressure, expression level, connectivity in the protein-protein interaction network (PPIN), and DNA methylation). Our results show that GC content and PPIN connectivity are dominantly associated with gene age. Furthermore, we investigate the heterogeneity of dominant signatures between duplicates and singletons. We find that GC content is a consistent primary factor of gene age in both duplicates and singletons, whereas PPIN connectivity is more strongly associated with gene age in singletons than in duplicates. Taken together, GC content and PPIN connectivity are the two dominant signatures closely associated with gene age; they exhibit heterogeneity between duplicates and singletons, presumably reflecting a complex, differential interplay between natural selection and mutation.
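
The age-assignment step described above can be illustrated with a minimal phylostratigraphy-style sketch. It assumes that homolog clustering has already grouped genes into families, and that each family's age is set by the most distantly related species it contains. The six toy age classes, the species-to-class lookup, and the function names `family_age` and `assign_gene_ages` are hypothetical illustrations, not the authors' pipeline (the study itself uses 26 classes and its own phylogeny inference).

```python
from typing import Dict, List, Set

# Hypothetical toy age classes, ordered from oldest (index 0) to youngest
# (human-specific). The study uses 26 classes; only six are shown here.
AGE_CLASSES: List[str] = [
    "Cellular organisms",
    "Eukaryota",
    "Metazoa",
    "Vertebrata",
    "Mammalia",
    "Homo sapiens",
]

# Hypothetical lookup: for each species, the index of the age class that marks
# its most recent common ancestor with human (smaller index = more distant).
SPECIES_TO_CLASS: Dict[str, int] = {
    "Escherichia coli": 0,
    "Saccharomyces cerevisiae": 1,
    "Drosophila melanogaster": 2,
    "Danio rerio": 3,
    "Mus musculus": 4,
    "Homo sapiens": 5,
}


def family_age(member_species: Set[str]) -> int:
    """Index of the oldest age class represented in one homolog cluster."""
    if not member_species:
        raise ValueError("a homolog cluster must contain at least one species")
    # The most distant species carrying a homolog dates the whole family.
    return min(SPECIES_TO_CLASS[s] for s in member_species)


def assign_gene_ages(
    families: Dict[str, Set[str]],
    gene_to_family: Dict[str, str],
) -> Dict[str, str]:
    """Give every human gene the age class of the cluster it belongs to."""
    return {
        gene: AGE_CLASSES[family_age(families[fam])]
        for gene, fam in gene_to_family.items()
    }


if __name__ == "__main__":
    families = {
        "fam1": {"Homo sapiens", "Mus musculus", "Saccharomyces cerevisiae"},
        "fam2": {"Homo sapiens"},
    }
    genes = {"GENE_A": "fam1", "GENE_B": "fam2"}
    print(assign_gene_ages(families, genes))
    # {'GENE_A': 'Eukaryota', 'GENE_B': 'Homo sapiens'}
```

Taking the minimum means a single distant homolog is enough to make a whole family old, so in this sketch the assigned ages are only as reliable as the homolog clustering behind them.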

Grants

  1. R01 MH103517/NIMH NIH HHS

MeSH Terms

Base Composition
DNA Methylation
Evolution, Molecular
Gene Duplication
Genes, Dominant
Genetic Heterogeneity
Genome, Human
Humans
Protein Interaction Maps
Selection, Genetic