Characterization of paralogous protein families in rice.

Haining Lin, Shu Ouyang, Rain Simons, Kan Nobuta, Brian J Haas, Wei Zhu, Xun Gu, Joana C Silva, Blake C Meyers, C Robin Buell
Author Information
  1. Haining Lin: The Institute for Genomic Research, 9712 Medical Center Dr., Rockville, MD 20850, USA. linha@msu.edu

Abstract

BACKGROUND: High gene numbers in plant genomes reflect polyploidy and major gene duplication events. Oryza sativa, cultivated rice, is a diploid monocotyledonous species with a ~390 Mb genome, a substantial portion of which has undergone segmental duplication. Together with other genetic events such as tandem duplication, this has placed a substantial number of rice genes, and the proteins they encode, in paralogous families.
RESULTS: Using a computational pipeline based on Pfam and novel protein domains, we characterized paralogous families in rice and compared them with paralogous families in the model dicotyledonous diploid species Arabidopsis thaliana. Arabidopsis, which has also undergone genome duplication, has a substantially smaller genome (~120 Mb) and gene complement than rice. Overall, 53% of the non-transposable element-related rice proteins and 68% of those in Arabidopsis could be classified into paralogous protein families. Singleton and paralogous family genes differed substantially in their likelihood of encoding a protein of known or putative function: in rice, 26% of singleton genes encode a known or putative protein compared with 73% of paralogous family genes; in Arabidopsis, the corresponding figures are 66% and 96%. Furthermore, gene function was strongly skewed: a total of 17 Gene Ontology categories in both rice and Arabidopsis differed significantly in their distribution between paralogous family and singleton proteins. In contrast to mammals, we found that duplicated genes in rice and Arabidopsis tend to have more alternative splice forms. Using Massively Parallel Signature Sequencing (MPSS) data, we show that a significant portion of the duplicated genes in rice have divergent expression, although in very young genes a correlation could be seen between sequence divergence and correlation of expression.
CONCLUSION: Collectively, these data suggest that while co-regulation and conserved function are present in some paralogous protein family members, evolutionary pressures have resulted in functional divergence with differential expression patterns.
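The abstract does not say how the pipeline turns domain assignments into families. As a minimal sketch, assuming that two proteins are linked whenever they share at least one Pfam or novel domain and that a paralogous family is a connected component of two or more proteins, the grouping and the "fraction classified" statistic could be computed with a union-find (disjoint-set) structure. The function names, the toy protein and domain IDs, and the linking criterion are hypothetical, not the authors' method.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set forest with path halving and union by size."""

    def __init__(self, items):
        self.parent = {x: x for x in items}
        self.size = {x: 1 for x in items}

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra  # attach the smaller tree under the larger
        self.size[ra] += self.size[rb]


def paralogous_families(protein_domains, te_proteins=frozenset()):
    """Group proteins sharing >= 1 domain (hypothetical linking rule).

    protein_domains: dict of protein ID -> set of domain IDs
    te_proteins: IDs of transposable-element-related proteins to exclude
    Returns (families, singletons): families are sorted lists with two or
    more members; singletons is a sorted list of unclustered protein IDs.
    """
    proteins = [p for p in protein_domains if p not in te_proteins]
    uf = UnionFind(proteins)
    first_carrier = {}  # domain ID -> first protein seen with that domain
    for p in proteins:
        for d in protein_domains[p]:
            if d in first_carrier:
                uf.union(first_carrier[d], p)
            else:
                first_carrier[d] = p
    groups = defaultdict(list)
    for p in proteins:
        groups[uf.find(p)].append(p)
    families = sorted(sorted(g) for g in groups.values() if len(g) > 1)
    singletons = sorted(g[0] for g in groups.values() if len(g) == 1)
    return families, singletons


def fraction_in_families(families, singletons):
    """Share of (non-TE) proteins that fall into a paralogous family."""
    in_families = sum(len(f) for f in families)
    total = in_families + len(singletons)
    return in_families / total if total else 0.0


if __name__ == "__main__":
    # Toy, hypothetical domain assignments (not real rice data).
    domains = {
        "P1": {"dom_A"},
        "P2": {"dom_A", "dom_B"},
        "P3": {"dom_B"},
        "P4": {"dom_C"},
        "P5": {"novel_1"},
        "P6": {"novel_1"},
        "TE1": {"dom_TE"},
    }
    fams, singles = paralogous_families(domains, te_proteins={"TE1"})
    print(fams, singles, fraction_in_families(fams, singles))
```

Linking on any single shared domain is the loosest possible rule: a promiscuous domain can chain otherwise unrelated proteins into one large family, so a stricter criterion (for example, requiring the same domain composition or additional sequence similarity) may be preferable in practice.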

MeSH Term

Arabidopsis
Expressed Sequence Tags
Gene Duplication
Genes, Plant
Multigene Family
Oryza
Phylogeny
Plant Proteins
Protein Isoforms

Chemicals

Plant Proteins
Protein Isoforms
