Meta Analysis of Gene Expression Data within and Across Species.

Ana C Fierro, Filip Vandenbussche, Kristof Engelen, Yves Van de Peer, Kathleen Marchal
Author Information
  1. Ana C Fierro: Department of Microbial and Molecular Systems, Katholieke Universiteit Leuven, Kasteelpark Arenberg 20, 3001 Leuven, Belgium.

Abstract

Since the second half of the 1990s, a large number of genome-wide analyses have been described that study gene expression at the transcript level. Two major strategies have been adopted: the first relies on hybridization techniques such as microarrays, and the second on sequencing techniques such as serial analysis of gene expression (SAGE), cDNA-AFLP, and analysis based on expressed sequence tags (ESTs). Although both types of profiling experiments have become routine in many research groups, they remain costly and laborious. As a result, the number of conditions profiled in individual studies is still relatively small, usually ranging from only two to a few hundred samples for the largest experiments. Increasingly, scientific journals require that these high-throughput experiments be deposited in public databases upon publication. Mining the information in these databases lets molecular biologists view their own small-scale analyses in the light of what is already available. So far, however, the richness of this public information remains largely unexploited. Several obstacles impede the successful mining of these data: the difficulty of correctly associating ESTs and microarray probes with the corresponding gene transcript, the incomplete and inconsistent annotation of experimental conditions, and the lack of standardized experimental protocols for generating gene expression data. Here, we review the potential and difficulties of combining publicly available expression data from EST analyses and microarray experiments. With examples from the literature, we show how meta-analysis of expression profiling experiments can be used to study expression behavior in a single organism or between organisms, across a wide range of experimental conditions. We also provide an overview of the methods and tools that can help molecular biologists exploit these public data.

