Seeded Bayesian Networks: constructing genetic networks from microarray data.

Amira Djebbari, John Quackenbush
Author Information
  1. Amira Djebbari: Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute and Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115, USA. amirad@gmail.com

Abstract

BACKGROUND: DNA microarrays and other genomics-inspired technologies provide large datasets that often include hidden patterns of correlation between genes, reflecting the complex processes that underlie cellular metabolism and physiology. The challenge in analyzing large-scale expression data has been to extract biologically meaningful inferences about these processes, which are often represented as networks, from datasets that are frequently imperfect and in which biological noise can obscure the actual signal. Although many techniques have been developed to address these issues, their ability to extract meaningful and predictive network relationships has so far been limited. Here we describe a method that draws on prior information about gene-gene interactions to infer biologically relevant pathways from microarray data. Our approach uses preliminary networks derived from the literature and/or protein-protein interaction data as seeds for a Bayesian network analysis of microarray results.
RESULTS: Through a bootstrap analysis of gene expression data derived from a number of leukemia studies, we demonstrate that seeded Bayesian Networks have the ability to identify high-confidence gene-gene interactions which can then be validated by comparison to other sources of pathway data.
CONCLUSION: The use of network seeds greatly improves the ability of Bayesian network analysis to learn gene interaction networks from gene expression data. We demonstrate that seeds derived from the biomedical literature, from high-throughput protein-protein interaction data, or from a combination of the two improve on a standard Bayesian network analysis, allowing networks involving dynamic processes to be deduced from the static snapshots of biological systems that represent the most common source of microarray data. Software implementing these methods has been included in the widely used TM4 microarray analysis package.
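
The bootstrap step mentioned under RESULTS can be illustrated with a minimal Python sketch. It is not the TM4 implementation: the structure learner here is a hypothetical correlation-threshold stand-in (not a Bayesian network search), edges are treated as undirected, and the names `toy_learner`, `bootstrap_confidence`, `seed_cut`, and `new_cut` are assumptions introduced only for illustration. The sketch shows the general idea of resampling the expression samples, relearning a network from each resample starting from the seed edges, and scoring each edge by how often it reappears.

```python
import random
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def toy_learner(samples, genes, seed_edges, seed_cut=0.3, new_cut=0.8):
    """Hypothetical stand-in for a structure learner (assumption).

    Seed edges need only weak support (seed_cut) to be kept, while
    edges absent from the seed need strong support (new_cut).
    """
    edges = set()
    for a, b in combinations(sorted(genes), 2):
        r = abs(pearson([s[a] for s in samples], [s[b] for s in samples]))
        cut = seed_cut if (a, b) in seed_edges else new_cut
        if r >= cut:
            edges.add((a, b))
    return edges


def bootstrap_confidence(samples, genes, seed_edges, learner,
                         n_boot=200, rng=None):
    """Fraction of bootstrap resamples in which each edge is learned."""
    rng = rng or random.Random(0)
    counts = {}
    for _ in range(n_boot):
        # Resample the expression samples with replacement.
        resample = [rng.choice(samples) for _ in samples]
        for edge in learner(resample, genes, seed_edges):
            counts[edge] = counts.get(edge, 0) + 1
    return {edge: c / n_boot for edge, c in counts.items()}


# Synthetic demo data (not from the paper): B tracks A, C is independent.
data_rng = random.Random(1)
samples = []
for _ in range(30):
    a = data_rng.gauss(0, 1)
    samples.append({"A": a,
                    "B": a + data_rng.gauss(0, 0.1),
                    "C": data_rng.gauss(0, 1)})

confidence = bootstrap_confidence(samples, {"A", "B", "C"},
                                  seed_edges={("A", "B")},
                                  learner=toy_learner)
for edge, score in sorted(confidence.items()):
    print(edge, round(score, 2))
```

Edges whose bootstrap frequency exceeds a chosen cutoff would then be reported as high-confidence interactions; in practice the `learner` argument would be replaced by a real Bayesian network structure search initialized from the seed network.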

MeSH Terms

Bayes Theorem
False Positive Reactions
Gene Expression Regulation, Neoplastic
Gene Regulatory Networks
Genomics
Humans
Leukemia
Oligonucleotide Array Sequence Analysis
Reproducibility of Results
