Data Integration for Microarrays: Enhanced Inference for Gene Regulatory Networks.

Alina Sîrbu, Martin Crane, Heather J Ruskin
Author Information
  1. Alina Sîrbu: Department of Computer Science and Engineering, University of Bologna, Via Mura Anteo Zamboni 7, Bologna 40126, Italy. alina.sirbu@unibo.it.
  2. Martin Crane: Center for Scientific Computing and Complex Systems Modelling, School of Computing, Dublin City University, Glasnevin, Dublin 9, Ireland. mcrane@computing.dcu.ie.
  3. Heather J Ruskin: Center for Scientific Computing and Complex Systems Modelling, School of Computing, Dublin City University, Glasnevin, Dublin 9, Ireland. hruskin@computing.dcu.ie.

Abstract

Microarray technologies have been the basis of numerous important findings regarding gene expression in the last few decades. Studies have generated large amounts of data describing various processes, which, thanks to public databases, are widely available for further analysis. Given their lower cost and higher maturity compared to newer sequencing technologies, these data continue to be produced, even though data quality has been the subject of some debate. However, given the large volume of data generated, integration can help overcome some issues related to, e.g., noise or reduced time resolution, while providing additional insight into features not directly addressed by sequencing methods. Here, we present an integration test case based on public Drosophila melanogaster datasets (gene expression, binding site affinities, known interactions). Using an evolutionary computation framework, we show how integration can enhance the ability to recover transcriptional gene regulatory networks from these data, and we indicate which data types are more important for quantitative and qualitative network inference. Our results show a clear improvement in performance when multiple datasets are integrated, indicating that microarray data will remain a valuable and viable resource for some time to come.
