Learning complex dependency structure of gene regulatory networks from high dimensional microarray data with Gaussian Bayesian networks.

Catharina E Graafland, José M Gutiérrez
Author Information
  1. Catharina E Graafland: Instituto de Física de Cantabria, CSIC-Universidad de Cantabria, Avenida de Los Castros, 39005, Santander, Spain. catharina.graafland@unican.es.
  2. José M Gutiérrez: Instituto de Física de Cantabria, CSIC-Universidad de Cantabria, Avenida de Los Castros, 39005, Santander, Spain.

Abstract

Reconstructing Gene Regulatory Networks (GRNs) from gene expression data with Probabilistic Network Models (PNMs) is an open problem. Gene expression datasets consist of thousands of genes with relatively small sample sizes (i.e., they are large-p-small-n). Moreover, dependencies of various orders coexist in the datasets: on the one hand, transcription factor encoding genes act like hubs and regulate target genes; on the other hand, target genes show local dependencies. In the field of Undirected Network Models (UNMs), a subclass of PNMs, the Glasso algorithm has been proposed to deal with high-dimensional microarray datasets by forcing sparsity. To overcome the problem of the complex structure of interactions, modifications of the default Glasso algorithm have been developed that integrate the expected dependency structure into the UNMs beforehand. In this work we advocate the use of a simple score-based Hill Climbing (HC) algorithm that learns Gaussian Bayesian networks relying on directed acyclic graphs. We compare HC with Glasso and its variants in the UNM framework based on their capability to reconstruct GRNs from microarray data, using the synthetic benchmark dataset from the DREAM5 challenge and real-world data from the Escherichia coli genome. We conclude that dependencies in complex data are learned best by the HC algorithm, which represents them most accurately and efficiently, simultaneously modelling the strong local and the weaker but significant global connections that coexist in the gene expression dataset. The HC algorithm adapts intrinsically to the complex dependency structure of the dataset, without forcing a specific structure in advance.
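
To make the idea of score-based Hill Climbing over Gaussian Bayesian networks concrete, the following is a minimal toy sketch and not the authors' implementation. It assumes a BIC score (the abstract only says "score-based"), uses only arc additions and deletions (no arc reversals), and runs on a hypothetical four-variable synthetic dataset rather than microarray data. All function names (`solve`, `covariance`, `node_bic`, `creates_cycle`, `hill_climb`, `sample_chain`) are illustrative.

```python
import math
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def covariance(data):
    """Maximum-likelihood covariance matrix; rows of `data` are samples."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    return [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in data) / n
             for j in range(p)] for i in range(p)]


def node_bic(cov, n, node, parents):
    """BIC term of one node regressed linearly on its parents (assumed score)."""
    var = cov[node][node]
    if parents:
        spp = [[cov[a][b] for b in parents] for a in parents]
        spn = [cov[a][node] for a in parents]
        beta = solve(spp, spn)
        var -= sum(b * s for b, s in zip(beta, spn))  # residual variance
    var = max(var, 1e-12)
    loglik = -0.5 * n * (math.log(2 * math.pi * var) + 1)
    n_params = len(parents) + 2  # coefficients + intercept + variance
    return loglik - 0.5 * n_params * math.log(n)


def creates_cycle(parents, src, dst):
    """True if dst is already an ancestor of src, so src -> dst closes a cycle."""
    stack, seen = [src], set()
    while stack:
        v = stack.pop()
        if v == dst:
            return True
        if v not in seen:
            seen.add(v)
            stack.extend(parents[v])
    return False


def hill_climb(data):
    """Greedy search: apply the best single arc addition/deletion until no gain."""
    n, p = len(data), len(data[0])
    cov = covariance(data)
    parents = {v: set() for v in range(p)}
    score = {v: node_bic(cov, n, v, []) for v in range(p)}
    while True:
        best_gain, best_move = 1e-9, None
        for src in range(p):
            for dst in range(p):
                if src == dst:
                    continue
                if src in parents[dst]:
                    new = parents[dst] - {src}      # candidate deletion
                elif not creates_cycle(parents, src, dst):
                    new = parents[dst] | {src}      # candidate addition
                else:
                    continue
                gain = node_bic(cov, n, dst, sorted(new)) - score[dst]
                if gain > best_gain:
                    best_gain, best_move = gain, (dst, new)
        if best_move is None:
            return parents
        dst, new = best_move
        parents[dst] = new
        score[dst] = node_bic(cov, n, dst, sorted(new))


def sample_chain(n, seed=0):
    """Hypothetical toy data: x0 -> x1 -> x2, plus an unrelated x3."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x0 = rng.gauss(0, 1)
        x1 = 2.0 * x0 + rng.gauss(0, 1)
        x2 = -1.5 * x1 + rng.gauss(0, 1)
        rows.append([x0, x1, x2, rng.gauss(0, 1)])
    return rows


learned = hill_climb(sample_chain(200))
for child, pars in learned.items():
    print(child, "<-", sorted(pars))
```

Because the score decomposes per node, each candidate move only requires rescoring the child node whose parent set changes. No sparsity pattern or hub structure is specified up front; the search simply stops when no single arc change improves the score. Arc directions within an equivalence class are not identifiable from the score alone, so only the recovered adjacencies should be interpreted in this toy example.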

MeSH Terms

Gene Regulatory Networks
Bayes Theorem
Algorithms
Oligonucleotide Array Sequence Analysis
Normal Distribution
Escherichia coli
Computational Biology
