High-performance single-cell gene regulatory network inference at scale: the Inferelator 3.0.

Claudia Skok Gibbs, Christopher A Jackson, Giuseppe-Antonio Saldi, Andreas Tjärnberg, Aashna Shah, Aaron Watters, Nicholas De Veaux, Konstantine Tchourine, Ren Yi, Tymor Hamamsy, Dayanne M Castro, Nicholas Carriero, Bram L Gorissen, David Gresham, Emily R Miraldi, Richard Bonneau
Author Information
  1. Claudia Skok Gibbs: Flatiron Institute, Center for Computational Biology, Simons Foundation, New York, NY 10010, USA.
  2. Christopher A Jackson: Center for Genomics and Systems Biology, New York University, New York, NY 10003, USA.
  3. Giuseppe-Antonio Saldi: Center for Genomics and Systems Biology, New York University, New York, NY 10003, USA.
  4. Andreas Tjärnberg: Center for Genomics and Systems Biology, New York University, New York, NY 10003, USA.
  5. Aashna Shah: Flatiron Institute, Center for Computational Biology, Simons Foundation, New York, NY 10010, USA.
  6. Aaron Watters: Flatiron Institute, Center for Computational Biology, Simons Foundation, New York, NY 10010, USA.
  7. Nicholas De Veaux: Flatiron Institute, Center for Computational Biology, Simons Foundation, New York, NY 10010, USA.
  8. Konstantine Tchourine: Department of Systems Biology, Columbia University, New York, NY 10027, USA.
  9. Ren Yi: Computer Science Department, Courant Institute of Mathematical Sciences, New York University, New York, NY 10012, USA.
  10. Tymor Hamamsy: Center for Data Science, New York University, New York, NY 10003, USA.
  11. Dayanne M Castro: Center for Genomics and Systems Biology, New York University, New York, NY 10003, USA.
  12. Nicholas Carriero: Flatiron Institute, Scientific Computing Core, Simons Foundation, New York, NY 10010, USA.
  13. Bram L Gorissen: Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA.
  14. David Gresham: Center for Genomics and Systems Biology, New York University, New York, NY 10003, USA.
  15. Emily R Miraldi: Divisions of Immunobiology and Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA.
  16. Richard Bonneau: Flatiron Institute, Center for Computational Biology, Simons Foundation, New York, NY 10010, USA.

Abstract

MOTIVATION: Gene regulatory networks define regulatory relationships between transcription factors and target genes within a biological system, and reconstructing them is essential for understanding cellular growth and function. Methods for inferring and reconstructing networks from genomics data have evolved rapidly over the last decade in response to advances in sequencing technology and machine learning. The scale of data collection has increased dramatically; the largest genome-wide gene expression datasets have grown from thousands of measurements to millions of single cells, and new technologies are on the horizon to increase to tens of millions of cells and above.
RESULTS: In this work, we present the Inferelator 3.0, which has been significantly updated to integrate data from distinct cell types to learn context-specific regulatory networks and aggregate them into a shared regulatory network, while retaining the functionality of the previous versions. The Inferelator is able to integrate the largest single-cell datasets and learn cell-type-specific gene regulatory networks. Compared to other network inference methods, the Inferelator learns new and informative Saccharomyces cerevisiae networks from single-cell gene expression data, measured by recovery of a known gold standard. We demonstrate its scaling capabilities by learning networks for multiple distinct neuronal and glial cell types in the developing Mus musculus brain at E18 from a large (1.3 million) single-cell gene expression dataset with paired single-cell chromatin accessibility data.
AVAILABILITY AND IMPLEMENTATION: The Inferelator software is available on GitHub (https://github.com/flatironinstitute/inferelator) under the MIT license and has been released as Python packages with associated documentation (https://inferelator.readthedocs.io/).
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Grants

  1. R01 NS116350/NINDS NIH HHS
  2. U54 AG076040/NIA NIH HHS
  3. R01 AI153442/NIAID NIH HHS
  4. R01 AI130945/NIAID NIH HHS
  5. R01 GM134066/NIGMS NIH HHS
  6. R01 NS118183/NINDS NIH HHS
  7. U01 AI150748/NIAID NIH HHS
  8. RM1 HG011014/NHGRI NIH HHS
  9. R35GM122515/NIH HHS
  10. R01 HD096770/NICHD NIH HHS
  11. R01 AI140766/NIAID NIH HHS
  12. R35 GM122515/NIGMS NIH HHS

MeSH Terms

Animals
Mice
Gene Regulatory Networks
Software
Genomics
Genome
Chromatin

Chemicals

Chromatin
