Predictive modeling of single-cell DNA methylome data enhances integration with transcriptome data.

Yasin Uzun, Hao Wu, Kai Tan
Author Information
  1. Yasin Uzun: Center for Childhood Cancer Research, The Children's Hospital of Philadelphia, Philadelphia, Pennsylvania 19104, USA.
  2. Hao Wu: Department of Genetics, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA.
  3. Kai Tan: Center for Childhood Cancer Research, The Children's Hospital of Philadelphia, Philadelphia, Pennsylvania 19104, USA.

Abstract

Single-cell DNA methylation data has become increasingly abundant and has uncovered many genes with a positive correlation between expression and promoter methylation, challenging the common dogma based on bulk data. However, computational tools for analyzing single-cell methylome data lag far behind. A number of tasks, including cell type calling and integration with transcriptome data, require a robust gene activity matrix as a prerequisite, and constructing that matrix is itself a challenging task. The advent of multi-omics data enables measurement of both DNA methylation and gene expression in the same single cells. Although such data is rather sparse, it is sufficient to train supervised models that capture the complex relationship between DNA methylation and gene expression and predict gene activities at the single-cell level. Here, we present methylome association by predictive linkage to expression (MAPLE), a computational framework that learns the association between DNA methylation and expression using both gene- and cell-dependent statistical features. Using multiple data sets generated with different experimental protocols, we show that using predicted gene activity values significantly improves several analysis tasks, including clustering, cell type identification, and integration with transcriptome data. Application of MAPLE revealed several biological insights into the relationship between methylation and gene expression, including the asymmetric importance of methylation signals around the transcription start site for predicting gene expression, and the increased predictive power of methylation signals in promoters located outside CpG islands and shores. With the rapid accumulation of single-cell epigenomics data, MAPLE provides a general framework for integrating such data with transcriptome data.
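
As an illustration only, the kind of position-aware methylation feature that a supervised model could take as input might be built as in the sketch below. This is not MAPLE's implementation: the function name `tss_bin_features`, the window and bin sizes, and the input layout are all assumptions made for this example. It averages CpG methylation in fixed-width bins upstream and downstream of a transcription start site (TSS) for one gene in one cell, and marks bins without coverage as missing, which is common in sparse single-cell data.

```python
from statistics import mean


def tss_bin_features(cpgs, tss, strand="+", window=5000, bin_size=1000):
    """Mean methylation per bin around a TSS (hypothetical helper).

    cpgs: iterable of (position, methylation_fraction) pairs from one cell.
    Assumes `window` is a multiple of `bin_size`.
    Returns 2 * window // bin_size values ordered from the most upstream
    bin to the most downstream bin; bins with no covered CpG are None.
    """
    n_bins = 2 * window // bin_size
    bins = [[] for _ in range(n_bins)]
    for pos, frac in cpgs:
        # Signed offset from the TSS, oriented so that negative values
        # are upstream of the gene on either strand.
        offset = pos - tss if strand == "+" else tss - pos
        if -window <= offset < window:
            bins[(offset + window) // bin_size].append(frac)
    return [mean(b) if b else None for b in bins]


# Toy input: three CpGs near a TSS at position 1000, one far away.
example = tss_bin_features(
    [(900, 1.0), (950, 0.0), (1100, 1.0), (7000, 0.5)],
    tss=1000, window=200, bin_size=100,
)
```

Keeping upstream and downstream bins as separate features is one way to let a downstream model weight the two sides of the TSS differently; how missing bins are imputed or encoded is left open here.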

Grants

  1. U01 CA226187/NCI NIH HHS
  2. U2C CA233285/NCI NIH HHS
  3. U54 HL156090/NHLBI NIH HHS

MeSH Term

CpG Islands
DNA Methylation
Epigenome
Epigenomics
Transcriptome
