Deep5mC: Predicting 5-methylcytosine (5mC) methylation status using a deep learning transformer approach.

Evan Kinnear, Houssemeddine Derbel, Zhongming Zhao, Qian Liu
Author Information
  1. Evan Kinnear: Nevada Institute of Personalized Medicine, University of Nevada, Las Vegas, 4505 S Maryland Pkwy, Las Vegas, NV 89154, USA.
  2. Houssemeddine Derbel: Nevada Institute of Personalized Medicine, University of Nevada, Las Vegas, 4505 S Maryland Pkwy, Las Vegas, NV 89154, USA.
  3. Zhongming Zhao: Center for Precision Health, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA.
  4. Qian Liu: Nevada Institute of Personalized Medicine, University of Nevada, Las Vegas, 4505 S Maryland Pkwy, Las Vegas, NV 89154, USA.

Abstract

DNA methylations, such as 5-methylcytosine (5mC), are crucial in biological processes, and aberrant methylations are strongly linked to various human diseases. Genomic 5mC is not randomly distributed but is strongly associated with the underlying genomic sequence. Various computational methods have therefore been developed to predict 5mC status from DNA sequences. These methods have produced promising results and overcome some limitations of experimental approaches. However, few studies have comprehensively investigated the dependency of 5mC on genomic sequences, and most existing methods focus on specific genomic regions. In this work, we introduce Deep5mC, a transformer-based deep learning method designed to predict 5mC methylation. Deep5mC leverages long-range dependencies within genomic sequences to estimate the probability that a cytosine is methylated. In cross-chromosome evaluation, Deep5mC achieves a Matthews correlation coefficient over 0.86 and an F1-score over 0.93, substantially outperforming state-of-the-art methods. Deep5mC not only confirms the influence of long-range sequence context on 5mC prediction but also paves the way for further study of 5mC-sequence dependency across species and in human diseases.
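
For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how the Matthews correlation coefficient and the F1-score are computed from a binary confusion matrix (methylated vs. unmethylated). The function name `mcc_and_f1` and the counts in the example are hypothetical and purely illustrative; they are not taken from the Deep5mC evaluation.

```python
import math


def mcc_and_f1(tp, fp, tn, fn):
    """Return (MCC, F1) for a binary confusion matrix.

    tp, fp, tn, fn are counts of true positives, false positives,
    true negatives and false negatives. Both metrics fall back to 0.0
    when their denominator is zero.
    """
    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    # F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall
    f1_denom = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denom if f1_denom else 0.0
    return mcc, f1


# Hypothetical counts, for illustration only (not results from the paper).
mcc, f1 = mcc_and_f1(tp=90, fp=10, tn=85, fn=15)
print(f"MCC = {mcc:.3f}, F1 = {f1:.3f}")
```

Unlike F1, the Matthews correlation coefficient also accounts for true negatives, which makes it informative when methylated and unmethylated sites are imbalanced.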

Grants

  1. P20 GM121325/NIGMS NIH HHS
  2. R01 LM012806/NLM NIH HHS
  3. U01 AG079847/NIA NIH HHS
