iDNA-ITLM: An interpretable and transferable learning model for identifying DNA methylation.

Xia Yu, Cui Yani, Zhichao Wang, Haixia Long, Rao Zeng, Xiling Liu, Bilal Anas, Jia Ren
Author Information
  1. Xia Yu: School of Information and Communication Engineering, Hainan University, Haikou, Hainan, China.
  2. Cui Yani: School of Information and Communication Engineering, Hainan University, Haikou, Hainan, China.
  3. Zhichao Wang: Unit 32033, The People's Liberation Army, Beijing, China.
  4. Haixia Long: Key Laboratory of Data Science and Smart Education, Ministry of Education, Hainan Normal University, Haikou, Hainan, China.
  5. Rao Zeng: Key Laboratory of Data Science and Smart Education, Ministry of Education, Hainan Normal University, Haikou, Hainan, China.
  6. Xiling Liu: Key Laboratory of Data Science and Smart Education, Ministry of Education, Hainan Normal University, Haikou, Hainan, China.
  7. Bilal Anas: Key Laboratory of Data Science and Smart Education, Ministry of Education, Hainan Normal University, Haikou, Hainan, China.
  8. Jia Ren: School of Information and Communication Engineering, Hainan University, Haikou, Hainan, China.

Abstract

In this study, we approach the identification of DNA methylation sites from an image-processing perspective and propose the iDNA-ITLM model. The model uses a novel data enhancement strategy: a short DNA sequence is repeatedly self-replicated into a longer sequence, which is then embedded into a high-dimensional matrix to enlarge the receptive field. Evaluated on 17 benchmark datasets that cover multiple species and three types of DNA methylation modification (4mC, 5hmC, and 6mA), our model consistently outperforms the current state-of-the-art sequence-based methods for DNA methylation site recognition. These results demonstrate the robustness and superior performance of the model across the datasets. In addition, the model transfers to RNA methylation sequences and produces good results without any change to its hyperparameters. The proposed iDNA-ITLM model can therefore be considered a universal predictor across DNA and RNA methylation species.
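
The self-replication step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the replication factor, the target length, or the embedding used, so everything here is an assumption made for illustration: the function names `self_replicate` and `one_hot_matrix` are hypothetical, the target length of 20 is arbitrary, and a plain one-hot encoding stands in for the model's actual high-dimensional embedding. This is not the authors' implementation.

```python
# Illustrative sketch only: the names, target length, and one-hot encoding
# are assumptions, not details taken from the iDNA-ITLM paper.

ALPHABET = "ACGT"


def self_replicate(seq: str, target_len: int) -> str:
    """Repeat a short sequence end to end and truncate it to target_len."""
    if not seq:
        raise ValueError("sequence must be non-empty")
    repeats = -(-target_len // len(seq))  # ceiling division
    return (seq * repeats)[:target_len]


def one_hot_matrix(seq: str) -> list[list[int]]:
    """Encode a sequence as a (length x 4) matrix, one row per base.

    Bases outside ACGT (for example N) are left as all-zero rows.
    """
    index = {base: i for i, base in enumerate(ALPHABET)}
    matrix = []
    for base in seq.upper():
        row = [0] * len(ALPHABET)
        if base in index:
            row[index[base]] = 1
        matrix.append(row)
    return matrix


short = "ACGTTGCA"                     # hypothetical 8-base input
longer = self_replicate(short, 20)     # extended by self-replication
encoded = one_hot_matrix(longer)       # 20 rows, 4 columns
```

Under these assumptions, the longer sequence gives a downstream model a larger input matrix to operate on, which is the sense in which the abstract speaks of enlarging the receptive field; the actual model may build its matrix quite differently.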

MeSH Term

DNA Methylation
Humans
DNA
Animals
Machine Learning
Algorithms
RNA

Chemicals

DNA
RNA
