PSATF-6mA: an integrated learning fusion feature-encoded DNA-6 mA methylcytosine modification site recognition model based on attentional mechanisms.

Yanmei Kang, Hongyuan Wang, Yubo Qin, Guanlin Liu, Yi Yu, Yongjian Zhang
Author Information
  1. Yanmei Kang: School of Cyber Science and Engineering, University of International Relations, Beijing, China.
  2. Hongyuan Wang: School of Cyber Science and Engineering, University of International Relations, Beijing, China.
  3. Yubo Qin: School of Cyber Science and Engineering, University of International Relations, Beijing, China.
  4. Guanlin Liu: School of Cyber Science and Engineering, University of International Relations, Beijing, China.
  5. Yi Yu: College of Computer Science and Technology, Guangdong University of Technology, Guangzhou, China.
  6. Yongjian Zhang: School of Cyber Science and Engineering, University of International Relations, Beijing, China.

Abstract

DNA methylation plays a crucial role in regulating gene expression and is involved in processes such as cell differentiation and tumour development. Identifying DNA 6mA sites with traditional experimental methods is laborious and time-consuming. Neural network techniques have made the cross-species identification of 6mA sites more effective. Nevertheless, most current neural network models for 6mA site identification prioritize the design of the model architecture, and comparatively little research has addressed the statistical characteristics of the DNA sequence itself. This paper therefore focuses on a statistical strategy for DNA double-stranded features and applies the multi-head self-attention mechanism to the positional probabilistic relationships in DNA. Furthermore, a new recognition model, PSATF-6mA, is constructed by continually adjusting the attentional tendency of feature fusion through an integrated learning framework. Experimental results, obtained by cross-validation on cross-species data, show that PSATF-6mA outperforms the baseline model. On the cross-species dataset of rice and M. musculus genomes, the Matthews correlation coefficient (MCC) can reach 0.982. The model is expected to help biologists identify 6mA loci more accurately and formulate new testable biological hypotheses.
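For readers unfamiliar with the reported metric, the following is a minimal sketch of how a Matthews correlation coefficient is computed from binary confusion-matrix counts. It uses the standard MCC definition and is not taken from the authors' code; the function name `mcc_from_counts` and the example counts are hypothetical and do not reproduce any result from the paper.

```python
import math


def mcc_from_counts(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary classifier.

    tp/tn/fp/fn are true-positive, true-negative, false-positive and
    false-negative counts (e.g. 6mA site vs. non-6mA site predictions).
    Returns 0.0 when the denominator is zero, a common convention.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return numerator / denominator


# Hypothetical counts for illustration only (not data from the paper).
example_score = mcc_from_counts(tp=99, tn=99, fp=1, fn=1)
print(example_score)  # 0.98
```

MCC ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction), and because it uses all four counts it stays informative when positive and negative classes are imbalanced.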
