BERT6mA: prediction of DNA N6-methyladenine site using deep learning-based approaches.

Sho Tsukiyama, Md Mehedi Hasan, Hong-Wen Deng, Hiroyuki Kurata
Author Information
  1. Sho Tsukiyama: Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan.
  2. Md Mehedi Hasan: Tulane Center for Biomedical Informatics and Genomics, Division of Biomedical Informatics and Genomics, John W. Deming Department of Medicine, School of Medicine, Tulane University, New Orleans, LA 70112, USA.
  3. Hong-Wen Deng: Tulane Center for Biomedical Informatics and Genomics, Division of Biomedical Informatics and Genomics, John W. Deming Department of Medicine, School of Medicine, Tulane University, New Orleans, LA 70112, USA.
  4. Hiroyuki Kurata: Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan.

Abstract

N6-methyladenine (6mA) is associated with important roles in DNA replication, DNA repair, transcription, and regulation of gene expression. Several experimental methods have been used to identify DNA modifications, but they are costly and time-consuming. To detect 6mA sites and complement these experimental methods, we propose a novel deep learning approach called BERT6mA. To compare BERT6mA with other deep learning approaches, we used benchmark datasets covering 11 species. BERT6mA achieved the highest AUCs in eight species in independent tests. Overall, BERT6mA performed better than or comparably to the state-of-the-art models, but it performed poorly in a few species with small sample sizes. To overcome this issue, we applied pretraining and fine-tuning between two species to BERT6mA. The models pretrained and then fine-tuned on a specific species outperformed other models even for species with small sample sizes. In addition to prediction, we analyzed the attention weights generated by BERT6mA to reveal how the model extracts the critical features responsible for 6mA prediction. To facilitate research in the biological sciences, the BERT6mA web server and its source code are freely accessible; the source code is available at https://github.com/kuratahiroyuki/BERT6mA.git.

Grants

  1. R01 AR069055/NIAMS NIH HHS
  2. U19 AG055373/NIA NIH HHS

MeSH Term

DNA
DNA Methylation
Deep Learning
Software

Chemicals

DNA
