CNN6mA: Interpretable neural network model based on position-specific CNN and cross-interactive network for 6mA site prediction.

Sho Tsukiyama, Md Mehedi Hasan, Hiroyuki Kurata
Author Information
  1. Sho Tsukiyama: Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan.
  2. Md Mehedi Hasan: Tulane Center for Aging and Department of Medicine, Tulane University Health Sciences Center, New Orleans, LA 70112, USA.
  3. Hiroyuki Kurata: Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan.

Abstract

N6-methyladenine (6mA) plays a critical role in various epigenetic processes, including DNA replication, DNA repair, silencing, and transcription, and in diseases such as cancer. To understand these epigenetic mechanisms, 6mA has been detected on a genome-wide scale at single-base resolution by high-throughput technologies, together with conventional methods such as immunoprecipitation, mass spectrometry, and capillary electrophoresis, but these experimental approaches are time-consuming and laborious. To complement them, we have developed a CNN-based 6mA site predictor, named CNN6mA, which introduces two new architectures: a position-specific 1-D convolutional layer and a cross-interactive network. In the position-specific 1-D convolutional layer, position-specific filters with different window sizes are applied to a query sequence, instead of the same filters being shared over all positions, in order to extract position-specific features at different levels. The cross-interactive network explores the relationships between all the nucleotide patterns within the query sequence. As a result, CNN6mA outperformed the existing state-of-the-art models in many species and produced a contribution score vector that makes the prediction mechanism interpretable. The source code and web application of CNN6mA are freely accessible at https://github.com/kuratahiroyuki/CNN6mA.git and http://kurata35.bio.kyutech.ac.jp/CNN6mA/, respectively.
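The idea of position-specific filters can be illustrated with a minimal NumPy sketch. It is not the authors' implementation (that is in the repository linked above); it only shows how a 1-D convolution changes when each window position gets its own filter bank instead of sharing one. The function names, the 41-nt example sequence, the window sizes (3, 5, 7), the filter count (8), and the random weights are all assumptions made for illustration.

```python
import numpy as np

NUCLEOTIDES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a (length, 4) one-hot matrix."""
    index = {nt: i for i, nt in enumerate(NUCLEOTIDES)}
    encoded = np.zeros((len(seq), len(NUCLEOTIDES)))
    for pos, nt in enumerate(seq):
        encoded[pos, index[nt]] = 1.0
    return encoded

def position_specific_conv1d(x, weights, bias):
    """1-D convolution with a separate (unshared) filter bank per position.

    x:       (length, channels)
    weights: (n_windows, window, channels, n_filters),
             where n_windows = length - window + 1
    bias:    (n_windows, n_filters)
    returns: (n_windows, n_filters)
    """
    n_windows, window = weights.shape[:2]
    # windows[p] holds x[p:p + window]; shape (n_windows, window, channels)
    windows = np.stack([x[p:p + window] for p in range(n_windows)])
    # Each position p is contracted with its own weights[p], not a shared kernel.
    return np.einsum("pwc,pwcf->pf", windows, weights) + bias

# Hypothetical 41-nt query sequence with an adenine at the centre.
seq = "ACGT" * 5 + "A" + "TGCA" * 5
x = one_hot(seq)

rng = np.random.default_rng(0)
n_filters = 8  # assumed value, for illustration only
params = {}
features = {}
for window in (3, 5, 7):  # assumed window sizes
    n_windows = len(seq) - window + 1
    w = rng.normal(size=(n_windows, window, x.shape[1], n_filters))
    b = np.zeros((n_windows, n_filters))
    params[window] = (w, b)
    features[window] = position_specific_conv1d(x, w, b)

for window, feat in features.items():
    print(window, feat.shape)
```

A standard convolution would use a single weight tensor of shape (window, channels, n_filters) at every position; here the leading `n_windows` axis is what makes the filters position-specific, at the cost of proportionally more parameters.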

