A convolution based computational approach towards DNA N6-methyladenine site identification and motif extraction in rice genome.

Chowdhury Rafeed Rahman, Ruhul Amin, Swakkhar Shatabda, Md Sadrul Islam Toaha
Author Information
  1. Chowdhury Rafeed Rahman: United International University, Dhaka, Bangladesh.
  2. Ruhul Amin: United International University, Dhaka, Bangladesh.
  3. Swakkhar Shatabda: United International University, Dhaka, Bangladesh. swakkhar@cse.uiu.ac.bd.
  4. Md Sadrul Islam Toaha: United International University, Dhaka, Bangladesh.

Abstract

DNA N6-methylation (6mA) of adenine nucleotides is a post-replication modification responsible for many biological functions. Automated and accurate computational methods can help identify 6mA sites in long genomes, saving significant time and money. Our study develops i6mA-CNN, a convolutional neural network (CNN) based tool capable of identifying 6mA sites in the rice genome. Our model combines multiple types of features: a customized feature vector inspired by PseAAC (Pseudo Amino Acid Composition), multiple one-hot representations, and dinucleotide physicochemical properties. It achieves an auROC (area under the Receiver Operating Characteristic curve) score of 0.98 and an overall accuracy of 93.97% using fivefold cross-validation on the benchmark dataset. Finally, we evaluate our model on three other plant genome 6mA site identification test datasets. The results suggest that the proposed tool generalizes 6mA site identification across plant genomes, irrespective of plant species. An algorithm for potential motif extraction and a feature importance analysis procedure are two by-products of this research. A web tool for this research can be found at: https://cutt.ly/dgp3QTR .
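
To make the one-hot representations mentioned in the abstract concrete, the following is a minimal sketch of how a DNA window might be encoded before being passed to a convolutional model. It is an illustration only and not the i6mA-CNN implementation: the function name `one_hot_kmers`, the `k` parameter, the overlapping-window scheme, and the choice to encode unknown bases as all-zero rows are assumptions made for this example.

```python
from itertools import product

NUCLEOTIDES = "ACGT"


def one_hot_kmers(sequence, k=1):
    """One-hot encode every overlapping k-mer of a DNA sequence.

    Hypothetical helper for illustration. Returns a list of rows, one per
    k-mer position, each of length 4**k. A k-mer containing a symbol
    outside A/C/G/T (for example N) becomes an all-zero row.
    """
    # Enumerate all k-mers in lexicographic order: A, C, G, T or AA, AC, ...
    kmers = ["".join(p) for p in product(NUCLEOTIDES, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}

    seq = sequence.upper()
    rows = []
    for start in range(len(seq) - k + 1):
        row = [0] * len(kmers)
        column = index.get(seq[start:start + k])
        if column is not None:
            row[column] = 1
        rows.append(row)
    return rows


# Example: mononucleotide (k=1) and dinucleotide (k=2) encodings of one window.
mono = one_hot_kmers("ACGT", k=1)   # 4 rows of width 4
di = one_hot_kmers("ACGT", k=2)     # 3 rows of width 16
```

With `k=1` each position maps to a 4-dimensional vector, and with `k=2` each overlapping dinucleotide maps to a 16-dimensional vector; stacking such matrices is one plausible way to supply "multiple" one-hot views of the same sequence.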


MeSH Terms

Adenine
Amino Acid Motifs
DNA Methylation
Epigenesis, Genetic
Epigenomics
Genome, Plant
Neural Networks, Computer
Oryza

Chemicals

Adenine
6-methyladenine

