Deep6mA: A deep learning framework for exploring similar patterns in DNA N6-methyladenine sites across different species.

Zutan Li, Hangjin Jiang, Lingpeng Kong, Yuanyuan Chen, Kun Lang, Xiaodan Fan, Liangyun Zhang, Cong Pian
Author Information
  1. Zutan Li: Department of Mathematics, College of Science, Nanjing Agricultural University, Nanjing, China.
  2. Hangjin Jiang: Center for Data Science, Zhejiang University, Hangzhou, China.
  3. Lingpeng Kong: Department of Mathematics, College of Science, Nanjing Agricultural University, Nanjing, China.
  4. Yuanyuan Chen: Department of Mathematics, College of Science, Nanjing Agricultural University, Nanjing, China.
  5. Kun Lang: College of Information Science & Technology, Nanjing Agricultural University, Nanjing, China.
  6. Xiaodan Fan: Department of Statistics, The Chinese University of Hong Kong, Hong Kong SAR, China.
  7. Liangyun Zhang: Department of Mathematics, College of Science, Nanjing Agricultural University, Nanjing, China.
  8. Cong Pian: Department of Mathematics, College of Science, Nanjing Agricultural University, Nanjing, China.

Abstract

N6-methyladenine (6mA) is an important DNA modification associated with a wide range of biological processes. Accurately identifying 6mA sites on a genomic scale is crucial for understanding 6mA's biological functions. However, existing experimental techniques for detecting 6mA sites are not cost-effective, which creates a strong need for new computational methods. In this paper, we develop a deep learning framework named Deep6mA to identify DNA 6mA sites; it requires neither prior knowledge of 6mA nor manually crafted sequence features, and it outperforms other DNA 6mA prediction tools. Specifically, 5-fold cross-validation on a rice benchmark dataset gives Deep6mA a sensitivity of 92.96% and a specificity of 95.06%, with an overall prediction accuracy of 94%. Importantly, we find that sequences containing 6mA sites share similar patterns across different species. The model trained on rice data predicts 6mA sites well in three other species, Arabidopsis thaliana, Fragaria vesca and Rosa chinensis, with a prediction accuracy over 90%. In addition, we find that (1) 6mA tends to occur at GAGG motifs, which suggests that the sequence near the 6mA site may be conserved; and (2) 6mA is enriched in the TATA box of the promoter, which may be the main route by which it regulates downstream gene expression.
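
For readers less familiar with the reported metrics, the following is a minimal sketch of how sensitivity, specificity and accuracy are computed from binary predictions. It is not the authors' evaluation code: the function names and the toy labels are hypothetical. The last lines assume equal numbers of positive and negative samples, which the abstract does not state; under that assumption, accuracy is the mean of sensitivity and specificity, which is consistent with the reported figures.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = 6mA site, 0 = non-6mA site)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def metrics(tp, tn, fp, fn):
    """Return (sensitivity, specificity, accuracy) from confusion counts."""
    sensitivity = tp / (tp + fn)  # fraction of true 6mA sites recovered
    specificity = tn / (tn + fp)  # fraction of non-6mA sites correctly rejected
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return sensitivity, specificity, accuracy


# Toy labels, invented purely for illustration (not data from the paper).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
counts = confusion_counts(y_true, y_pred)
print(metrics(*counts))

# ASSUMPTION: a balanced dataset (as many positives as negatives).
# Accuracy then equals the mean of sensitivity and specificity.
balanced_estimate = (0.9296 + 0.9506) / 2
print(f"{balanced_estimate:.4f}")
```

With the reported sensitivity and specificity, the balanced-class estimate comes to about 94.0%, which matches the stated overall accuracy of 94%.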

MeSH Terms

Adenine
Arabidopsis
Base Sequence
Binding Sites
Computational Biology
DNA
DNA Methylation
DNA, Plant
Databases, Nucleic Acid
Deep Learning
Fragaria
Neural Networks, Computer
Oryza
Rosa
Species Specificity

Chemicals

DNA, Plant
DNA
Adenine
6-methyladenine
