i6mA-Vote: Cross-Species Identification of DNA N6-Methyladenine Sites in Plant Genomes Based on Ensemble Learning With Voting.

Zhixia Teng, Zhengnan Zhao, Yanjuan Li, Zhen Tian, Maozu Guo, Qianzi Lu, Guohua Wang
Author Information
  1. Zhixia Teng: College of Information and Computer Engineering, Northeast Forestry University, Harbin, China.
  2. Zhengnan Zhao: College of Information and Computer Engineering, Northeast Forestry University, Harbin, China.
  3. Yanjuan Li: College of Electrical and Information Engineering, Quzhou University, Quzhou, China.
  4. Zhen Tian: College of Information Engineering, Zhengzhou University, Zhengzhou, China.
  5. Maozu Guo: College of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, China.
  6. Qianzi Lu: College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China.
  7. Guohua Wang: College of Information and Computer Engineering, Northeast Forestry University, Harbin, China.

Abstract

DNA N6-methyladenine (6mA) is a common epigenetic modification that plays significant roles in the growth and development of plants. Identifying 6mA sites is crucial for elucidating the functions of 6mA. In this article, a novel model named i6mA-vote is developed to predict 6mA sites in plants. First, DNA sequences were encoded into six feature vectors using diverse strategies based on the density, physicochemical properties, and position of nucleotides. To find the best encoding strategy, the feature vectors were compared across several machine learning classifiers. The results suggested that nucleotide position has a significant positive effect on 6mA site identification. Thus, the dinucleotide one-hot strategy, which describes the positional characteristics of nucleotides well, was employed to extract DNA features in our method. Second, DNA sequences of Rosaceae were randomly divided into a training dataset and a test dataset. Finally, i6mA-vote was constructed by combining five different base classifiers under a majority voting strategy and trained on the Rosaceae training dataset. i6mA-vote was evaluated on the task of predicting 6mA sites in the genomes of Rosaceae, Rice, and Arabidopsis separately. On Rosaceae, i6mA-vote achieved an accuracy (ACC) of 0.955, a Matthews correlation coefficient (MCC) of 0.909, a sensitivity (SN) of 0.955, and a specificity (SP) of 0.954. The same indicators, in the order ACC, MCC, SN, SP, were 0.882, 0.774, 0.961, and 0.803 on Rice and 0.798, 0.617, 0.666, and 0.929 on Arabidopsis. According to these indicators, our method is effective and performs better than the other methods compared. The results also illustrate that i6mA-vote performs well in predicting 6mA sites not only within a species but also across plant species. Moreover, specificity is distinctly lower than sensitivity on Rice, while the opposite holds on Arabidopsis. This may result from the sequence similarity among Rosaceae, Rice, and Arabidopsis.
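
The three ingredients named in the abstract (dinucleotide one-hot encoding, majority voting, and the ACC/MCC/SN/SP indicators) can be illustrated with a minimal sketch. This is not the authors' implementation. The 16-dimensional indicator per overlapping dinucleotide, the all-zero block for ambiguous bases, and the function names `dinucleotide_one_hot`, `majority_vote`, and `binary_metrics` are all assumptions made for illustration, and the five base classifiers are not modeled here.

```python
from itertools import product
from math import sqrt

# Assumed layout: one 16-dimensional indicator per overlapping dinucleotide,
# ordered AA, AC, AG, AT, CA, ..., TT. The paper's exact layout may differ.
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]
DINUC_INDEX = {d: i for i, d in enumerate(DINUCLEOTIDES)}


def dinucleotide_one_hot(seq):
    """Concatenate one-hot vectors of the overlapping dinucleotides of seq."""
    seq = seq.upper()
    features = []
    for i in range(len(seq) - 1):
        block = [0] * len(DINUCLEOTIDES)
        idx = DINUC_INDEX.get(seq[i:i + 2])
        if idx is not None:  # ambiguous bases (e.g. N) stay all-zero (assumption)
            block[idx] = 1
        features.extend(block)
    return features


def majority_vote(labels):
    """Hard majority vote over 0/1 labels from several base classifiers."""
    return 1 if 2 * sum(labels) > len(labels) else 0


def binary_metrics(y_true, y_pred):
    """Return ACC, MCC, SN, and SP for 0/1 labels (1 = 6mA site)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "ACC": (tp + tn) / len(pairs) if pairs else 0.0,
        "MCC": (tp * tn - fp * fn) / denom if denom else 0.0,
        "SN": tp / (tp + fn) if (tp + fn) else 0.0,
        "SP": tn / (tn + fp) if (tn + fp) else 0.0,
    }


if __name__ == "__main__":
    x = dinucleotide_one_hot("ACGT")          # 3 dinucleotides x 16 dimensions
    votes = majority_vote([1, 1, 0, 0, 1])    # five hypothetical base-classifier outputs
    scores = binary_metrics([1, 1, 0, 0], [1, 0, 0, 0])
    print(len(x), votes, scores)
```

Because every position contributes its own block, the encoding keeps positional information, which is the property the abstract credits for its performance. With five voters and binary labels, a hard majority vote can never tie.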
