ENet-6mA: Identification of 6mA Modification Sites in Plant Genomes Using ElasticNet and Neural Networks.

Zeeshan Abbas, Hilal Tayara, Kil To Chong
Author Information
  1. Zeeshan Abbas: Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, Korea.
  2. Hilal Tayara: School of International Engineering and Science, Jeonbuk National University, Jeonju 54896, Korea.
  3. Kil To Chong: Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, Korea.

Abstract

N6-methyladenine (6mA) is recognized as a key epigenetic modification that affects a variety of biological activities. Precise prediction of 6mA modification sites is essential for understanding its role in these activities. Several experimental methods can identify 6mA modification sites, but because they are costly and labor-intensive, in silico prediction has emerged as a promising alternative. Developing an efficient and accurate model for identifying 6mA sites is therefore an important objective in bioinformatics. We present ENet-6mA, an in silico model for classifying 6mA modifications in plant genomes. ENet-6mA uses three encoding methods: one-hot, nucleotide chemical properties (NCP), and electron-ion interaction potential (EIIP). The encodings are concatenated and fed to ElasticNet for feature reduction, and the selected features are passed directly to a neural network for classification. We used a rice benchmark dataset for five-fold cross-validation and three other plant-genome datasets for cross-species testing. The results show that the model predicts 6mA sites well, including across species. Additionally, we split the datasets at different positive:negative ratios and evaluated performance using the area under the precision-recall curve (AUPRC), achieving 0.81, 0.79, and 0.50 with 1:10 (positive:negative) samples for , , and , respectively.
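The encoding step described in the abstract can be illustrated with a minimal sketch. This is not the authors' code. The lookup tables use the commonly cited NCP and EIIP values for DNA. The function name `encode_sequence`, the A/C/G/T one-hot order, the per-position concatenation layout, and the all-zero encoding for unknown bases are assumptions made for illustration.

```python
# Illustrative sketch only; not the authors' implementation.
# Assumed one-hot column order: A, C, G, T.
ONE_HOT = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}

# NCP: (ring structure, functional group, hydrogen bonding).
# purine = 1, amino = 1, weak hydrogen bonding = 1.
NCP = {
    "A": (1, 1, 1),
    "C": (0, 1, 0),
    "G": (1, 0, 0),
    "T": (0, 0, 1),
}

# EIIP: one electron-ion interaction potential value per nucleotide.
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}


def encode_sequence(seq):
    """Return a flat feature list with 8 values per nucleotide:
    4 (one-hot) + 3 (NCP) + 1 (EIIP), concatenated position by position.

    Assumption: bases outside A/C/G/T (for example N) become eight zeros.
    """
    features = []
    for base in seq.upper():
        if base not in ONE_HOT:
            features.extend([0.0] * 8)
            continue
        features.extend(float(v) for v in ONE_HOT[base])
        features.extend(float(v) for v in NCP[base])
        features.append(EIIP[base])
    return features


if __name__ == "__main__":
    vec = encode_sequence("GATC")
    print(len(vec))  # 4 nucleotides x 8 features each
```

A sequence window of length L yields 8L features under this layout. In the pipeline the abstract describes, a vector like this would then go to ElasticNet for feature reduction before the neural network; that stage is omitted here because the abstract does not give its settings.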

Grants

  1. 2020R1A2C2005612/National Research Foundation

MeSH Term

Computational Biology
DNA Methylation
Genome, Plant
Neural Networks, Computer
Oryza
