N6-methyladenine identification using deep learning and discriminative feature integration.

Salman Khan, Islam Uddin, Sumaiya Noor, Salman A AlQahtani, Nijad Ahmad
Author Information
  1. Salman Khan: Department of Computer Science, Abdul Wali Khan University, Mardan, Pakistan.
  2. Islam Uddin: Department of Computer Science, Abdul Wali Khan University, Mardan, Pakistan.
  3. Sumaiya Noor: Business and Management Sciences Department, Purdue University, West Lafayette, IN, USA.
  4. Salman A AlQahtani: Department of Computer Engineering, New Emerging Technologies and 5G Network and Beyond Research Chair, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia.
  5. Nijad Ahmad: Department of Computer Science, Khurasan University, Jalalabad, Afghanistan. Nijad@khurasan.edu.af.

Abstract

N6-methyladenine (6mA) is a pivotal DNA modification that plays a crucial role in epigenetic regulation, gene expression, and various biological processes. With advancements in sequencing technologies and computational biology, there is an increasing focus on developing accurate methods for 6mA site identification to enhance early detection and understand its biological significance. Despite the rapid progress of machine learning in bioinformatics, accurately detecting 6mA sites remains a challenge due to the limited generalizability and efficiency of existing approaches. In this study, we present Deep-N6mA, a novel Deep Neural Network (DNN) model incorporating optimal hybrid features for precise 6mA site identification. The proposed framework captures complex patterns from DNA sequences through a comprehensive feature extraction process, leveraging k-mer, Dinucleotide-based Cross Covariance (DCC), Trinucleotide-based Auto Covariance (TAC), Pseudo Single Nucleotide Composition (PseSNC), Pseudo Dinucleotide Composition (PseDNC), and Pseudo Trinucleotide Composition (PseTNC). To optimize computational efficiency and eliminate irrelevant or noisy features, an unsupervised Principal Component Analysis (PCA) algorithm is employed, ensuring the selection of the most informative features. A multilayer DNN serves as the classification algorithm to identify N6-methyladenine sites accurately. The robustness and generalizability of Deep-N6mA were rigorously validated using fivefold cross-validation on two benchmark datasets. Experimental results reveal that Deep-N6mA achieves an average accuracy of 97.70% on the F. vesca dataset and 95.75% on the R. chinensis dataset, outperforming existing methods by 4.12% and 4.55%, respectively. These findings underscore the effectiveness of Deep-N6mA as a reliable tool for early 6mA site detection, contributing to epigenetic research and advancing the field of computational biology.
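As an illustration of the simplest descriptor named in the abstract, the k-mer encoding can be sketched as follows. This is a minimal sketch and not the authors' implementation: the function names `kmer_frequencies` and `hybrid_kmer_features`, the choice of k = 1, 2, 3, the frequency normalization, and the handling of ambiguous bases are all assumptions made for illustration. The remaining descriptors (DCC, TAC, PseSNC, PseDNC, PseTNC), the PCA step, and the DNN classifier are not shown.

```python
from itertools import product

ALPHABET = "ACGT"  # assumed nucleotide alphabet


def kmer_frequencies(seq, k=2):
    """Return normalized k-mer frequencies of seq, in lexicographic k-mer order."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    windows = 0
    # Slide a window of width k over the sequence, one base at a time.
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # assumption: skip windows with ambiguous bases such as N
            counts[window] += 1
            windows += 1
    if windows == 0:
        return [0.0] * len(kmers)
    return [counts[km] / windows for km in kmers]


def hybrid_kmer_features(seq, ks=(1, 2, 3)):
    """Concatenate k-mer frequency vectors for several k (hypothetical helper)."""
    features = []
    for k in ks:
        features.extend(kmer_frequencies(seq, k))
    return features


if __name__ == "__main__":
    vector = hybrid_kmer_features("ACGTACGTAC")
    print(len(vector))  # 4 + 16 + 64 entries for k = 1, 2, 3
```

Each sequence is mapped to a fixed-length numeric vector regardless of its length, which is what allows vectors from different descriptors to be concatenated into a hybrid feature set before dimensionality reduction.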

MeSH Terms

Deep Learning
Adenine
DNA Methylation
Computational Biology
Humans
Principal Component Analysis
Algorithms

Chemicals

Adenine
