CNN_FunBar: Advanced Learning Technique for Fungi ITS Region Classification.

Ritwika Das, Anil Rai, Dwijesh Chandra Mishra
Author Information
  1. Ritwika Das: Division of Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110012, India.
  2. Anil Rai: Division of Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110012, India.
  3. Dwijesh Chandra Mishra: Division of Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110012, India.

Abstract

Fungal species identification from metagenomic data is a highly challenging task. The Internal Transcribed Spacer (ITS) region is a potential DNA marker for fungal taxonomy prediction. Computational approaches, especially deep learning algorithms, are highly efficient at pattern recognition and classification of large datasets compared with other in silico techniques such as BLAST and conventional machine learning methods. In this study, we present CNN_FunBar, a convolutional neural network-based approach for classifying fungal ITS sequences from the UNITE+INSDC reference datasets. The effects of convolution kernel size, number of filters, k-mer size, degree of diversity and category-wise frequency of ITS sequences on the classification performance of CNN models were assessed at all taxonomic levels (species, genus, family, order, class and phylum). The CNN models can achieve >93% average accuracy at all levels when classifying ITS sequences from balanced datasets with 500 sequences per category and 6-mer frequency features. The comparative study revealed that CNN_FunBar can outperform machine learning-based algorithms (SVM, KNN, Naïve-Bayes and Random Forest) as well as existing fungal taxonomy prediction software (funbarRF, Mothur, RDP Classifier and SINTAX). The present study will be helpful for fungal taxonomy classification using large metagenomic datasets.
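
The 6-mer frequency features mentioned in the abstract can be illustrated with a minimal Python sketch. The abstract does not describe the exact feature pipeline, so the details here are assumptions: overlapping windows, counts normalized to relative frequencies, a fixed lexicographic ordering over the alphabet ACGT, and windows containing ambiguous bases (such as N) skipped. The function name `kmer_frequencies` is hypothetical and is not taken from CNN_FunBar.

```python
from itertools import product


def kmer_frequencies(seq, k=6, alphabet="ACGT"):
    """Return relative k-mer frequencies in lexicographic k-mer order.

    Assumption (not from the paper): overlapping windows are counted and
    windows containing characters outside `alphabet` are ignored.
    """
    seq = seq.upper()
    # Enumerate all len(alphabet)**k possible k-mers in a fixed order.
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    total = 0
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip windows with ambiguous bases
            counts[j] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [c / total for c in counts]


# Toy sequence, not real ITS data.
features = kmer_frequencies("ACGTACGTACGTNNACGT", k=6)
print(len(features))  # 4**6 = 4096 features per sequence
```

Each sequence becomes a fixed-length vector of 4^k values regardless of its length, which is the kind of input a classifier such as a CNN or Random Forest can take directly.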

MeSH Term

Bayes Theorem
Algorithms
Software
Neural Networks, Computer
Fungi
