POSMM: an efficient alignment-free metagenomic profiler that complements alignment-based profiling.

David J Burks, Vaidehi Pusadkar, Rajeev K Azad
Author Information
  1. David J Burks: Department of Biological Sciences and BioDiscovery Institute, University of North Texas, Denton, TX, 76203, USA.
  2. Vaidehi Pusadkar: Department of Biological Sciences and BioDiscovery Institute, University of North Texas, Denton, TX, 76203, USA.
  3. Rajeev K Azad: Department of Biological Sciences and BioDiscovery Institute, University of North Texas, Denton, TX, 76203, USA. Rajeev.Azad@unt.edu.

Abstract

We present POSMM (pronounced 'Possum'), the Python-Optimized Standard Markov Model classifier, a new incarnation of the Markov model approach to metagenomic sequence analysis. Built on top of SMM, a rapid Markov model-based classification algorithm, POSMM reintroduces the high sensitivity associated with alignment-free taxonomic classifiers to the analysis of whole-genome or metagenome datasets of increasingly prohibitive sizes. Logistic regression models, generated and optimized using the Python sklearn library, transform Markov model probabilities into scores suitable for thresholding. POSMM takes a dynamic, database-free approach: models are generated directly from genome FASTA files on each run, which makes it a valuable accompaniment to many other programs. When POSMM is combined with an ultrafast classifier such as Kraken2, their complementary strengths can be leveraged to produce higher overall accuracy in metagenomic sequence classification than either achieves as a standalone classifier. POSMM is a user-friendly and highly adaptable tool designed for broad use by the metagenomics research community.
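The scoring idea described in the abstract can be illustrated with a minimal sketch. It is not POSMM's implementation: the function names, the model order of 2, the pseudocount, and the logistic `weight` and `bias` values are all assumptions chosen for illustration. POSMM fits its logistic regression models with sklearn; here a plain logistic function with placeholder coefficients stands in, so that the example needs only the standard library.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"

def train_markov_model(genome, order=2, pseudocount=1.0):
    """Estimate log P(next base | preceding `order` bases) from one genome string."""
    counts = defaultdict(lambda: dict.fromkeys(ALPHABET, 0.0))
    for i in range(order, len(genome)):
        context, base = genome[i - order:i], genome[i]
        # Skip positions that contain ambiguous symbols such as N.
        if base in ALPHABET and all(c in ALPHABET for c in context):
            counts[context][base] += 1
    model = {}
    for context, nxt in counts.items():
        total = sum(nxt.values()) + pseudocount * len(ALPHABET)
        model[context] = {
            b: math.log((nxt[b] + pseudocount) / total) for b in ALPHABET
        }
    return model

def score_read(read, model, order=2):
    """Average per-base log-likelihood of a read under a trained model."""
    uniform = math.log(1.0 / len(ALPHABET))  # fallback for unseen contexts
    total, n = 0.0, 0
    for i in range(order, len(read)):
        context, base = read[i - order:i], read[i]
        if base not in ALPHABET:
            continue
        total += model.get(context, {}).get(base, uniform)
        n += 1
    return total / n if n else uniform

def logistic_score(raw_score, weight, bias):
    """Map a raw Markov score to (0, 1); weight and bias are placeholders
    for coefficients that a fitted logistic regression would supply."""
    return 1.0 / (1.0 + math.exp(-(weight * raw_score + bias)))

# Toy "genomes" (hypothetical data): one model is built per genome at run time.
genome_a = "ACGT" * 50
genome_b = "AATT" * 50
model_a = train_markov_model(genome_a)
model_b = train_markov_model(genome_b)

read = "ACGTACGTACGT"
score_a = score_read(read, model_a)
score_b = score_read(read, model_b)
best = "genome_a" if score_a > score_b else "genome_b"
```

In this sketch a read is assigned to the genome whose model gives it the highest average log-likelihood, and `logistic_score` shows how such a raw score could be rescaled to a bounded value that is easier to threshold.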
