Combining comparative genomic analysis with machine learning reveals some promising diagnostic markers to identify five common pathogenic non-tuberculous mycobacteria.

Xinmiao Jia, Linfang Yang, Cuidan Li, Yingchun Xu, Qiwen Yang, Fei Chen
Author Information
  1. Xinmiao Jia: Medical Research Center, State Key Laboratory of Complex Severe and Rare Diseases, Peking Union Medical College Hospital, Peking Union Medical College, Beijing, 100730, China.
  2. Linfang Yang: Department of Dermatology, Affiliated Xingtai People's Hospital of Hebei Medical University, Xingtai, Hebei, 054001, China.
  3. Cuidan Li: CAS Key Laboratory of Genome Sciences & Information, China National Center for Bioinformation, Chinese Academy of Sciences, Beijing Institute of Genomics, Beijing, 100101, China.
  4. Yingchun Xu: Department of Clinical Laboratory, State Key Laboratory of Complex Severe and Rare Diseases, Peking Union Medical College Hospital, Peking Union Medical College, Chinese Academy of Medical Sciences, Beijing, 100730, China.
  5. Qiwen Yang: Department of Clinical Laboratory, State Key Laboratory of Complex Severe and Rare Diseases, Peking Union Medical College Hospital, Peking Union Medical College, Chinese Academy of Medical Sciences, Beijing, 100730, China.
  6. Fei Chen: CAS Key Laboratory of Genome Sciences & Information, China National Center for Bioinformation, Chinese Academy of Sciences, Beijing Institute of Genomics, Beijing, 100101, China.

Abstract

Non-tuberculous mycobacteria (NTM) can cause various respiratory diseases, and even death in severe cases, and the incidence of NTM infection has increased rapidly worldwide. To date, routine diagnostic methods and strain identification have struggled to distinguish the various types of NTM infection precisely. We combined systematic comparative genomics with machine learning to select new diagnostic markers for precisely identifying five common pathogenic NTMs (Mycobacterium kansasii, Mycobacterium avium, Mycobacterium intracellulare, Mycobacterium chelonae, and Mycobacterium abscessus). A panel of six genes and two SNPs (nikA, benM, codA, pfkA2, mpr, yjcH, rrl C2638T, rrl A1173G) was selected to identify the five NTMs simultaneously with high accuracy (> 90%). Notably, a panel containing only the six genes also classified the five NTMs well (accuracy > 90%). Additionally, both panels could precisely differentiate the five NTMs from M. tuberculosis (accuracy > 99%). We also identified new marker genes, SNPs, and combinations that accurately discriminate each of the five NTMs individually, which makes it possible to diagnose a specific NTM infection precisely. Our research not only reveals novel, promising diagnostic markers that could advance precision diagnosis of NTM infections, but also offers insight into precisely identifying genetically close pathogens through comparative genomics and machine learning.
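
As a rough illustration of how a marker panel can drive species identification, the sketch below assigns a sample to the reference profile it differs from least. The abstract does not state which machine-learning algorithm was used, so this nearest-profile rule is a stand-in rather than the authors' method. The marker names come from the abstract; the species labels (`species_A` and so on), the presence/absence values, and the function names are hypothetical placeholders, not data or code from the study.

```python
from typing import Dict, List, Sequence, Tuple

# Marker names taken from the eight-marker panel listed in the abstract.
MARKERS = ("nikA", "benM", "codA", "pfkA2", "mpr", "yjcH",
           "rrl_C2638T", "rrl_A1173G")

# HYPOTHETICAL presence (1) / absence (0) profiles. These values are
# placeholders for illustration only and are NOT results from the study.
REFERENCE_PROFILES: Dict[str, Tuple[int, ...]] = {
    "species_A": (1, 0, 0, 1, 0, 0, 1, 0),
    "species_B": (0, 1, 0, 0, 1, 0, 0, 0),
    "species_C": (0, 0, 1, 0, 0, 1, 0, 1),
}


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions at which two marker profiles differ."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    return sum(x != y for x, y in zip(a, b))


def classify(profile: Sequence[int]) -> str:
    """Return the label of the reference profile closest to the sample."""
    if len(profile) != len(MARKERS):
        raise ValueError(f"expected {len(MARKERS)} marker values")
    return min(REFERENCE_PROFILES,
               key=lambda name: hamming(profile, REFERENCE_PROFILES[name]))


def accuracy(samples: List[Tuple[Sequence[int], str]]) -> float:
    """Fraction of (profile, true_label) pairs that classify() gets right."""
    correct = sum(classify(profile) == label for profile, label in samples)
    return correct / len(samples)
```

In practice a trained classifier evaluated on held-out genomes would replace the fixed profiles; the point here is only that each sample reduces to a short vector of marker calls, which is what makes a small panel usable for diagnosis.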

Grants

  1. 2018ZX10302301-004-003/Special Projects of Infectious Diseases for National Key Research and Development Program of China
  2. 32000463/National Natural Science Foundation of China
  3. pumch201912054/Research Fund of Peking Union Medical College Hospital

MeSH Term

Genomics
Humans
Machine Learning
Mycobacterium Infections, Nontuberculous
Nontuberculous Mycobacteria
