Identification of host-microbe interaction factors in the genomes of soft rot-associated pathogens Dickeya dadantii 3937 and Pectobacterium carotovorum WPP14 with supervised machine learning.

Bing Ma, Amy O Charkowski, Jeremy D Glasner, Nicole T Perna
Author Information
  1. Bing Ma: Genome Center of Wisconsin, University of Wisconsin-Madison, Madison, WI 53706, USA. bma@som.umaryland.edu.

Abstract

BACKGROUND: A wealth of genome sequences has provided thousands of genes of unknown function, but identifying functions for the large numbers of hypothetical genes in phytopathogens remains a challenge that affects all research on plant-microbe interactions. Decades of research on the molecular basis of pathogenesis focused on a limited number of factors associated with long-known host-microbe interaction systems, and so offer limited guidance on this challenge. Computational approaches to identify virulence genes often rely on two strategies: searching for sequence similarity to known host-microbe interaction factors from other organisms, and identifying islands of genes that discriminate between pathogens of one type and closely related non-pathogens or pathogens of a different type. The former is limited to known genes, excluding the vast collections of genes of unknown function found in every genome. The latter lacks specificity, since many genes in genomic islands have little to do with host interaction.
RESULT: In this study, we developed a supervised machine learning approach designed to recognize patterns in large and disparate data types in order to identify candidate host-microbe interaction factors. The soft rot Enterobacteriaceae strains Dickeya dadantii 3937 and Pectobacterium carotovorum WPP14 were used to develop this tool, because these pathogens affect multiple high-value crops worldwide and because more genomic and functional data are available for the Enterobacteriaceae than for any other microbial family. Our approach achieved greater than 90% precision and a recall rate over 80% in 10-fold cross-validation tests.
CONCLUSION: Application of the learning scheme to the complete genomes of these two organisms generated a list of roughly 200 candidates, many of which had not previously been implicated in plant-microbe interaction and many of which are of completely unknown function. These lists provide new targets for experimental validation and further characterization, and our approach presents a promising pattern-learning scheme that can be generalized to create a resource for studying host-microbe interactions in other bacterial phytopathogens.
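
The evaluation reported in the RESULT paragraph (precision and recall under 10-fold cross-validation) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not describe the classifier, the features, or the software used, so the threshold learner, the function names, and the toy scores below are all hypothetical stand-ins chosen only to show how the two metrics are computed from held-out predictions.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def precision_recall(y_true, y_pred):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def cross_validate(features, labels, train_fn, k=10, seed=0):
    """Train on k-1 folds, predict the held-out fold, pool all predictions."""
    y_true, y_pred = [], []
    for held_out in k_fold_indices(len(labels), k, seed):
        held = set(held_out)
        train_idx = [i for i in range(len(labels)) if i not in held]
        model = train_fn([features[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        for i in held_out:
            y_true.append(labels[i])
            y_pred.append(model(features[i]))
    return precision_recall(y_true, y_pred)


def train_threshold(xs, ys):
    """Hypothetical learner: cut midway between the two class means.

    Assumes both classes are present in the training data.
    """
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1 if x > cut else 0


# Toy data (invented for illustration): one score per gene,
# label 1 = known interaction factor, label 0 = background gene.
toy_scores = [0.10 + 0.01 * i for i in range(20)] + [0.60 + 0.01 * i for i in range(20)]
toy_labels = [0] * 20 + [1] * 20

precision, recall = cross_validate(toy_scores, toy_labels, train_threshold, k=10)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Precision measures how many predicted interaction factors are true positives, while recall measures how many known factors are recovered; pooling the held-out predictions across folds means every labeled gene is scored exactly once by a model that never saw it during training.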

MeSH Term

Artificial Intelligence
Computational Biology
Enterobacteriaceae
Genes, Bacterial
Genomics
Host-Pathogen Interactions
Pectobacterium carotovorum
Plant Diseases
ROC Curve
Reproducibility of Results
Virulence
Virulence Factors

Chemicals

Virulence Factors
