Predictions from algorithmic modeling result in better decisions than from data modeling for soybean iron deficiency chlorosis.

Zhanyou Xu, Andreomar Kurek, Steven B Cannon, William D Beavis
Author Information
  1. Zhanyou Xu: Plant Science Research Unit, USDA, Agricultural Research Service, Saint Paul, MN, United States of America.
  2. Andreomar Kurek: Department of Agronomy, Iowa State University, Ames, IA, United States of America.
  3. Steven B Cannon: Corn Insects and Crop Genetics Research Unit, USDA, Agricultural Research Service, Ames, IA, United States of America.
  4. William D Beavis: Department of Agronomy, Iowa State University, Ames, IA, United States of America.

Abstract

In soybean variety development and genetic improvement projects, iron deficiency chlorosis (IDC) is visually assessed as an ordinal response variable. Linear Mixed Models for Genomic Prediction (GP) have been developed, compared, and used to select for continuous plant traits such as yield, height, and maturity, but they can be inappropriate for ordinal traits. Generalized Linear Mixed Models have been developed for GP of ordinal response variables. However, neither approach addresses the most important questions for cultivar development and genetic improvement: How frequently are the 'wrong' genotypes retained, and how often are the 'correct' genotypes discarded? The objective of the research reported here was to compare outcomes from four data modeling and six algorithmic modeling GP methods applied to IDC, using decision metrics appropriate for variety development and genetic improvement projects. These metrics are specificity, sensitivity, precision, decision accuracy, and area under the receiver operating characteristic curve. The data modeling methods were ridge regression, logistic regression, penalized logistic regression, and Bayesian generalized linear regression. The algorithmic modeling methods were Random Forest, Gradient Boosting Machine, Support Vector Machine, K-Nearest Neighbors, Naïve Bayes, and Artificial Neural Network. We found that a Support Vector Machine model provided the most specific decisions, that is, it most often correctly discarded IDC-susceptible genotypes, while a Random Forest model resulted in the best decisions for retaining IDC-tolerant genotypes, as well as the best outcomes when considering all decision metrics. Overall, predictions from algorithmic modeling resulted in better decisions than predictions from data modeling methods applied to soybean IDC.
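
The decision metrics named in the abstract can be illustrated with a minimal Python sketch. It assumes the ordinal IDC scores have already been collapsed into two classes, 1 for tolerant (retain) and 0 for susceptible (discard); that coding, the function names `decision_metrics` and `auc`, and the toy data are all illustrative assumptions and are not taken from the study.

```python
def decision_metrics(actual, predicted):
    """Confusion-matrix metrics for a retain/discard decision.

    Assumed coding (not from the paper): 1 = tolerant (retain),
    0 = susceptible (discard).
    """
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # tolerant, retained
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # susceptible, discarded
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # 'wrong' genotype retained
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # 'correct' genotype discarded

    def ratio(num, den):
        # Return NaN instead of raising when a class is absent.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
        "accuracy": ratio(tp + tn, len(pairs)),
    }


def auc(actual, scores):
    """Area under the ROC curve via pairwise comparison: the share of
    (tolerant, susceptible) pairs in which the tolerant genotype gets
    the higher predicted score; ties count as half."""
    pos = [s for a, s in zip(actual, scores) if a == 1]
    neg = [s for a, s in zip(actual, scores) if a == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data for eight genotypes.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 1, 0, 0, 0, 1, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]

metrics = decision_metrics(actual, predicted)
roc_area = auc(actual, scores)
print(metrics, roc_area)
```

Under this coding, specificity measures how often susceptible genotypes are correctly discarded and sensitivity measures how often tolerant genotypes are correctly retained, which correspond to the two questions posed in the abstract.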

MeSH Terms

Glycine max
Algorithms
Bayes Theorem
Genotype
Iron Deficiencies
Plant Necrosis and Chlorosis
