Computation of the distribution of model accuracy statistics in machine learning: Comparison between analytically derived distributions and simulation-based methods.

Alexander A Huang, Samuel Y Huang
Author Information
  1. Alexander A Huang: Northwestern University Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA.
  2. Samuel Y Huang: Virginia Commonwealth School of Medicine, Virginia Commonwealth University, Richmond, Virginia, USA.

Abstract

Background and Aims: Machine-learning techniques are increasingly used across all fields. To accurately evaluate the efficacy of novel modeling methods, it is necessary to critically evaluate the reported model metrics, such as sensitivity, specificity, and area under the receiver operating characteristic curve (AUROC). For commonly used model metrics, we proposed the use of analytically derived distributions (ADDs) and compared them with simulation-based approaches.
Methods: A retrospective cohort study was conducted using the England National Health Services Heart Disease Prediction Cohort. Four machine-learning models (XGBoost, Random Forest, Artificial Neural Network, and Adaptive Boost) were used. The distributions of the model metrics and covariate gain statistics were derived empirically using bootstrap simulation (n = 10,000). The ADDs were constructed from analytic formulas involving the covariates to describe the distributions of the model metrics and were compared with the bootstrap-derived distributions.
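The abstract does not include the study's code; the sketch below only illustrates the bootstrap approach described in the Methods, resampling held-out predictions with replacement and recomputing the AUROC on each of 10,000 resamples to obtain an empirical distribution. It is written in Python with scikit-learn, and the data, variable names, and 500-observation sample size are hypothetical placeholders rather than the study's actual pipeline.

  # Minimal sketch of the bootstrap approach described in the Methods:
  # recompute the AUROC on 10,000 resamples of hypothetical held-out
  # predictions to obtain its empirical sampling distribution.
  import numpy as np
  from sklearn.metrics import roc_auc_score

  rng = np.random.default_rng(0)

  # Hypothetical held-out labels and predicted probabilities from any
  # fitted classifier (e.g., XGBoost, Random Forest, ANN, Adaptive Boost).
  y_true = rng.integers(0, 2, size=500)
  y_score = np.clip(0.3 * y_true + rng.normal(0.4, 0.25, size=500), 0, 1)

  n_boot = 10_000
  aurocs = np.full(n_boot, np.nan)
  for b in range(n_boot):
      idx = rng.integers(0, len(y_true), size=len(y_true))  # resample with replacement
      if y_true[idx].min() != y_true[idx].max():             # skip one-class resamples
          aurocs[b] = roc_auc_score(y_true[idx], y_score[idx])

  aurocs = aurocs[~np.isnan(aurocs)]
  print(f"bootstrap AUROC: mean={aurocs.mean():.3f}, SD={aurocs.std(ddof=1):.3f}")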
Results: XGBoost was the optimal model, having the highest AUROC and the highest aggregate score across six other model metrics. Based on the Anderson-Darling test, the distributions of the model metrics created from bootstrap simulation did not deviate significantly from a normal distribution. The variances from the ADDs yielded smaller SDs than those derived from bootstrap simulation, whereas the remainder of the distribution did not differ significantly.
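As a companion to the bootstrap sketch above, the snippet below shows how the normality claim in the Results could be checked with an Anderson-Darling test against the normal distribution (scipy.stats.anderson). The aurocs array is the one produced in the previous sketch, an assumed stand-in for the study's bootstrap distributions.

  # Anderson-Darling test of the bootstrap AUROC distribution against a
  # normal distribution; the test statistic is compared with tabulated
  # critical values at several significance levels.
  from scipy.stats import anderson

  result = anderson(aurocs, dist="norm")
  print(f"A-D statistic: {result.statistic:.3f}")
  for crit, sig in zip(result.critical_values, result.significance_level):
      decision = "reject" if result.statistic > crit else "do not reject"
      print(f"  {sig:>4}% level: critical value {crit:.3f} -> {decision} normality")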
Conclusions: ADDs allow for cross-study comparison of model metrics, which is usually done with bootstrapping that relies on simulations that cannot be replicated by the reader.
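The abstract does not specify which analytic formulas underlie the ADDs. As an illustration of a closed-form sampling variance for one of the metrics discussed, the widely used Hanley-McNeil (1982) approximation for the variance of an estimated AUROC \hat{\theta} is shown below, where n_1 and n_2 are the numbers of positive and negative cases; whether this is the formula used in the study is an assumption.

  \operatorname{Var}(\hat{\theta}) =
    \frac{\hat{\theta}(1-\hat{\theta})
          + (n_1 - 1)(Q_1 - \hat{\theta}^2)
          + (n_2 - 1)(Q_2 - \hat{\theta}^2)}{n_1 n_2},
  \qquad
  Q_1 = \frac{\hat{\theta}}{2-\hat{\theta}},
  \qquad
  Q_2 = \frac{2\hat{\theta}^2}{1+\hat{\theta}}.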

Keywords

Grants

  1. T35 DK126628/NIDDK NIH HHS

