A stacked ensemble machine learning approach for the prediction of diabetes.

Khondokar Oliullah, Mahedi Hasan Rasel, Md Manzurul Islam, Md Reazul Islam, Md Anwar Hussen Wadud, Md Whaiduzzaman
Author Information
  1. Khondokar Oliullah: Department of Computer Science and Engineering, Bangladesh University of Business and Technology, Dhaka, Bangladesh.
  2. Mahedi Hasan Rasel: Department of Computer Science and Engineering, Bangladesh University of Business and Technology, Dhaka, Bangladesh.
  3. Md Manzurul Islam: Department of Computer Science and Engineering, Bangladesh University of Business and Technology, Dhaka, Bangladesh.
  4. Md Reazul Islam: Department of Computer Science and Engineering, Bangladesh University of Business and Technology, Dhaka, Bangladesh.
  5. Md Anwar Hussen Wadud: Department of Computer Science and Engineering, Bangladesh University of Business and Technology, Dhaka, Bangladesh.
  6. Md Whaiduzzaman: School of Information Systems, Queensland University of Technology, Brisbane, Australia.

Abstract

Objectives: Diabetes has become a leading cause of mortality in both developed and developing countries and affects a growing number of people worldwide. As its prevalence continues to rise, researchers have worked to develop accurate diabetes prediction models. This study aims to use a diverse set of machine learning algorithms to detect diabetes at an early stage, particularly in women. The goal is to give physicians tools for identifying the disease early, enabling timely intervention and improving patient outcomes.
Methods: Several state-of-the-art machine learning techniques were employed: a random forest classifier tuned with GridSearchCV, XGBoost, NGBoost, Bagging, LightGBM, and AdaBoost classifiers. These models were chosen as the base layer of the proposed stacked ensemble model because of their high accuracy. The dataset was preprocessed before being fed to the models in order to improve performance.
Results: This study achieved an accuracy of 92.91%, which is competitive with existing approaches. In addition, Shapley additive explanations (SHAP) were used to interpret the machine learning models.
Conclusion: We anticipate that these findings will be beneficial to healthcare providers, stakeholders, students, and researchers involved in diabetes prediction research and development.
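
The stacked design described under Methods can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn only, so XGBoost, NGBoost, and LightGBM are replaced here by scikit-learn's GradientBoostingClassifier as a stand-in, the logistic regression meta-learner is an assumption (the abstract does not name one), and the data are synthetic, with 768 rows and 8 features chosen only to resemble a small tabular clinical dataset. The function names, parameter grid, and standard-scaling step are likewise hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    BaggingClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def build_stacked_model(random_state=0):
    """Build a two-layer stacked ensemble (illustrative, not the paper's code)."""
    # Random forest tuned by grid search; the grid itself is an assumption.
    rf_search = GridSearchCV(
        RandomForestClassifier(random_state=random_state),
        param_grid={"n_estimators": [50, 100], "max_depth": [3, None]},
        cv=3,
    )
    # Base layer: GradientBoostingClassifier stands in for the
    # XGBoost / NGBoost / LightGBM learners named in the abstract.
    base_learners = [
        ("rf", rf_search),
        ("bagging", BaggingClassifier(n_estimators=20, random_state=random_state)),
        ("adaboost", AdaBoostClassifier(n_estimators=50, random_state=random_state)),
        ("gb", GradientBoostingClassifier(random_state=random_state)),
    ]
    # Meta-learner trained on out-of-fold predictions of the base layer.
    stack = StackingClassifier(
        estimators=base_learners,
        final_estimator=LogisticRegression(max_iter=1000),  # assumed meta-learner
        cv=5,
    )
    # Scaling is a placeholder for the unspecified preprocessing step.
    return make_pipeline(StandardScaler(), stack)


def run_demo():
    """Fit the sketch on synthetic data and return (model, test accuracy)."""
    X, y = make_classification(
        n_samples=768, n_features=8, n_informative=5, random_state=0
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=0
    )
    model = build_stacked_model()
    model.fit(X_train, y_train)
    return model, model.score(X_test, y_test)


if __name__ == "__main__":
    _, accuracy = run_demo()
    print(f"Held-out accuracy on synthetic data: {accuracy:.3f}")
```

Because the data are synthetic, the printed accuracy says nothing about the 92.91% reported above. The sketch only shows how base learners feed a meta-learner through cross-validated predictions.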
