Classification and prediction of diabetes disease using machine learning paradigm.

Md Maniruzzaman, Md Jahanur Rahman, Benojir Ahammed, Md Menhazul Abedin
Author Information
  1. Md Maniruzzaman: Statistics Discipline, Khulna University, Khulna, 9208 Bangladesh.
  2. Md Jahanur Rahman: Department of Statistics, University of Rajshahi, Rajshahi, 6205 Bangladesh.
  3. Benojir Ahammed: Statistics Discipline, Khulna University, Khulna, 9208 Bangladesh.
  4. Md Menhazul Abedin: Statistics Discipline, Khulna University, Khulna, 9208 Bangladesh.

Abstract

BACKGROUND AND OBJECTIVES: Diabetes is a chronic disease characterized by high blood sugar. It can lead to many serious complications, such as stroke, kidney failure, and heart attack. About 422 million people worldwide had diabetes in 2014, and this figure is projected to reach 642 million by 2040. The main objective of this study is to develop a machine learning (ML)-based system for predicting diabetic patients.
MATERIALS AND METHODS: Logistic regression (LR) is used to identify the risk factors for diabetes based on the p value and odds ratio (OR). We adopted four classifiers, naïve Bayes (NB), decision tree (DT), Adaboost (AB), and random forest (RF), to predict diabetic patients. We also adopted three partition protocols (K2, K5, and K10) and repeated each protocol over 20 trials. The performance of these classifiers is evaluated using accuracy (ACC) and the area under the receiver operating characteristic curve (AUC).
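The evaluation design described above (K-fold partition protocols, each repeated over 20 trials and scored by ACC and AUC) can be illustrated with a minimal plain-Python sketch. It assumes that K2, K5, and K10 denote stratified 2-, 5-, and 10-fold cross-validation, that ACC uses a 0.5 probability threshold, and that scores are averaged over all test folds of all trials; the abstract does not state these details. A Gaussian naïve Bayes model stands in for the classifier because it fits in a few lines; DT, AB, or RF would plug into the same loop. All function names and the synthetic data are hypothetical.

```python
import math
import random
from statistics import mean


def stratified_k_folds(labels, k, rng):
    """Split sample indices into k folds that keep the class ratio of `labels`."""
    # Assumption: folds are stratified by class; the abstract does not say so.
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):  # deal each class round-robin across folds
            folds[pos % k].append(i)
    return folds


class GaussianNB:
    """Minimal Gaussian naive Bayes for binary labels 0/1 (stand-in classifier)."""

    def fit(self, X, y):
        self.stats = {}
        for cls in (0, 1):
            rows = [x for x, t in zip(X, y) if t == cls]
            cols = list(zip(*rows))
            means = [mean(c) for c in cols]
            # Floor the variance so constant features do not divide by zero.
            variances = [max(sum((v - m) ** 2 for v in c) / len(c), 1e-9)
                         for c, m in zip(cols, means)]
            self.stats[cls] = (math.log(len(rows) / len(y)), means, variances)
        return self

    def _log_joint(self, x, cls):
        log_prior, means, variances = self.stats[cls]
        return log_prior + sum(-0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
                               for v, m, s2 in zip(x, means, variances))

    def predict_proba(self, X):
        """Return P(y = 1 | x) = 1 / (1 + exp(log p0 - log p1)), computed stably."""
        probs = []
        for x in X:
            d = self._log_joint(x, 0) - self._log_joint(x, 1)
            if d >= 0:
                e = math.exp(-d)
                probs.append(e / (1.0 + e))
            else:
                probs.append(1.0 / (1.0 + math.exp(d)))
        return probs


def accuracy(y_true, probs, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    return mean(1.0 if (p >= threshold) == (t == 1) else 0.0 for p, t in zip(probs, y_true))


def roc_auc(y_true, scores):
    """AUC = P(random positive scores above random negative); ties count 1/2."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def repeated_kfold(X, y, k, trials=20, seed=0):
    """Run the K-fold protocol `trials` times; return mean ACC and AUC over all test folds."""
    rng = random.Random(seed)
    accs, aucs = [], []
    for _ in range(trials):
        for test_idx in stratified_k_folds(y, k, rng):  # fresh random partition each trial
            held_out = set(test_idx)
            train_idx = [i for i in range(len(y)) if i not in held_out]
            model = GaussianNB().fit([X[i] for i in train_idx], [y[i] for i in train_idx])
            probs = model.predict_proba([X[i] for i in test_idx])
            y_test = [y[i] for i in test_idx]
            accs.append(accuracy(y_test, probs))
            aucs.append(roc_auc(y_test, probs))
    return mean(accs), mean(aucs)


if __name__ == "__main__":
    # Hypothetical synthetic data: 60 cases and 140 controls, cases shifted upward.
    gen = random.Random(42)
    y = [1] * 60 + [0] * 140
    X = [[gen.gauss(2.0 * t, 1.0), gen.gauss(2.0 * t, 1.0)] for t in y]
    for k in (2, 5, 10):
        acc, auc = repeated_kfold(X, y, k)
        print(f"K{k}: ACC={acc:.3f}  AUC={auc:.3f}")
```

Re-drawing the folds in every trial, rather than reusing one split, is what makes the repeated protocol informative: the averaged ACC and AUC then reflect many random partitions instead of a single, possibly lucky, split.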
RESULTS: We used a diabetes dataset derived from the 2009-2012 National Health and Nutrition Examination Survey. The dataset consists of 6561 respondents: 657 diabetic patients and 5904 controls. The LR model identifies 7 of the 14 factors (age, education, BMI, systolic BP, diastolic BP, direct cholesterol, and total cholesterol) as risk factors for diabetes. The overall ACC of the ML-based system is . The combination of LR-based feature selection and an RF-based classifier gives ACC and AUC for the K10 protocol.
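The LR screening step (keeping factors according to their p value and OR) can be sketched as follows. This is a minimal plain-Python illustration, not the authors' code: it assumes two-sided Wald p values, a 0.05 cutoff, and OR = exp(β) for each coefficient, none of which the abstract specifies. The logistic model is fitted by Newton-Raphson, and the inverse Fisher information at convergence supplies the standard errors. The names `fit_logistic` and `lr_feature_selection` and the synthetic data are hypothetical; the selected columns would then be passed to the RF classifier, which is not shown.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        pv = a[col][col]
        a[col] = [v / pv for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]


def newton_terms(Z, y, beta):
    """Score vector X^T (y - mu) and Fisher information X^T W X of the logistic model."""
    p = len(beta)
    mu = [sigmoid(sum(b * z for b, z in zip(beta, row))) for row in Z]
    grad = [sum((t - m) * row[j] for row, t, m in zip(Z, y, mu)) for j in range(p)]
    info = [[sum(m * (1 - m) * row[j] * row[k] for row, m in zip(Z, mu)) for k in range(p)]
            for j in range(p)]
    return grad, info


def fit_logistic(X, y, max_iter=50, tol=1e-8):
    """Newton-Raphson fit; returns coefficients (intercept first) and their covariance."""
    Z = [[1.0] + list(row) for row in X]  # prepend the intercept column
    beta = [0.0] * len(Z[0])
    for _ in range(max_iter):
        grad, info = newton_terms(Z, y, beta)
        step = [sum(c * g for c, g in zip(inv_row, grad)) for inv_row in invert(info)]
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    _, info = newton_terms(Z, y, beta)
    return beta, invert(info)  # inverse Fisher information = asymptotic covariance


def lr_feature_selection(X, y, names, alpha=0.05):
    """Keep features whose two-sided Wald p value is below alpha; report OR = exp(beta)."""
    # Assumption: Wald test and alpha = 0.05; the abstract names neither.
    beta, cov = fit_logistic(X, y)
    report = []
    for j, name in enumerate(names, start=1):  # index 0 is the intercept
        z = beta[j] / math.sqrt(cov[j][j])
        p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
        report.append((name, math.exp(beta[j]), p_value))
    selected = [name for name, _, p in report if p < alpha]
    return selected, report


if __name__ == "__main__":
    # Hypothetical data: "x1" drives the outcome, "noise" does not.
    gen = random.Random(7)
    X, y = [], []
    for _ in range(500):
        x1, noise = gen.gauss(0, 1), gen.gauss(0, 1)
        X.append([x1, noise])
        y.append(1 if gen.random() < sigmoid(-0.5 + 1.5 * x1) else 0)
    selected, report = lr_feature_selection(X, y, ["x1", "noise"])
    for name, odds_ratio, p_value in report:
        print(f"{name}: OR={odds_ratio:.2f}  p={p_value:.3g}")
    print("selected:", selected)
```

On real data a statistics package would normally handle this step; the hand-written Newton iteration here only makes explicit how each factor's p value and OR are obtained before the reduced feature set reaches the classifier.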
CONCLUSION: The combination of LR-based feature selection and an RF-based classifier performs better than the other classifiers considered. This combination could be very helpful for predicting diabetic patients.
