Diabetes classification model based on boosting algorithms.

Peihua Chen, Chuandi Pan
Author Information
  1. Peihua Chen: Institute of Biopharmaceutical Informatics and Technologies, Wenzhou Medical University, Wenzhou, China. chenphwmu666@163.com.
  2. Chuandi Pan: Department of Computer Technology and Information Management, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou City, China.

Abstract

BACKGROUND: Diabetes mellitus is a common, complex, lifelong chronic disease. Identifying the most relevant clinical indexes and supporting efficient computer-aided pre-diagnosis and diagnosis is therefore of high clinical significance.
RESULTS: Non-parametric statistical tests were performed on hundreds of medical measurement indexes to compare diabetic and non-diabetic populations. Two common boosting algorithms, Adaboost.M1 and LogitBoost, were then used to build machine learning models for diabetes diagnosis from these clinical test data, which cover a total of 35,669 individuals. Both models classified very well, with the LogitBoost model slightly better than the Adaboost.M1 model. Under 10-fold cross-validation, the overall accuracy of the LogitBoost model reached 95.30%. The true positive, true negative, false positive, and false negative rates of the binary classification model were 0.921, 0.969, 0.031, and 0.079, respectively, and the area under the receiver operating characteristic curve reached 0.99.
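The reported rates can be checked against each other with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the implied-prevalence step assumes that the accuracy and the four rates all refer to the same pooled cross-validation predictions, which the abstract does not state.

```python
def complementary_rates(tpr, tnr):
    """Return (false positive rate, false negative rate) for a binary classifier."""
    return 1.0 - tnr, 1.0 - tpr


def accuracy(tpr, tnr, prevalence):
    """Overall accuracy as the prevalence-weighted mean of sensitivity and specificity."""
    return prevalence * tpr + (1.0 - prevalence) * tnr


def implied_prevalence(acc, tpr, tnr):
    """Solve acc = p * tpr + (1 - p) * tnr for the positive-class share p."""
    return (tnr - acc) / (tnr - tpr)


# Values taken from the RESULTS paragraph.
TPR, TNR, ACC = 0.921, 0.969, 0.9530

fpr, fnr = complementary_rates(TPR, TNR)

# Assumption: accuracy and rates describe the same set of predictions.
p = implied_prevalence(ACC, TPR, TNR)

print(f"FPR={fpr:.3f} FNR={fnr:.3f} implied positive share={p:.3f}")
```

The false positive and false negative rates are simply the complements of the true negative and true positive rates, so the four reported values are internally consistent. The implied positive share is a back-of-the-envelope inference, not a figure given in the source.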
CONCLUSIONS: Boosting algorithms perform very well for diabetes classification based on clinical medical data. The coefficient matrix of the original data is sparse because some test results were missing, including some directly related to disease diagnosis; that the model still classifies well on such data suggests that it is robust and has a degree of pre-diagnostic value. Selecting the preferred test items also yielded the factors that discriminate most significantly between the diabetic and general populations, and these can be used as reference risk factors for diabetes mellitus.
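To make the idea of boosting on data with missing test results concrete, here is a minimal sketch of discrete two-class AdaBoost with decision stumps, in the spirit of Adaboost.M1. It is not the authors' implementation. The toy rows, the feature meanings, and all function names are hypothetical, and sending a missing value (`None`) to a per-stump default label is an assumption made for illustration; the abstract does not say how missing values were handled.

```python
import math


def stump_predict(stump, row):
    """Predict +1 or -1 from one feature; a missing value gets the stump's default label."""
    feature, threshold, polarity, default = stump
    value = row[feature]
    if value is None:
        return default
    return polarity if value > threshold else -polarity


def fit_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted training error."""
    best, best_err = None, float("inf")
    for feature in range(len(X[0])):
        observed = sorted({row[feature] for row in X if row[feature] is not None})
        for threshold in observed:
            for polarity in (1, -1):
                for default in (1, -1):
                    stump = (feature, threshold, polarity, default)
                    err = sum(wi for row, yi, wi in zip(X, y, w)
                              if stump_predict(stump, row) != yi)
                    if err < best_err:
                        best, best_err = stump, err
    return best, best_err


def adaboost(X, y, rounds=3):
    """Train a weighted ensemble of stumps; labels in y must be +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump, err = fit_stump(X, y, w)
        if err >= 0.5:  # weak learner no better than chance: stop
            break
        err = max(err, 1e-10)  # avoid division by zero for a perfect stump
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, stump))
        # Raise the weight of misclassified rows, lower the rest, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score > 0 else -1


# Hypothetical toy data: two lab values per person, None marks a missing test.
X = [[5.0, 5.2], [5.4, None], [None, 5.5], [4.8, 5.0],
     [7.9, 7.1], [8.5, None], [None, 8.0], [7.2, 6.9]]
y = [-1, -1, -1, -1, 1, 1, 1, 1]

model = adaboost(X, y, rounds=3)
predictions = [predict(model, row) for row in X]
```

No single stump can classify every row here, because each feature is missing for one person in each class. The weighted vote over several stumps can still do so, which is the intuition behind using boosting on a sparse matrix of test results.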

MeSH Term

Algorithms
Diabetes Mellitus
Humans
Models, Theoretical
ROC Curve
