An Effective Methodology for Diabetes Prediction in the Case of Class Imbalance.

Borislava Toleva, Ivan Atanasov, Ivan Ivanov, Vincent Hooper
Author Information
  1. Borislava Toleva: Faculty of Economics and Business Administration, Sofia University, St. Kl. Ohridski, 1113 Sofia, Bulgaria.
  2. Ivan Atanasov: Faculty of Economics and Business Administration, Sofia University, St. Kl. Ohridski, 1113 Sofia, Bulgaria.
  3. Ivan Ivanov: Faculty of Economics and Business Administration, Sofia University, St. Kl. Ohridski, 1113 Sofia, Bulgaria.
  4. Vincent Hooper: SP Jain Global School of Management, Academic City, Dubai P.O. Box 502345, United Arab Emirates.

Abstract

Diabetes raises the level of blood sugar, which damages various parts of the human body. Diabetes data are used not only to provide a deeper understanding of treatment mechanisms but also to predict the probability that a person will develop the disease. This paper proposes a novel methodology for classification under heavy class imbalance, as observed in the PIMA diabetes dataset. The methodology adds two novel steps, resampling and random shuffling, before the classification model is defined. It is tested with two versions of cross-validation that are appropriate in cases of class imbalance: k-fold cross-validation and stratified k-fold cross-validation. Our findings suggest that, when the data are imbalanced, shuffling them randomly prior to a train/test split can help improve the evaluation metrics. Our methodology can outperform existing machine learning algorithms and complex deep learning models. It is a simple and fast way to predict labels under class imbalance: it requires no additional techniques to balance the classes and no preselection of important variables, which saves time and makes the model easy to analyze. This makes it an effective methodology for both initial and further modeling of data with class imbalance. Moreover, our methodology shows how to make machine learning models based on standard approaches more effective and more reliable.
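
To make the role of shuffling concrete, the following is a minimal sketch, not the authors' implementation. It assumes synthetic binary labels (500 negative, 268 positive, stored sorted by class as a worst case) and uses hypothetical helper names (`shuffle_rows`, `k_fold_indices`, `stratified_k_fold_indices`, `positives_per_fold`). It covers only the shuffling step and the two cross-validation variants; the resampling step is not specified in the abstract and is left out.

```python
import random


def shuffle_rows(rows, seed=0):
    """Return a randomly permuted copy of rows (reproducible via seed)."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    return shuffled


def k_fold_indices(n, k):
    """Plain k-fold: cut indices 0..n-1 into k contiguous test folds."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def stratified_k_fold_indices(labels, k):
    """Stratified k-fold: deal each class's indices round-robin over folds."""
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def positives_per_fold(labels, folds):
    """Count the minority-class (label 1) samples in each test fold."""
    return [sum(labels[i] for i in fold) for fold in folds]


# ASSUMPTION: synthetic labels, sorted by class to show the worst case.
labels_sorted = [0] * 500 + [1] * 268
k = 5

# Without shuffling, plain k-fold leaves some test folds with no positives.
unshuffled_counts = positives_per_fold(
    labels_sorted, k_fold_indices(len(labels_sorted), k)
)

# Shuffling first spreads the minority class across all folds.
labels_shuffled = shuffle_rows(labels_sorted, seed=42)
shuffled_counts = positives_per_fold(
    labels_shuffled, k_fold_indices(len(labels_shuffled), k)
)

# Stratification fixes the per-fold class counts regardless of row order.
stratified_counts = positives_per_fold(
    labels_sorted, stratified_k_fold_indices(labels_sorted, k)
)

print("plain k-fold, unshuffled:", unshuffled_counts)
print("plain k-fold, shuffled:  ", shuffled_counts)
print("stratified k-fold:       ", stratified_counts)
```

On data sorted by class, three of the five unshuffled folds contain no positive samples at all, so any metric computed on them is uninformative. Shuffling before the split, or stratifying the folds, avoids this. In practice a library routine such as a stratified splitter with shuffling enabled would replace these hand-written helpers.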
