Polycystic Ovary Syndrome Detection Machine Learning Model Based on Optimized Feature Selection and Explainable Artificial Intelligence.

Hela Elmannai, Nora El-Rashidy, Ibrahim Mashal, Manal Abdullah Alohali, Sara Farag, Shaker El-Sappagh, Hager Saleh
Author Information
  1. Hela Elmannai: Department of Information Technology, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia.
  2. Nora El-Rashidy: Machine Learning and Information Retrieval Department, Faculty of Artificial Intelligence, Kafrelsheiksh University, Kafrelsheiksh 13518, Egypt.
  3. Ibrahim Mashal: Faculty of Information Technology, Applied Science Private University, Amman 11937, Jordan.
  4. Manal Abdullah Alohali: Department of Information Systems, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia.
  5. Sara Farag: Faculty of Computers and Informations, South Valley University, Qena 83523, Egypt.
  6. Shaker El-Sappagh: Faculty of Computer Science and Engineering, Galala University, Suez 435611, Egypt.
  7. Hager Saleh: Faculty of Computers and Artificial Intelligence, South Valley University, Hurghada 84511, Egypt.

Abstract

Polycystic ovary syndrome (PCOS) is classified as a severe health problem that is common among women globally. Early detection and treatment of PCOS reduce the likelihood of long-term complications, such as an increased risk of developing type 2 diabetes and gestational diabetes. Effective and early PCOS diagnosis can therefore help healthcare systems reduce the disease's problems and complications. Machine learning (ML) and ensemble learning have recently shown promising results in medical diagnostics. The main goal of our research is to provide model explanations, both local and global, to ensure efficiency, effectiveness, and trust in the developed model. Feature selection methods are applied with different types of ML models (logistic regression (LR), random forest (RF), decision tree (DT), naive Bayes (NB), support vector machine (SVM), k-nearest neighbor (KNN), XGBoost, and AdaBoost) to obtain the optimal feature subset and the best model. Stacking ML models that combine the best base ML models with a meta-learner are proposed to improve performance. Bayesian optimization is used to optimize the ML models. Class imbalance is addressed by combining SMOTE (Synthetic Minority Oversampling Technique) and ENN (Edited Nearest Neighbour). The experiments were conducted on a benchmark PCOS dataset with two split ratios, 70:30 and 80:20. The results showed that the stacking ML model with REF feature selection recorded the highest accuracy, 100%, compared to other models.
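
The pipeline outlined in the abstract (feature selection, a stacking ensemble with a meta-learner, and evaluation on 70:30 and 80:20 splits) can be sketched roughly as follows. This is a minimal illustration with scikit-learn, not the authors' implementation. It assumes that the feature selection method written as "REF" in the abstract is recursive feature elimination (RFE). The synthetic data, the choice of base learners, the number of selected features, and the helper names `build_stacking_pipeline` and `evaluate_split` are all hypothetical. Bayesian optimization and SMOTE plus ENN resampling are omitted to keep the sketch short.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def build_stacking_pipeline(n_features_to_select=10, random_state=0):
    """Scaling, RFE feature selection, then a stacking ensemble (illustrative)."""
    # RFE repeatedly drops the weakest feature according to the LR coefficients.
    selector = RFE(
        estimator=LogisticRegression(max_iter=1000),
        n_features_to_select=n_features_to_select,
    )
    # Base learners feed out-of-fold predictions to the LR meta-learner.
    stack = StackingClassifier(
        estimators=[
            ("rf", RandomForestClassifier(n_estimators=100, random_state=random_state)),
            ("knn", KNeighborsClassifier(n_neighbors=5)),
        ],
        final_estimator=LogisticRegression(max_iter=1000),
        cv=5,
    )
    return make_pipeline(StandardScaler(), selector, stack)


def evaluate_split(X, y, test_size, random_state=0):
    """Fit on a stratified train split and return accuracy on the held-out part."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=random_state
    )
    model = build_stacking_pipeline(random_state=random_state)
    model.fit(X_train, y_train)
    return model.score(X_test, y_test)


# Synthetic, mildly imbalanced stand-in data (assumption: NOT the PCOS dataset).
X, y = make_classification(
    n_samples=400,
    n_features=30,
    n_informative=8,
    weights=[0.67, 0.33],
    random_state=0,
)

# Evaluate the two split ratios mentioned in the abstract.
results = {
    ratio: evaluate_split(X, y, test_size)
    for ratio, test_size in [("70:30", 0.30), ("80:20", 0.20)]
}
for ratio, accuracy in results.items():
    print(f"{ratio} split: accuracy = {accuracy:.3f}")
```

In a fuller version, resampling such as `SMOTEENN` from the imbalanced-learn package would be applied to the training portion only, so that synthetic samples never leak into the test set.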
