Deciphering Insomnia: Benchmarking Automated Sleep Staging Algorithms for Complex Sleep Disorders.

Umaer Hanif, Anis Aloulou, Flynn Crosbie, Paul Bouchequet, Mounir Chennaoui, Thomas Andrillon, Damien Leger
Author Information
  1. Umaer Hanif: VIFASOM (Vigilance Fatigue Sommeil et Santé Publique), Université Paris Cité, Paris, France.
  2. Anis Aloulou: VIFASOM (Vigilance Fatigue Sommeil et Santé Publique), Université Paris Cité, Paris, France.
  3. Flynn Crosbie: VIFASOM (Vigilance Fatigue Sommeil et Santé Publique), Université Paris Cité, Paris, France.
  4. Paul Bouchequet: VIFASOM (Vigilance Fatigue Sommeil et Santé Publique), Université Paris Cité, Paris, France.
  5. Mounir Chennaoui: VIFASOM (Vigilance Fatigue Sommeil et Santé Publique), Université Paris Cité, Paris, France.
  6. Thomas Andrillon: Sorbonne Université, Institut du Cerveau - Paris Brain Institute - ICM, Inserm, CNRS, APHP, Hôpital de la Pitié Salpêtrière, Paris, France.
  7. Damien Leger: VIFASOM (Vigilance Fatigue Sommeil et Santé Publique), Université Paris Cité, Paris, France.

Abstract

Polysomnography (PSG) is essential for diagnosing sleep disorders, but its manual interpretation is labor-intensive. Automated sleep staging algorithms are promising, yet their utility in complex sleep disorders such as insomnia remains uncertain. This study evaluates five of the most recognised sleep staging classifiers (U-Sleep, STAGES, GSSC, Luna and YASA) on PSG data from 904 patients with chronic insomnia. Performance was assessed using F1 scores, confusion matrices and predicted sleep metrics. The effect of demographics, sleepiness and PSG metrics on each classifier's performance was assessed using linear regression. Across all sleep stages, GSSC performed best (macro F1 score = 0.66), followed by U-Sleep (0.62), Luna (0.56), STAGES (0.54) and YASA (0.52). GSSC achieved the highest F1 scores in Wake (0.83), N1 (0.22), N2 (0.80), N3 (0.71) and REM (0.76); U-Sleep matched its performance in N1 and REM, and Luna matched it in N3. STAGES performed poorest in N3 (0.39) and YASA in REM (0.35). Common misclassifications included N1 vs. Wake/N2 and N3 vs. N2, with REM misclassified as Wake/N1/N2 by STAGES, Luna and YASA. GSSC and U-Sleep exhibited minimal demographic bias, whereas STAGES and Luna showed more. No performance difference was observed between chronic insomnia patients with and without abnormal PSG. Sleep metric accuracy was highest for U-Sleep for total sleep time (TST, R = 0.88), STAGES for sleep onset latency (SOL, R = 0.82) and GSSC for wake after sleep onset (WASO, R = 0.82). These findings underscore the solid yet variable performance of the classifiers and highlight GSSC and U-Sleep as leading tools for sleep staging in patients with chronic insomnia.
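
To make the evaluation metrics concrete, the following is a minimal sketch of how a confusion matrix, per-stage F1 scores and a macro F1 score can be derived from epoch-by-epoch labels. It is not the authors' pipeline: the function names, the toy hypnograms and the choice to pool all epochs before scoring are assumptions made for illustration, and the abstract does not state whether scores were pooled across epochs or averaged per patient.

```python
from collections import Counter

# The five stages named in the abstract.
STAGES = ["Wake", "N1", "N2", "N3", "REM"]


def confusion_matrix(y_true, y_pred, labels=STAGES):
    """Rows are manually scored stages, columns are predicted stages."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def per_stage_f1(cm):
    """F1 per stage from a square confusion matrix (0.0 when undefined)."""
    n = len(cm)
    scores = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                        # true stage k, predicted otherwise
        fp = sum(cm[r][k] for r in range(n)) - tp   # predicted stage k, truly otherwise
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def macro_f1(cm):
    """Unweighted mean of the per-stage F1 scores."""
    scores = per_stage_f1(cm)
    return sum(scores) / len(scores)


# Toy example (hypothetical labels, one entry per scored epoch).
manual = ["Wake", "Wake", "N1", "N2", "N2", "N2", "N3", "N3", "REM", "REM"]
auto = ["Wake", "N1", "N2", "N2", "N2", "N3", "N3", "N3", "REM", "Wake"]

cm = confusion_matrix(manual, auto)
stage_scores = dict(zip(STAGES, per_stage_f1(cm)))
overall = macro_f1(cm)
```

Because the macro average weights every stage equally, a rare and difficult stage such as N1 pulls the overall score down as much as a common stage would, which is consistent with the macro F1 values above sitting well below the Wake and N2 scores.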

Grants

  1. Agence de l'innovation de défense
