BMT: A Cross-Validated ThinPrep Pap Cervical Cytology Dataset for Machine Learning Model Training and Validation.

E Celeste Welch, Chenhao Lu, C James Sung, Cunxian Zhang, Anubhav Tripathi, Joyce Ou
Author Information
  1. E Celeste Welch: Center for Biomedical Engineering, School of Engineering, Brown University, Providence, RI, 02912, USA.
  2. Chenhao Lu: Department of Computer Science, Brown University, Providence, RI, 02912, USA.
  3. C James Sung: Department of Pathology and Laboratory Medicine, Alpert Medical School, Brown University, Providence, RI, 02912, USA.
  4. Cunxian Zhang: Department of Pathology and Laboratory Medicine, Alpert Medical School, Brown University, Providence, RI, 02912, USA.
  5. Anubhav Tripathi: Center for Biomedical Engineering, School of Engineering, Brown University, Providence, RI, 02912, USA.
  6. Joyce Ou: Department of Pathology and Laboratory Medicine, Alpert Medical School, Brown University, Providence, RI, 02912, USA. joyce_ou@brown.edu.

Abstract

In the past several years, a few cervical Pap smear datasets have been published for use in clinical training. However, most publicly available datasets consist of pre-segmented single-cell images, contain on-image annotations that must be manually edited out, or are prepared using the conventional Pap smear method. Multicellular liquid-based Pap image datasets more accurately reflect current cervical screening techniques. Although a multicellular liquid SurePath™ dataset has been created, machine learning models struggle to classify test images that were prepared differently from the training images, because the preparation methods produce visually different images. Therefore, this dataset of multicellular Pap smear images prepared with the more common ThinPrep® protocol is presented as a resource for training and testing artificial intelligence models, particularly for future application in cervical dysplasia diagnosis. The "Brown Multicellular ThinPrep" (BMT) dataset is the first publicly available multicellular ThinPrep® dataset, consisting of 600 clinically vetted images collected from 180 Pap smear slides from 180 patients, classified into three key diagnostic categories.

MeSH Terms

Female
Humans
Cervix Uteri
Machine Learning
Papanicolaou Test
Uterine Cervical Neoplasms
Vaginal Smears
