Prediction using hierarchical data: Applications for automated detection of cervical cancer.

Jose-Miguel Yamal, Martial Guillaud, E Neely Atkinson, Michele Follen, Calum MacAulay, Scott B Cantor, Dennis D Cox
Author Information
  1. Jose-Miguel Yamal: Department of Biostatistics, The University of Texas School of Public Health, 1200 Herman Pressler, Suite W-928, Houston, TX 77030, USA.
  2. Martial Guillaud: Department of Integrative Oncology, British Columbia Cancer Research Centre, 675 West 10th Ave, Vancouver, BC, V5Z 1L3, Canada.
  3. E Neely Atkinson: Department of Statistics, Rice University, 6100 Main St., Houston, TX 77005, USA.
  4. Michele Follen: Department of Obstetrics and Gynecology, Brookdale Hospital and Medical Center, 555 Rockaway Pkwy, Brooklyn, NY 11212, USA.
  5. Calum MacAulay: Department of Integrative Oncology, British Columbia Cancer Research Centre, 675 West 10th Ave, Vancouver, BC, V5Z 1L3, Canada.
  6. Scott B Cantor: Department of Health Services Research, The University of Texas MD Anderson Cancer Center, P.O. Box 301402, Unit 1444, Houston, TX 77230-1402, USA.
  7. Dennis D Cox: Department of Statistics, Rice University, 6100 Main St., Houston, TX 77005, USA.

Abstract

Although the Papanicolaou smear has been successful in decreasing cervical cancer incidence in the developed world, its implementation in the developing world faces many challenges. Quantitative cytology, a semi-automated method that quantifies cellular image features, is a promising candidate screening test. The nested structure of its data (measurements of multiple cells within each patient) complicates the usual classification problem. Here we perform a comparative study of three main approaches to problems with this general data structure: (a) extract patient-level features from the cell-level data; (b) use a statistical model that accounts for the hierarchical data structure; and (c) classify at the cellular level and use an ad hoc approach to classify at the patient level. We apply these methods to a dataset of 1,728 patients, with an average of 2,600 cells collected per patient and 133 features measured per cell, to predict whether a patient had a positive biopsy result. The best approach we found was to classify at the cellular level and count the number of cells whose posterior probability exceeded a threshold value; this approach had an estimated sensitivity of 61% and specificity of 89% on independent data. Recent developments in statistical learning allowed us to achieve high accuracy.
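The best-performing rule described in the abstract (approach c) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the per-cell posterior probabilities are assumed to come from some cell-level classifier fitted elsewhere, and the function names and the parameters `cell_threshold` and `count_threshold` are hypothetical. The abstract does not say how the cell count is turned into a patient-level call; here a patient is assumed to be called positive when the count reaches `count_threshold`. The sketch also computes sensitivity and specificity against biopsy results, the two measures the abstract reports.

```python
from typing import Sequence, Tuple


def count_positive_cells(cell_posteriors: Sequence[float], cell_threshold: float) -> int:
    """Count cells whose posterior probability is strictly greater than cell_threshold."""
    return sum(1 for p in cell_posteriors if p > cell_threshold)


def classify_patient(cell_posteriors: Sequence[float],
                     cell_threshold: float,
                     count_threshold: int) -> bool:
    """Hypothetical patient-level rule: positive if enough cells exceed the cell threshold.

    The ">= count_threshold" decision is an assumption; the abstract only says
    that cells above a posterior-probability threshold are counted.
    """
    return count_positive_cells(cell_posteriors, cell_threshold) >= count_threshold


def sensitivity_specificity(predicted: Sequence[bool],
                            actual: Sequence[bool]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) of patient calls against biopsy results."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fn = sum(1 for p, a in pairs if not p and a)
    tn = sum(1 for p, a in pairs if not p and not a)
    fp = sum(1 for p, a in pairs if p and not a)
    if tp + fn == 0 or tn + fp == 0:
        # Both biopsy-positive and biopsy-negative patients are needed.
        raise ValueError("need at least one positive and one negative patient")
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Toy posteriors invented purely for illustration (not study data).
    patients = {
        "A": [0.95, 0.80, 0.10, 0.05],
        "B": [0.20, 0.30, 0.10, 0.40],
        "C": [0.90, 0.20, 0.15, 0.10],
    }
    biopsy_positive = {"A": True, "B": False, "C": True}
    ids = list(patients)
    predicted = [classify_patient(patients[i], cell_threshold=0.7, count_threshold=2)
                 for i in ids]
    actual = [biopsy_positive[i] for i in ids]
    print(predicted)                                  # [True, False, False]
    print(sensitivity_specificity(predicted, actual))  # (0.5, 1.0)
```

Both thresholds would need to be chosen on training data, with the resulting sensitivity and specificity estimated on patients not used for tuning, in line with the abstract's evaluation on independent data.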

Grants

  1. P01 CA082710/NCI NIH HHS
