Reading Profiles in Multi-Site Data With Missingness.

Mark A Eckert, Kenneth I Vaden, Mulugeta Gebregziabher, Dyslexia Data Consortium
Author Information
  1. Mark A Eckert: Hearing Research Program, Department of Otolaryngology-Head and Neck Surgery, Medical University of South Carolina, Charleston, SC, United States.
  2. Kenneth I Vaden: Hearing Research Program, Department of Otolaryngology-Head and Neck Surgery, Medical University of South Carolina, Charleston, SC, United States.
  3. Mulugeta Gebregziabher: Department of Public Health Sciences, Medical University of South Carolina, Charleston, SC, United States.

Abstract

Children with reading disability exhibit varied deficits in reading and cognitive abilities that contribute to their reading comprehension problems. Some children exhibit primary deficits in phonological processing, while others can exhibit deficits in oral language and executive functions that affect comprehension. This behavioral heterogeneity is problematic when missing data prevent the characterization of different reading profiles, which often occurs in retrospective data sharing initiatives without coordinated data collection. Here we show that reading profiles can be reliably identified based on Random Forest classification of incomplete behavioral datasets, after the missForest method is used to multiply impute missing values. Results from simulation analyses showed that reading profiles could be accurately classified across degrees of missingness (e.g., ∼5% classification error for 30% missingness across the sample). The application of missForest to a real multi-site dataset with missingness (n = 924) showed that reading disability profiles significantly and consistently differed in reading and cognitive abilities for cases with and without missing data. The results of validation analyses indicated that the reading profiles (cases with and without missing data) exhibited significant differences for an independent set of behavioral variables that were not used to classify reading profiles. Together, the results show how multiple imputation can be applied to the classification of cases with missing data and can increase the integrity of results from multi-site open access datasets.
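The impute-then-classify workflow summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the study used the missForest method, whereas this sketch substitutes scikit-learn's `IterativeImputer` with a `RandomForestRegressor` as a missForest-like stand-in. The three profiles, the six score columns, the group means, and the helper names (`simulate`, `impute_and_classify`) are all hypothetical, and the data are synthetic.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.default_rng(0)

# Hypothetical profile means on six standardized scores (assumed values):
# profile 0 is average, profile 1 is low on scores 0-2, profile 2 is low on scores 3-5.
CENTERS = np.array([
    [100.0, 100.0, 100.0, 100.0, 100.0, 100.0],
    [85.0, 85.0, 85.0, 100.0, 100.0, 100.0],
    [100.0, 100.0, 100.0, 85.0, 85.0, 85.0],
])

def simulate(n_per_profile, rng):
    """Draw synthetic scores and profile labels (a stand-in for real data)."""
    labels = np.repeat(np.arange(len(CENTERS)), n_per_profile)
    scores = CENTERS[labels] + rng.normal(0.0, 5.0, size=(labels.size, CENTERS.shape[1]))
    return scores, labels

# Complete reference cases train the classifier; new cases lose 30% of values at random.
X_ref, y_ref = simulate(100, rng)
X_new, y_new = simulate(100, rng)
X_missing = X_new.copy()
X_missing[rng.random(X_new.shape) < 0.30] = np.nan

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_ref, y_ref)

def impute_and_classify(X_incomplete, classifier, n_imputations=3):
    """Impute several times with different seeds and average class probabilities."""
    proba = np.zeros((X_incomplete.shape[0], len(classifier.classes_)))
    for seed in range(n_imputations):
        imputer = IterativeImputer(
            estimator=RandomForestRegressor(n_estimators=20, random_state=seed),
            max_iter=5,
            random_state=seed,
        )
        proba += classifier.predict_proba(imputer.fit_transform(X_incomplete))
    return classifier.classes_[proba.argmax(axis=1)]

predicted = impute_and_classify(X_missing, clf)
error_rate = float(np.mean(predicted != y_new))
print(f"classification error with 30% missingness: {error_rate:.3f}")
```

Averaging predicted probabilities over several imputed datasets is one simple way to pool results across imputations; the error printed here reflects only this synthetic setup and is not comparable to the figures reported in the abstract.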

Grants

  1. C06 RR014516/NCRR NIH HHS
  2. R01 HD069374/NICHD NIH HHS
