Methods for retrospectively improving race/ethnicity data quality: a scoping review.

Matthew K Chin, Lan N Đoàn, Rienna G Russo, Timothy Roberts, Sonia Persaud, Emily Huang, Lauren Fu, Kiran Y Kui, Simona C Kwon, Stella S Yi
Author Information
  1. Matthew K Chin: Section for Health Equity, Department of Population Health, NYU Grossman School of Medicine, New York, NY 10016, United States.
  2. Lan N Đoàn: Section for Health Equity, Department of Population Health, NYU Grossman School of Medicine, New York, NY 10016, United States.
  3. Rienna G Russo: Section for Health Equity, Department of Population Health, NYU Grossman School of Medicine, New York, NY 10016, United States.
  4. Timothy Roberts: NYU Langone Health Sciences Library, NYU Grossman School of Medicine, New York, NY 10016, United States.
  5. Sonia Persaud: Section for Health Equity, Department of Population Health, NYU Grossman School of Medicine, New York, NY 10016, United States.
  6. Emily Huang: Section for Health Equity, Department of Population Health, NYU Grossman School of Medicine, New York, NY 10016, United States.
  7. Lauren Fu: Section for Health Equity, Department of Population Health, NYU Grossman School of Medicine, New York, NY 10016, United States.
  8. Kiran Y Kui: Section for Health Equity, Department of Population Health, NYU Grossman School of Medicine, New York, NY 10016, United States.
  9. Simona C Kwon: Section for Health Equity, Department of Population Health, NYU Grossman School of Medicine, New York, NY 10016, United States.
  10. Stella S Yi: Section for Health Equity, Department of Population Health, NYU Grossman School of Medicine, New York, NY 10016, United States.

Abstract

Improving race and ethnicity (hereafter, race/ethnicity) data quality is imperative to ensure underserved populations are represented in data sets used to identify health disparities and inform health care policy. We performed a scoping review of methods that retrospectively improve race/ethnicity classification in secondary data sets. Following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines, searches were conducted in the MEDLINE, Embase, and Web of Science Core Collection databases in July 2022. A total of 2441 abstracts were dually screened, 453 full-text articles were reviewed, and 120 articles were included. Study characteristics were extracted and described in a narrative analysis. Six main method types for improving race/ethnicity data were identified: expert review (n = 9, 8%), name lists (n = 27, 23%), name algorithms (n = 55, 46%), machine learning (n = 14, 12%), data linkage (n = 9, 8%), and other (n = 6, 5%). The main racial/ethnic groups targeted for classification were Asian (n = 56, 47%) and White (n = 51, 43%). Some form of validation evaluation was included in 86 articles (72%). We discuss the strengths and limitations of the different method types and the potential harms of the identified methods. Innovative methods are needed to better identify racial/ethnic subgroups, as are further validation studies. Accurately collecting and reporting data disaggregated by race/ethnicity is critical to addressing the systematic missingness of relevant demographic data, which can erroneously guide policymaking and hinder the effectiveness of health care practices and interventions.

Keywords

Grants

  1. NU38OT2020001477/Centers for Disease Control and Prevention (CDC) and New York State (NYS)
  2. R01HL141427/NIH National Heart, Lung and Blood Institute
  3. U54MD000538/National Institutes of Health (NIH) National Institute on Minority Health and Health Disparities

MeSH Terms

Humans
Ethnicity
Medically Underserved Area
Racial Groups
Retrospective Studies
Data Accuracy

