Estimating uncertainty in respondent-driven sampling using a tree bootstrap method.

Aaron J Baraff, Tyler H McCormick, Adrian E Raftery
Author Information
  1. Aaron J Baraff: Department of Statistics, University of Washington, Seattle, WA 98195-4322.
  2. Tyler H McCormick: Department of Statistics, University of Washington, Seattle, WA 98195-4322.
  3. Adrian E Raftery: Department of Statistics, University of Washington, Seattle, WA 98195-4322; raftery@uw.edu.

Abstract

Respondent-driven sampling (RDS) is a network-based form of chain-referral sampling used to estimate attributes of populations that are difficult to access using standard survey tools. Although it has grown quickly in popularity since its introduction, the statistical properties of RDS estimates remain elusive. In particular, the sampling variability of these estimates has been shown to be much higher than previously acknowledged, and even methods designed to account for RDS result in misleadingly narrow confidence intervals. In this paper, we introduce a tree bootstrap method for estimating uncertainty in RDS estimates based on resampling recruitment trees. We use simulations from known social networks to show that the tree bootstrap method not only outperforms existing methods but also captures the high variability of RDS, even in extreme cases with high design effects. We also apply the method to data from injecting drug users in Ukraine. Unlike other methods, the tree bootstrap depends only on the structure of the sampled recruitment trees, not on the attributes being measured on the respondents, so correlations between attributes can be estimated as well as variability. Our results suggest that it is possible to accurately assess the high level of uncertainty inherent in RDS.
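
The resampling idea described above can be illustrated with a minimal sketch. The abstract does not give the algorithm's details, so the following rests on assumptions: seeds are resampled with replacement, each resampled respondent's recruits are then resampled with replacement wave by wave, and a percentile interval is taken over the bootstrap estimates. The names `resample_tree`, `flatten`, and `tree_bootstrap` are hypothetical, and the plain sample mean is only a stand-in for whatever RDS estimator would be used in practice.

```python
import random
from statistics import mean


def resample_tree(node, children, rng):
    """Copy the tree rooted at `node`, redrawing each respondent's
    recruits with replacement (assumed resampling scheme)."""
    recruits = children.get(node, [])
    drawn = [rng.choice(recruits) for _ in recruits]
    return (node, [resample_tree(r, children, rng) for r in drawn])


def flatten(tree):
    """List every respondent in a resampled tree, repeats included."""
    node, subtrees = tree
    out = [node]
    for sub in subtrees:
        out.extend(flatten(sub))
    return out


def tree_bootstrap(seeds, children, values, n_boot=1000, level=0.95, seed=0):
    """Percentile interval for the mean of `values` over resampled trees.

    seeds:    list of seed respondent ids
    children: dict mapping a respondent id to the ids it recruited
    values:   dict mapping a respondent id to the measured attribute
    The sample mean is a placeholder estimator (an assumption).
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        # Step 1: resample the seeds with replacement.
        drawn_seeds = [rng.choice(seeds) for _ in seeds]
        # Step 2: resample recruits within each drawn seed's tree.
        sample = []
        for s in drawn_seeds:
            sample.extend(flatten(resample_tree(s, children, rng)))
        estimates.append(mean(values[i] for i in sample))
    estimates.sort()
    lo_idx = int((1 - level) / 2 * n_boot)
    hi_idx = min(n_boot - 1, int((1 + level) / 2 * n_boot))
    return estimates[lo_idx], estimates[hi_idx]


# Toy recruitment data (made up for illustration only).
toy_seeds = ["a", "b"]
toy_children = {"a": ["c", "d"], "c": ["e"], "b": ["f"]}
toy_values = {"a": 1, "b": 0, "c": 1, "d": 0, "e": 1, "f": 0}
interval = tree_bootstrap(toy_seeds, toy_children, toy_values, n_boot=200)
```

Because the resampling step in this sketch uses only `seeds` and `children`, never `values`, the same resampled trees could be reused for several attributes at once, which is in line with the abstract's remark that correlations between attributes can be estimated as well as variability.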

Grants

  1. R01 HD054511/NICHD NIH HHS
  2. P01 HD031921/NICHD NIH HHS
  3. U54 HL127624/NHLBI NIH HHS
  4. T32 GM081062/NIGMS NIH HHS
  5. R01 HD070936/NICHD NIH HHS
  6. K01 HD078452/NICHD NIH HHS

MeSH Terms

Adolescent
Adolescent Behavior
Algorithms
Centers for Disease Control and Prevention, U.S.
Colorado
Computer Simulation
Female
HIV Infections
Heterosexuality
Humans
Longitudinal Studies
Male
Models, Statistical
Patient Selection
Probability
Risk-Taking
Schools
Sex Workers
Sexual Behavior
Social Support
Substance Abuse, Intravenous
Surveys and Questionnaires
Ukraine
Uncertainty
United States
