Assessing respondent-driven sampling.

Sharad Goel, Matthew J Salganik
Author Information
  1. Sharad Goel: Microeconomics and Social Systems, Yahoo! Research, 111 West 40th Street, New York, NY 10018, USA. goel@yahoo-inc.com

Abstract

Respondent-driven sampling (RDS) is a network-based technique for estimating traits in hard-to-reach populations, for example, the prevalence of HIV among drug injectors. In recent years RDS has been used in more than 120 studies in more than 20 countries and by leading public health organizations, including the Centers for Disease Control and Prevention in the United States. Despite the widespread use and growing popularity of RDS, there has been little empirical validation of the methodology. Here we investigate the performance of RDS by simulating sampling from 85 known network populations. Across a variety of traits we find that RDS is substantially less accurate than generally acknowledged and that reported RDS confidence intervals are misleadingly narrow. Moreover, because we model a best-case scenario in which the theoretical RDS sampling assumptions hold exactly, it is unlikely that RDS performs any better in practice than in our simulations. Notably, the poor performance of RDS is driven not by bias but by the high variance of the estimates, a possibility that had been largely overlooked in the RDS literature. Given the consistency of our results across networks and our generous sampling conditions, we conclude that RDS as currently practiced may not be suitable for key aspects of public health surveillance where it is now extensively applied.
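
The kind of experiment summarized above can be illustrated with a toy simulation. The sketch below is not the authors' code or data: it assumes a small synthetic two-group network with homophily (`make_network`, hypothetical), idealizes RDS as a random walk with replacement, and uses an inverse-degree-weighted estimator (often called RDS II or the Volz-Heckathorn estimator). All function names and parameter values are illustrative assumptions. It compares the spread of repeated RDS estimates with that of simple random samples of the same size.

```python
import random
import statistics


def make_network(n=200, p_in=0.10, p_out=0.01, seed=0):
    """Toy undirected graph: two equal groups, denser ties within a group.

    Assumption: a stand-in for a 'known network population'; the true
    trait prevalence is 0.5 by construction.
    """
    rng = random.Random(seed)
    trait = [1 if i < n // 2 else 0 for i in range(n)]
    neighbors = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            p = p_in if trait[i] == trait[j] else p_out
            if rng.random() < p:
                neighbors[i].append(j)
                neighbors[j].append(i)
    return trait, neighbors


def rds_estimate(trait, neighbors, sample_size, rng):
    """Idealized RDS: random walk with replacement, inverse-degree weights."""
    seeds = [i for i, nb in enumerate(neighbors) if nb]
    node = rng.choice(seeds)  # seed participant (not counted in the sample)
    num = den = 0.0
    for _ in range(sample_size):
        node = rng.choice(neighbors[node])  # one recruit per participant
        weight = 1.0 / len(neighbors[node])  # high-degree nodes are over-sampled
        num += weight * trait[node]
        den += weight
    return num / den


def srs_estimate(trait, sample_size, rng):
    """Baseline: simple random sample (with replacement) of the same size."""
    return statistics.mean([rng.choice(trait) for _ in range(sample_size)])


def compare(n_reps=200, sample_size=100, seed=1):
    """Return (std. dev. of RDS estimates, std. dev. of SRS estimates)."""
    trait, neighbors = make_network()
    rng = random.Random(seed)
    rds = [rds_estimate(trait, neighbors, sample_size, rng) for _ in range(n_reps)]
    srs = [srs_estimate(trait, sample_size, rng) for _ in range(n_reps)]
    return statistics.pstdev(rds), statistics.pstdev(srs)


if __name__ == "__main__":
    sd_rds, sd_srs = compare()
    print(f"std. dev. of RDS estimates: {sd_rds:.3f}")
    print(f"std. dev. of SRS estimates: {sd_srs:.3f}")
```

In a network with strong homophily the walk tends to linger within one group, so the ratio of the two standard deviations gives a rough sense of how much precision is lost relative to simple random sampling. The numbers from this toy graph say nothing about the 85 networks studied in the paper.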

Grants

  1. R01 HD062366/NICHD NIH HHS
  2. P01 HD031921/NICHD NIH HHS
  3. R24 HD047879/NICHD NIH HHS

MeSH Terms

Algorithms
Communicable Disease Control
Data Collection
Data Interpretation, Statistical
HIV Infections
Humans
Models, Statistical
Population Surveillance
Public Health
Reproducibility of Results
Research Design
Sample Size
Substance Abuse, Intravenous
