Comparing Twitter data to routine data sources in public health surveillance for the 2015 Pan/Parapan American Games: an ecological study.

Yasmin Khan, Garvin J Leung, Paul Belanger, Effie Gournis, David L Buckeridge, Li Liu, Ye Li, Ian L Johnson
Author Information
  1. Yasmin Khan: Public Health Ontario, 480 University Avenue, Suite 300, Toronto, Ontario, M5G 1V2, Canada. yasmin.khan@oahpp.ca.
  2. Garvin J Leung: Public Health Ontario, 480 University Avenue, Suite 300, Toronto, Ontario, M5G 1V2, Canada.
  3. Paul Belanger: KFL&A Public Health, Kingston, Canada.
  4. Effie Gournis: Dalla Lana School of Public Health, University of Toronto, Toronto, Canada.
  5. David L Buckeridge: Surveillance Lab, McGill Clinical and Health Informatics, Montreal, Canada.
  6. Li Liu: KFL&A Public Health, Kingston, Canada.
  7. Ye Li: Public Health Ontario, 480 University Avenue, Suite 300, Toronto, Ontario, M5G 1V2, Canada.
  8. Ian L Johnson: Public Health Ontario, 480 University Avenue, Suite 300, Toronto, Ontario, M5G 1V2, Canada.

Abstract

OBJECTIVES: This study examined Twitter for public health surveillance during a mass gathering in Canada with two objectives: to explore the feasibility of acquiring, categorizing and using geolocated Twitter data, and to compare Twitter data against other data sources used for Pan/Parapan American Games (P/PAG) surveillance.
METHODS: Syndrome definitions were created using keyword categorization to extract posts from Twitter. Categories were developed iteratively for four relevant syndromes: respiratory, gastrointestinal, heat-related illness, and influenza-like illness (ILI). All data sources corresponded to the location of Toronto, Canada. Twitter data were acquired from a publicly available stream representing a 1% random sample of tweets from June 26 to September 10, 2015. Cross-correlation analyses of time series data were conducted between Twitter and comparator surveillance data sources: emergency department visits, telephone helpline calls, laboratory testing positivity rate, reportable disease data, and temperature.
RESULTS: The daily frequency of tweets classified into syndromes was low. The highest mean numbers of daily tweets were for the ILI and respiratory syndromes (22.0 and 21.6, respectively), and the lowest was for the heat syndrome (4.1). Cross-correlation analyses demonstrated significant correlations between Twitter data and two data sources for the heat syndrome: telephone helpline calls (r = 0.4) and temperature data (r = 0.5).
CONCLUSION: Using simple syndromes based on keyword classification of geolocated tweets, we found a correlation between tweets and two routine data sources for heat alerts, the only public health event detected during P/PAG. Further research is needed to understand the role of Twitter in surveillance.
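
ILLUSTRATION: The two analytic steps described under METHODS (keyword-based syndrome classification and cross-correlation of daily time series) can be sketched roughly as follows. This is a minimal sketch, not the authors' implementation: the keyword lists, function names, and input format are hypothetical, since the abstract does not give the actual syndrome definitions or software. The cross-correlation here is a plain Pearson r computed on the overlapping part of the two series at each lag, which may differ from the estimator used in the study.

```python
import math
from collections import Counter

# Hypothetical keyword lists for illustration only; the study's actual
# syndrome definitions are not given in the abstract.
SYNDROME_KEYWORDS = {
    "heat": ["heat stroke", "heatstroke", "sunburn", "dehydrated"],
    "respiratory": ["cough", "wheezing", "short of breath"],
}


def classify_tweet(text):
    """Return the set of syndromes whose keywords appear in the tweet text."""
    lowered = text.lower()
    return {
        syndrome
        for syndrome, keywords in SYNDROME_KEYWORDS.items()
        if any(keyword in lowered for keyword in keywords)
    }


def daily_counts(tweets, syndrome, days):
    """Count tweets per day for one syndrome.

    tweets: iterable of (day, text) pairs; days: ordered list of day labels.
    """
    counts = Counter(
        day for day, text in tweets if syndrome in classify_tweet(text)
    )
    return [counts.get(day, 0) for day in days]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def cross_correlation(x, y, max_lag):
    """Pearson r between x[t] and y[t + lag] for each lag in -max_lag..max_lag."""
    n = len(x)
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            xs, ys = x[: n - lag], y[lag:]
        else:
            xs, ys = x[-lag:], y[: n + lag]
        result[lag] = pearson(xs, ys)
    return result
```

In this sketch, a positive lag with a high r would suggest that the first series (for example, daily heat-syndrome tweet counts) tends to lead the second (for example, daily helpline calls) by that many days.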

MeSH Terms

Canada
Crowding
Feasibility Studies
Humans
Public Health Surveillance
Social Media
Sports
