MonkeyPox2022Tweets: A Large-Scale Twitter Dataset on the 2022 Monkeypox Outbreak, Findings from Analysis of Tweets, and Open Research Questions.

Nirmalya Thakur
Author Information
  1. Nirmalya Thakur: Department of Computer Science, Emory University, Atlanta, GA 30322, USA.

Abstract

The mining of Tweets to develop datasets on recent issues, global challenges, pandemics, virus outbreaks, emerging technologies, and trending matters has been of significant interest to the scientific community in the recent past, as such datasets serve as a rich data resource for the investigation of different research questions. Furthermore, past virus outbreaks, such as COVID-19, Ebola, Zika virus, and flu, to name a few, were associated with various works that analyzed the multimodal components of Tweets to infer the characteristics of Twitter conversations about these outbreaks. The ongoing outbreak of the Monkeypox virus, declared a Global Public Health Emergency (GPHE) by the World Health Organization (WHO), has resulted in a surge of conversations about this outbreak on Twitter, generating tremendous amounts of Big Data. No prior work in this field has focused on mining such conversations to develop a Twitter dataset, and none has performed a comprehensive analysis of Tweets about this ongoing outbreak. To address these challenges, this work makes three scientific contributions to this field. First, it presents an open-access dataset of 556,427 Tweets about Monkeypox that have been posted on Twitter since the first detected case of this outbreak. To further uphold the novelty, relevance, and usefulness of this dataset, a comparative study is also presented that compares it with 36 prior works in this field that focused on the development of Twitter datasets. Second, the paper reports the results of a comprehensive analysis of the Tweets of this dataset.
This analysis presents several novel findings; for instance, out of all the 34 languages supported by Twitter, English has been the most used language to post Tweets about Monkeypox; about 40,000 Tweets related to Monkeypox were posted on the day WHO declared Monkeypox a GPHE; a total of 5,470 distinct hashtags have been used on Twitter about this outbreak, of which #Monkeypox is the most used; and Twitter for iPhone has been the leading source of Tweets about the outbreak. Sentiment analysis of the Tweets was also performed, and the results show that despite extensive discussion, debate, opinions, information, and misinformation on Twitter on various topics in this regard, such as Monkeypox and the LGBTQI+ community, Monkeypox and COVID-19, and vaccines for Monkeypox, "neutral" sentiment was present in most of the Tweets, followed by "negative" and "positive" sentiments, respectively. Finally, to support research and development in this field, the paper presents a list of 50 open research questions related to the outbreak in the areas of Big Data, Data Mining, Natural Language Processing, and Machine Learning that may be investigated based on this dataset.
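
One of the findings reported in the abstract, the number of distinct hashtags and the most used hashtag, could in principle be computed along the following lines. This is a minimal sketch and not the author's method: the function name `hashtag_counts`, the sample Tweets, the regular expression, and the choice to fold hashtags to lowercase and count each hashtag once per Tweet are all assumptions made for illustration.

```python
import re
from collections import Counter

# Assumption: a hashtag is '#' followed by one or more word characters.
HASHTAG_RE = re.compile(r"#(\w+)")


def hashtag_counts(tweets):
    """Count, per hashtag, the number of Tweets that contain it.

    Hashtags are lowercased (an assumption) so that #Monkeypox and
    #monkeypox are treated as the same hashtag.
    """
    counts = Counter()
    for text in tweets:
        # A set ensures a hashtag repeated within one Tweet counts once.
        counts.update({tag.lower() for tag in HASHTAG_RE.findall(text)})
    return counts


# Hypothetical sample data, not taken from the dataset.
sample_tweets = [
    "WHO declares #Monkeypox a global health emergency",
    "Vaccine updates #monkeypox #vaccine",
    "Stay informed. #MonkeyPox",
]

counts = hashtag_counts(sample_tweets)
distinct_hashtags = len(counts)
top_tag, top_n = counts.most_common(1)[0]
print(distinct_hashtags, top_tag, top_n)
```

The same `Counter`-based pattern would extend to the other frequency findings (Tweets per language, per day, or per source), given the corresponding field for each Tweet.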
