A systematic evaluation of text mining methods for short texts: Mapping individuals' internal states from online posts.

Ana Macanovic, Wojtek Przepiorka
Author Information
  1. Ana Macanovic: Department of Sociology/ICS, Utrecht University, Utrecht, The Netherlands. a.macanovic@uu.nl.
  2. Wojtek Przepiorka: Department of Sociology/ICS, Utrecht University, Utrecht, The Netherlands.

Abstract

Short texts generated by individuals in online environments can provide social and behavioral scientists with rich insights into these individuals' internal states. Trained manual coders can reliably interpret expressions of such internal states in text. However, manual coding imposes restrictions on the number of texts that can be analyzed, limiting our ability to extract insights from large-scale textual data. We evaluate the performance of several automatic text analysis methods in approximating trained human coders' evaluations across four coding tasks encompassing expressions of motives, norms, emotions, and stances. Our findings suggest that commonly used dictionaries, although performing well in identifying infrequent categories, generate false positives too frequently compared to other methods. We show that large language models trained on manually coded data yield the highest performance across all case studies. However, there are also instances where simpler methods show almost equal performance. Additionally, we evaluate the effectiveness of cutting-edge generative language models like GPT-4 in coding texts for internal states with the help of short instructions (so-called zero-shot classification). While promising, these models fall short of the performance of models trained on manually analyzed data. We discuss the strengths and weaknesses of various models and explore the trade-offs between model complexity and performance in different applications. Our work informs social and behavioral scientists of the challenges associated with text mining of large textual datasets, while providing best-practice recommendations.
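The abstract's central comparison (dictionary methods recall infrequent categories well but over-predict, i.e., produce false positives) can be made concrete with a minimal sketch. The toy lexicon, example posts, and manual labels below are illustrative assumptions, not material from the study; the evaluation logic (precision and recall against manual codes) mirrors the kind of validation the article describes.

```python
# Toy dictionary-based coder for one internal-state category ("anger").
# Lexicon and data are hypothetical; only the evaluation logic is general.
ANGER_LEXICON = {"furious", "hate", "angry", "rage"}

def dictionary_code(text: str) -> bool:
    """Flag a post as expressing anger if any lexicon word appears."""
    tokens = (tok.strip(".,!?") for tok in text.lower().split())
    return any(tok in ANGER_LEXICON for tok in tokens)

# Example short posts with (hypothetical) manual coder labels.
texts = [
    "I hate waiting in line",                    # anger
    "What a rage inducing delay!",               # anger
    "I hate to say it, but this was lovely",     # NOT anger (figurative use)
    "This made me so upset",                     # anger, but no lexicon word
    "Great service, thanks!",                    # not anger
]
manual = [True, True, False, True, False]

pred = [dictionary_code(t) for t in texts]

# Confusion counts against the manual "gold standard".
tp = sum(p and m for p, m in zip(pred, manual))
fp = sum(p and not m for p, m in zip(pred, manual))   # dictionary false positives
fn = sum(m and not p for p, m in zip(pred, manual))

precision = tp / (tp + fp)
recall = tp / (tp + fn)
```

The third post is the characteristic failure mode: a lexicon match without the corresponding internal state, lowering precision even when recall on genuine expressions is decent. Trained classifiers can learn such contextual distinctions from manually coded examples, which is the gap the article quantifies.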

  128. Uysal, A. K., & Gunal, S. (2014). The impact of preprocessing on text classification. Information Processing & Management, 50(1), 104–112. https://doi.org/10.1016/j.ipm.2013.08.006 [DOI: 10.1016/j.ipm.2013.08.006]
  129. van Atteveldt, W., van der Velden, M. A. C. G., & Boukes, M. (2021). The Validity of Sentiment Analysis: Comparing Manual Annotation, Crowd-Coding, Dictionary Approaches, and Machine Learning Algorithms. Communication Methods and Measures, 15(2), 121–140. https://doi.org/10.1080/19312458.2020.1869198 [DOI: 10.1080/19312458.2020.1869198]
  130. van Loon, A., Stewart, S., Waldon, B., Lakshmikanth, S. K., Shah, I., Guntuku, S. C., Sherman, G., Zou, J., & Eichstaedt, J. (2020). Explaining the Trump Gap in Social Distancing Using COVID Discourse. In: Proceedings of the 1st Workshop on NLP for COVID-19 (Part 2) at EMNLP 2020. Online: Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.nlpcovid19-2.10
  131. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł, & Polosukhin, I. (2018). Attention is All you Need. Advances in Neural Information Processing Systems, 30, 5999–6010.
  132. Vazire, S. (2010). Who knows what about a person? The self–other knowledge asymmetry (SOKA) model. Journal of Personality and Social Psychology, 98(2), 281–300. https://doi.org/10.1037/a0017908 [DOI: 10.1037/a0017908]
  133. Vine, V., Boyd, R. L., & Pennebaker, J. W. (2020). Natural emotion vocabularies as windows on distress and well-being. Nature Communications, 11(1), 4525. https://doi.org/10.1038/s41467-020-18349-0 [DOI: 10.1038/s41467-020-18349-0]
  134. Wang, Y., Tian, J., Yazar, Y., Ones, D. S., & Landers, R. N. (2022). Using natural language processing and machine learning to replace human content coders. Psychological Methods.  https://doi.org/10.1037/met0000518
  135. Wang, Z., Zhang, G., Yang, K., Shi, N., Zhou, W., Hao, S., Xiong, G., Li, Y., Sim, M. Y., Chen, X., Zhu, Q., Yang, Z., Nik, A., Liu, Q., Lin, C., Wang, S., Liu, R., Chen, W., Xu, K., …, Fu, J. (2023). Interactive Natural Language Processing. arXiv preprint. https://doi.org/10.48550/arXiv.2305.13246
  136. Widmann, T., & Wich, M. (2022). Creating and Comparing Dictionary, Word Embedding, and Transformer-Based Models to Measure Discrete Emotions in German Political Text. In: Political Analysis, 31(4), 626-641. https://doi.org/10.1017/pan.2022.15
  137. Yadollahi, A., Shahraki, A. G., & Zaiane, O. R. (2018). Current State of Text Sentiment Analysis from Opinion to Emotion Mining. ACM Computing Surveys, 50(2), 1–33. https://doi.org/10.1145/3057270 [DOI: 10.1145/3057270]
  138. Zheng, L., Chiang, W.-L., Sheng, Y., Zhuang, S., Wu, Z., Zhuang, Y., Lin, Z., Li, Z., Li, D., Xing, E. P., Zhang, H., Gonzalez, J. E., & Stoica, I. (2023). Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena. arXiv preprint. https://doi.org/10.48550/arXiv.2306.05685
  139. Ziems, C., Held, W., Shaikh, O., Chen, J., Zhang, Z., & Yang, D. (2023). Can Large Language Models Transform Computational Social Science? arXiv preprint. https://doi.org/10.48550/arXiv.2305.03514
  140. Zou, Q., Xie, S., Lin, Z., Wu, M., & Ju, Y. (2016). Finding the Best Classification Threshold in Imbalanced Classification. Big Data Research, 5, 2–8. https://doi.org/10.1016/j.bdr.2015.12.001 [DOI: 10.1016/j.bdr.2015.12.001]

MeSH Terms

Humans
Data Mining
Emotions
Motivation
Social Media
