Regional personality assessment through social media language.
Salvatore Giorgi, Khoa Le Nguyen, Johannes C Eichstaedt, Margaret L Kern, David B Yaden, Michal Kosinski, Martin E P Seligman, Lyle H Ungar, H Andrew Schwartz, Gregory Park
Author Information
Salvatore Giorgi: Department of Computer and Information Science, University of Pennsylvania, Philadelphia, Pennsylvania, USA.
Khoa Le Nguyen: Department of Psychology and Neuroscience, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA.
Johannes C Eichstaedt: Department of Psychology, Institute for Human-Centered A.I., Stanford University, Stanford, California, USA.
Margaret L Kern: Melbourne Graduate School of Education, University of Melbourne, Melbourne, Victoria, Australia.
David B Yaden: Department of Psychiatry and Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA.
Michal Kosinski: Graduate School of Business, Stanford University, Stanford, California, USA.
Martin E P Seligman: Department of Psychology, University of Pennsylvania, Philadelphia, Pennsylvania, USA.
Lyle H Ungar: Department of Computer and Information Science, University of Pennsylvania, Philadelphia, Pennsylvania, USA.
H Andrew Schwartz: Department of Computer Science, Stony Brook University, Stony Brook, New York, USA.
Gregory Park: Department of Psychology, University of Pennsylvania, Philadelphia, Pennsylvania, USA.
OBJECTIVE: We explore the personality of U.S. counties as assessed through linguistic patterns on social media. Such studies were previously limited by the cost and feasibility of large-scale surveys; language-based computational models applied to large social media datasets now make large-scale personality assessment possible.
METHOD: We applied a language-based assessment of the five factor model of personality to 6,064,267 U.S. Twitter users. We aggregated the Twitter-based personality scores to 2,041 counties and compared them with political, economic, social, and health outcomes measured through surveys and by government agencies.
RESULTS: Personality varied significantly across counties. Openness to experience was higher on the coasts, conscientiousness was uniformly spread, extraversion was higher in southern states, agreeableness was higher in western states, and emotional stability was highest in the South. Across 13 outcomes, language-based personality estimates replicated patterns previously observed in individual-level and geographic studies, including higher Republican vote share in less agreeable counties and greater life satisfaction in more conscientious counties.
CONCLUSIONS: The results suggest that regions vary in personality and that these differences can be studied through computational linguistic analysis of social media. These methods may also be used to explore other psychological constructs across geographies.