Modeling virtual organizations with Latent Dirichlet Allocation: a case for natural language processing.

Alexander Gross, Dhiraj Murthy
Author Information
  1. Alexander Gross: University of Maine, United States.
  2. Dhiraj Murthy: Goldsmiths College, University of London, United Kingdom. Electronic address: d.murthy@gold.ac.uk.

Abstract

This paper explores several methods for applying Latent Dirichlet Allocation (LDA), an automated topic modeling algorithm, to model the structure and behavior of virtual organizations found within modern social media and social networking environments. As the field of Big Data reveals, the growing scale of available social data presents new challenges that cannot be met merely by scaling up hardware and software. Rather, they necessitate new methods and, indeed, new areas of expertise. Natural language processing provides one such method. This paper applies LDA to the study of scientific virtual organizations whose members employ social technologies. Because of the vast data footprint of these virtual platforms, we found that natural language processing was needed to 'unlock' and render visible latent, previously unseen conversational connections across large textual corpora (spanning profiles, discussion threads, forums, and other social media incarnations). We introduce variants of LDA and were able to successfully model nested discussion topics from forums and blog posts. Ultimately, we argue that natural language processing is a critical interdisciplinary methodology for making better sense of social 'Big Data'. Importantly, we found that LDA can move us beyond the state of the art in conventional Social Network Analysis techniques.

Keywords

MeSH Term

Algorithms
Humans
Models, Theoretical
Natural Language Processing
Software

