Collective self-understanding: A linguistic style analysis of naturally occurring text data

Alicia Cork, Richard Everson, Elahe Naserian, Mark Levine, Miriam Koschate-Reis

Research output: Contribution to journal › Article › peer-review


Abstract

Understanding what groups stand for is integral to a diverse array of social processes, ranging from understanding political conflicts to organisational behaviour to promoting public health behaviours. Traditionally, researchers rely on self-report methods such as interviews and surveys to assess groups’ collective self-understandings. Here, we demonstrate the value of using naturally occurring online textual data to map the similarities and differences between real-world groups’ collective self-understandings. We use machine learning algorithms to assess similarities between 15 diverse online groups’ linguistic style, and then use multidimensional scaling to map the groups in two-dimensional space (N = 1,779,098 Reddit comments). We then use agglomerative and k-means clustering techniques to assess how the 15 groups cluster, finding that there are four behaviourally distinct group types: vocational, collective action (comprising political and ethnic/religious identities), relational and stigmatised groups, with stigmatised groups having a less distinctive behavioural profile than the other group types. In Study 2, a secondary data analysis, we find strong relationships between the coordinates of each group in multidimensional space and the groups’ values. In Study 3, we demonstrate how this approach can be used to track the development of groups’ collective self-understandings over time. Using transgender Reddit data (N = 1,095,620 comments) as a proof-of-concept, we track the gradual politicisation of the transgender group over the past decade. Because the methodology is automated, it is well suited to monitoring multiple online groups simultaneously. This approach has implications for both governmental agencies and social researchers more generally. Future research avenues and applications are discussed.
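The mapping-and-clustering step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the classifier outputs are turned into between-group similarities, so this sketch simply assumes a precomputed group-by-group dissimilarity matrix (for instance, one derived from how easily a classifier separates each pair of groups). It uses classical (Torgerson) MDS and a basic k-means with farthest-first initialisation; these are stand-ins chosen for illustration, not necessarily the variants the authors used, and the function names `classical_mds` and `kmeans` are hypothetical.

```python
import numpy as np


def classical_mds(d, n_components=2):
    """Classical (Torgerson) MDS: find coordinates whose Euclidean
    distances approximate the dissimilarity matrix d."""
    d = np.asarray(d, dtype=float)
    n = d.shape[0]
    # Double-centre the squared dissimilarities: B = -1/2 * J D^2 J
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (d ** 2) @ j
    # The largest eigenpairs of B give the low-dimensional coordinates
    vals, vecs = np.linalg.eigh(b)  # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:n_components]
    top = np.clip(vals[order], 0.0, None)  # guard against tiny negative values
    return vecs[:, order] * np.sqrt(top)


def kmeans(x, k, n_iter=100):
    """Plain k-means with deterministic farthest-first initialisation."""
    centres = [x[0]]
    for _ in range(1, k):
        # Next centre: the point farthest from all centres chosen so far
        nearest = np.min([np.linalg.norm(x - c, axis=1) for c in centres], axis=0)
        centres.append(x[nearest.argmax()])
    centres = np.array(centres)
    for _ in range(n_iter):
        # Assign each point to its nearest centre
        dists = np.linalg.norm(x[:, None, :] - centres[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centre to the mean of its assigned points
        new = np.array([
            x[labels == c].mean(axis=0) if np.any(labels == c) else centres[c]
            for c in range(k)
        ])
        if np.allclose(new, centres):
            break
        centres = new
    return labels


# Toy example (hypothetical data): six "groups" whose pairwise dissimilarities
# come from two well-separated configurations in the plane.
points = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
dissim = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)

coords = classical_mds(dissim, n_components=2)
labels = kmeans(coords, k=2)
print(coords.round(2))
print(labels)
```

Because classical MDS of exact Euclidean distances recovers the original configuration up to rotation and reflection, the toy groups fall into two clusters here. With real classifier-derived dissimilarities the embedding is only approximate, and an agglomerative method (for example `scipy.cluster.hierarchy.linkage`) could be run on the same coordinates to cross-check the k-means partition, as the abstract describes.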

Original language: English
Journal: Behavior Research Methods
Publication status: Published, 28 Nov 2022

Bibliographical note

Funding information not available.

Keywords

  • Group identities
  • Linguistic style analysis
  • Multidimensional scaling
  • Naturally occurring text data

ASJC Scopus subject areas

  • Experimental and Cognitive Psychology
  • Developmental and Educational Psychology
  • Arts and Humanities (miscellaneous)
  • Psychology (miscellaneous)
  • General Psychology

