Abstract

Machine-assisted approaches for free-text analysis are rising in popularity, owing to a growing need to rapidly analyze large volumes of qualitative data. In both research and policy settings, these approaches show promise in providing timely insights into public perceptions and enabling policymakers to understand their community’s needs. However, current approaches still require expert human interpretation, which poses a financial and practical barrier for those outside academia. For the first time, we propose and validate the Deep Computational Text Analyser (DECOTA), a novel machine learning methodology that automatically analyzes large free-text data sets and outputs concise themes. Building on structural topic modeling approaches, we used two fine-tuned large language models and sentence transformers to automatically derive “codes” and their corresponding “themes”, as in inductive thematic analysis. To fully automate the process, we designed and validated a novel algorithm to choose the optimal number of “topics” for the structural topic modeling. DECOTA outputs key codes and themes, their prevalence, and how prevalence varies across covariates such as age and gender. Each code is accompanied by three representative quotes. We triangulated DECOTA’s codes and themes with existing thematic analyses of four data sets. We found that DECOTA is approximately 378 times faster and 1,920 times cheaper than human coding and consistently yields codes that agree with or complement human coding (averaging 91.6% for codes and 90% for themes). The implications for evidence-based policy development, public engagement with policymaking, and psychometric measure development are discussed.

Computational approaches are increasingly being used to quickly process large volumes of free-text data. These approaches hold promise in helping academics study public perceptions and helping policymakers understand their community’s needs. However, current methods still require expert human interpretation, which can be costly and impractical. In this article, we develop the Deep Computational Text Analyser (DECOTA), a novel machine learning methodology designed to automatically analyze large free-text data sets and produce concise “themes” within the data. DECOTA uses several custom-trained models to detect themes and subthemes within the data, as a human may do when categorizing free-text responses. Our approach reports how common each subtheme and theme is, how common they are among different demographic groups, and example quotes for each. We compared DECOTA’s analysis with that of human coders on four example free-text data sets. DECOTA’s outputs were highly consistent with human analyses, detecting 91.6% of all human subthemes and 90% of the humans’ themes. DECOTA was also approximately 378 times faster and 1,920 times cheaper than human analysis. The potential uses of this methodology for policymakers and academics are discussed.
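The abstract says DECOTA reports each code's prevalence, how that prevalence varies across covariates such as age, and three representative quotes per code. DECOTA's own implementation is shared on the OSF and is not reproduced here. As a minimal sketch, assuming a fitted topic model has already produced a document-topic proportion matrix (`theta`, one row per response), both kinds of output could be computed as below. The function names, the choice of a simple group mean as "prevalence", and the toy data are all hypothetical illustrations. Structural topic models usually estimate covariate effects by regression (for example, `estimateEffect` in the R `stm` package) rather than by raw group means, and ranking responses by topic proportion follows the same idea as `stm`'s `findThoughts`.

```python
from collections import defaultdict


def mean_prevalence_by_group(theta, groups):
    """Mean topic (code) proportion within each covariate group.

    theta  -- one row of topic proportions per response (rows sum to 1)
    groups -- the covariate label of each response, e.g. an age band
    Returns {group: [mean proportion of topic k for each k]}.
    Hypothetical helper: a simple group mean, not a regression estimate.
    """
    n_topics = len(theta[0])
    sums = defaultdict(lambda: [0.0] * n_topics)
    counts = defaultdict(int)
    for row, group in zip(theta, groups):
        counts[group] += 1
        for k, p in enumerate(row):
            sums[group][k] += p
    return {g: [s / counts[g] for s in sums[g]] for g in sums}


def representative_quotes(theta, texts, topic, n=3):
    """Return the n responses with the highest proportion of `topic`."""
    ranked = sorted(range(len(texts)), key=lambda d: theta[d][topic], reverse=True)
    return [texts[d] for d in ranked[:n]]


# Toy, hypothetical inputs: four responses, two topics, one covariate.
theta = [
    [0.75, 0.25],
    [0.25, 0.75],
    [0.5, 0.5],
    [0.125, 0.875],
]
texts = ["response 1", "response 2", "response 3", "response 4"]
groups = ["under 35", "under 35", "35 and over", "35 and over"]

print(mean_prevalence_by_group(theta, groups))
# {'under 35': [0.5, 0.5], '35 and over': [0.3125, 0.6875]}
print(representative_quotes(theta, texts, topic=1))
# ['response 4', 'response 2', 'response 3']
```

Ranking by proportion means the quotes shown are the responses most dominated by a code, which tends to make them clear illustrations; a real pipeline might also filter very short responses before ranking.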

Original language: English
Journal: Psychological Methods
Early online date: 7 Apr 2025
Publication status: E-pub ahead of print (7 Apr 2025)

Funding

Lois Player and Ryan Hughes are supported by a scholarship from the Engineering and Physical Sciences Research Council (EPSRC) Centre for Doctoral Training in Advanced Automotive Propulsion Systems (AAPS), under project EP/S023364/1. We would like to thank Lauren Towler from the University of Southampton for her time discussing the practicalities of thematic analysis. The views expressed in this publication are those of the authors and do not reflect the official position of the European Commission. All code and data associated with this article have been shared on the Open Science Framework (OSF; https://osf.io/5jste/), and a preprint has been posted on PsyArXiv, the preprint repository hosted by the OSF (https://osf.io/preprints/psyarxiv/t5gbv_v1). The methodology and some results were presented at selected conferences and research groups (a University of Bath internal conference, the British Environmental Psychology Conference 2024, two research groups at Otto von Guericke University Magdeburg, and a research group at the Technical University of Berlin).

Funders (funder number)

  • Otto von Guericke University Magdeburg
  • European Commission
  • University of Southampton
  • Engineering and Physical Sciences Research Council (EP/S023364/1)

Keywords

  • free-text analysis
  • large language models
  • machine learning
  • natural language processing
  • topic modeling

ASJC Scopus subject areas

  • Psychology (miscellaneous)

Title

The Use of Large Language Models for Qualitative Research: The Deep Computational Text Analyser (DECOTA)