Abstract

Machine-assisted approaches for free-text analysis are rising in popularity, owing to a growing need to rapidly analyse large volumes of qualitative data. In both research and policy settings, these approaches have promise in providing timely insights into public perceptions and enabling policymakers to understand their community’s needs. However, current approaches still require expert human interpretation – posing a financial and practical barrier for those outside of academia. For the first time, we propose and validate the Deep Computational Text Analyser (DECOTA) - a novel Machine Learning methodology that automatically analyses large free-text datasets and outputs concise themes. Building on Structural Topic Modelling (STM) approaches, we used two fine-tuned Large Language Models (LLMs) and sentence transformers to automatically derive ‘codes’ and their corresponding ‘themes’, as in Inductive Thematic Analysis. To automate the process, we designed and validated a novel algorithm to choose the optimal number of ‘topics’ following STM. This approach automatically derives key codes and themes from free-text data, the prevalence of each code, and how prevalence varies with covariates such as age and gender. Each code is accompanied by three representative quotes. Four datasets previously analysed using Thematic Analysis were triangulated with DECOTA’s codes and themes. We found that DECOTA is approximately 378 times faster and 1920 times cheaper than human coding, and consistently yields codes in agreement with or complementary to human coding (averaging 91.6% for codes, and 90% for themes). The implications for evidence-based policy development, public engagement with policymaking, and the development of psychometric measures are discussed.
Original languageEnglish
PublisherOpen Science Framework (OSF)
DOIs
Publication statusUnpublished - 24 Jul 2024

Fingerprint

Dive into the research topics of 'The Use of Large Language Models for Qualitative Research: DECOTA'. Together they form a unique fingerprint.

Cite this