Sentiment analysis of COP9-related tweets: a comparative study of pre-trained models and traditional techniques

Sherif Elmitwalli, John Mehegan

Research output: Contribution to journal, Article, peer-review

3 Citations (SciVal)

Abstract

Introduction: Sentiment analysis has become a crucial area of research in natural language processing in recent years. The study aims to compare the performance of various sentiment analysis techniques, including lexicon-based, machine learning, Bi-LSTM, BERT, and GPT-3 approaches, using two commonly used datasets, IMDB reviews and Sentiment140. The objective is to identify the best-performing technique for an exemplar dataset, tweets associated with the WHO Framework Convention on Tobacco Control Ninth Conference of the Parties in 2021 (COP9).

Methods: A two-stage evaluation was conducted. In the first stage, various techniques were compared on standard sentiment analysis datasets using standard evaluation metrics such as accuracy, F1-score, and precision. In the second stage, the best-performing techniques from the first stage were applied to partially annotated COP9 conference-related tweets.
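As an illustration of the evaluation metrics named above, the following is a minimal sketch of how accuracy, precision, and F1-score can be computed for binary sentiment labels. The function name `binary_metrics`, the label encoding (1 = positive, 0 = negative), and the toy label lists are assumptions made for this example; the abstract does not state the averaging scheme or label set the study used.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Return accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    # Count the four cells of the confusion matrix.
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0

    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels, invented for illustration only (not data from the study).
gold = [1, 0, 1, 1, 0, 0, 1, 0]
pred = [1, 0, 0, 1, 0, 1, 1, 0]
scores = binary_metrics(gold, pred)
print(scores)
```

F1 is the harmonic mean of precision and recall, so it penalizes a classifier that scores well on one at the expense of the other, which is one reason it is often reported alongside accuracy.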

Results: In the first stage, BERT achieved the highest F1-scores (0.9380 for IMDB and 0.8114 for Sentiment140), followed by GPT-3 (0.9119 and 0.7913) and Bi-LSTM (0.8971 and 0.7778). In the second stage, GPT-3 performed best on the partially annotated COP9 conference-related tweets, with an F1-score of 0.8812.
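The first-stage ordering can be reproduced from the reported figures with a short sketch such as the one below. The F1-scores are those quoted in the Results; ranking by the mean F1 across the two datasets, and the helper name `rank_by_mean_f1`, are assumptions for illustration, since the abstract does not say how techniques were shortlisted for the second stage.

```python
from typing import Dict, List

# First-stage F1-scores as reported in the abstract.
stage_one_f1: Dict[str, Dict[str, float]] = {
    "BERT": {"IMDB": 0.9380, "Sentiment140": 0.8114},
    "GPT-3": {"IMDB": 0.9119, "Sentiment140": 0.7913},
    "Bi-LSTM": {"IMDB": 0.8971, "Sentiment140": 0.7778},
}


def rank_by_mean_f1(scores: Dict[str, Dict[str, float]]) -> List[str]:
    """Order model names by mean F1 across datasets, best first (assumed criterion)."""
    def mean_f1(model: str) -> float:
        values = scores[model].values()
        return sum(values) / len(values)

    return sorted(scores, key=mean_f1, reverse=True)


ranking = rank_by_mean_f1(stage_one_f1)
print(ranking)
```

With these figures the ordering is the same on each dataset individually, so any reasonable aggregation would yield the same ranking.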

Discussion: The study demonstrates the effectiveness of pre-trained models such as BERT and GPT-3 for sentiment analysis, which outperformed traditional techniques on standard datasets. Moreover, the stronger performance of GPT-3 on the partially annotated COP9 tweets highlights its ability to generalize to domain-specific data with limited annotations. Pre-trained models therefore offer researchers and practitioners a viable option for sentiment analysis in domains where annotated data are limited or unavailable.

Original language: English
Article number: 1357926
Number of pages: 18
Journal: Frontiers in Big Data
Volume: 7
Early online date: 20 Mar 2024
Publication status: Published - 20 Mar 2024

Data Availability Statement

The data presented in the study are deposited in the Harvard Dataverse repository, accession number https://doi.org/10.7910/DVN/UILQHY.

Keywords

  • BERT
  • Bi-LSTM
  • COP9
  • GPT-3
  • LLMs
  • lexicon-based
  • sentiment analysis

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • Artificial Intelligence
  • Information Systems
