Abstract
Pre-trained transformer-based language models (PLMs) have revolutionised text classification tasks, but their performance tends to deteriorate on data distant in time from the training dataset. Continual supervised re-training helps address this issue, but it is limited by the availability of newly labelled samples. This paper explores the longitudinal generalisation abilities of large generative PLMs, such as GPT-3 and T5, and of smaller encoder-only alternatives for sentiment analysis in social media. We investigate the impact of time-related variations in data, model size, and fine-tuning on the classifiers’ performance. Through competitive evaluation in the CLEF-2023 LongEval Task 2, we compare results from fine-tuning, few-shot learning, and zero-shot learning. Our analysis reveals the superior performance of large generative models over the RoBERTa benchmark and highlights the benefits of limited exposure to training data in achieving robust predictions on temporally distant test sets. These findings contribute to understanding how to build more temporally robust transformer-based text classifiers, reducing the need for continuous re-training with annotated data.
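Purely as an illustration of the zero-shot setting compared in the abstract, the sketch below frames sentiment classification as text generation with an off-the-shelf instruction-tuned T5 checkpoint; the checkpoint (`google/flan-t5-base`), prompt template, and label handling are assumptions for demonstration, not the authors' exact configuration.

```python
# Illustrative zero-shot sentiment classification with a generative PLM.
# The checkpoint and prompt wording are assumptions, not the paper's setup.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

def zero_shot_sentiment(text: str) -> str:
    # Classification is cast as generation: the model emits a label word,
    # so no task-specific fine-tuning or labelled data is required.
    prompt = (
        "Classify the sentiment of this tweet as positive, negative, "
        f"or neutral: {text}"
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=4)
    return tokenizer.decode(outputs[0], skip_special_tokens=True).strip().lower()

print(zero_shot_sentiment("Loving the new update, everything feels faster!"))
```

A few-shot variant would simply prepend a handful of labelled examples to the same prompt, which is the "limited exposure to training data" regime the paper contrasts with full fine-tuning.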
| Original language | English |
| --- | --- |
| Pages (from-to) | 2458-2468 |
| Number of pages | 11 |
| Journal | CEUR Workshop Proceedings |
| Volume | 3497 |
| Publication status | Published - 21 Sept 2023 |
| Event | 24th Working Notes of the Conference and Labs of the Evaluation Forum (CLEF-WN 2023), Thessaloniki, Greece, 18 Sept 2023 – 21 Sept 2023 |
Bibliographical note
Funding Information: This work was supported by UKRI under grant number EP/S023437/1.
Keywords
- pre-trained language representations
- social media analysis
- temporal robustness
- text classification
- text generation
ASJC Scopus subject areas
- General Computer Science