Extended overview of the CLEF 2024 LongEval Lab on Longitudinal Evaluation of Model Performance

Rabab Alkhalifa, Hsuvas Borkakoty, Romain Deveaud, Alaa El-Ebshihy, Luis Espinosa-Anke, Tobias Fink, Petra Galuščáková, Gabriela Gonzalez-Saez, Lorraine Goeuriot, David Iommi, Maria Liakata, Harish Tayyar Madabushi, Pablo Medina-Alias, Philippe Mulhem, Florina Piroi, Martin Popel, Arkaitz Zubiaga

Research output: Contribution to journal › Conference article › peer-review


Abstract

We describe the second edition of the LongEval CLEF 2024 shared task. This lab evaluates the temporal persistence of Information Retrieval (IR) systems and text classifiers. Task 1 requires IR systems to run on corpora acquired at several timestamps, and evaluates the drop in system quality (nDCG) across these timestamps. Task 2 tackles binary sentiment classification at different points in time, and evaluates the performance drop for different temporal gaps. Overall, 37 teams registered for Task 1 and 25 for Task 2. Ultimately, 14 and 4 teams participated in Task 1 and Task 2, respectively.
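Since Task 1 scores systems by nDCG on corpora from successive points in time, a minimal sketch of how such a drop could be measured may help. This is not the lab's official evaluation code; the relevance lists and snapshot labels below are hypothetical, and real evaluations would average over all topics.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain over a ranked list of graded relevances."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

def ndcg(ranked_relevances, k=10):
    """nDCG@k: DCG of the system ranking, normalised by the ideal DCG."""
    ideal_dcg = dcg(sorted(ranked_relevances, reverse=True)[:k])
    return dcg(ranked_relevances[:k]) / ideal_dcg if ideal_dcg > 0 else 0.0

# Hypothetical relevance judgements for one query, for the same system
# run on two snapshots of the collection taken at different times.
within_time = [3, 2, 3, 0, 1]   # run on the training-period snapshot
long_term   = [1, 0, 2, 0, 3]   # run on a later snapshot

drop = ndcg(within_time) - ndcg(long_term)
print(f"nDCG drop across snapshots: {drop:.3f}")
```

A positive drop indicates that the system's ranking quality degraded as the collection evolved, which is the temporal-persistence effect the lab quantifies.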

Original language: English
Pages (from-to): 2267-2289
Number of pages: 23
Journal: CEUR Workshop Proceedings
Volume: 3740
Publication status: Published - 12 Sept 2024
Event: 25th Working Notes of the Conference and Labs of the Evaluation Forum, CLEF 2024 - Grenoble, France
Duration: 9 Sept 2024 - 12 Sept 2024

Keywords

  • Evaluation
  • Information Retrieval
  • Temporal Generalisability
  • Temporal Persistence
  • Text Classification

ASJC Scopus subject areas

  • General Computer Science
