Abstract
This paper presents our submission to Task 2 of the Workshop on Noisy User-generated Text. We explore improving the performance of a pre-trained transformer-based language model fine-tuned for text classification through an ensemble implementation that makes use of corpus-level information and a handcrafted feature. We test how effectively these features accommodate the challenges of a noisy data set centred on a specific subject outside the remit of the pre-training data. We show that including additional features can improve classification results, and our system achieves a score within 2 points of the top-performing team.
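The ensemble idea can be illustrated with a minimal soft-voting sketch. This is not the authors' implementation. The specific corpus-level statistic (smoothed per-token log-odds between INFORMATIVE and UNINFORMATIVE training tweets), the handcrafted feature (`has_number`, a hypothetical "tweet contains a digit" flag), the ensemble weights, and all function names are illustrative assumptions. The fine-tuned RoBERTa probability is passed in as a plain number because running the model itself is out of scope here.

```python
import math
import re
from collections import Counter


def token_log_odds(texts, labels, alpha=1.0):
    """Assumed corpus-level statistic: smoothed log-odds of each token appearing
    in INFORMATIVE (label 1) versus UNINFORMATIVE (label 0) training tweets."""
    pos, neg = Counter(), Counter()
    for text, label in zip(texts, labels):
        tokens = set(text.lower().split())  # document frequency, not raw counts
        (pos if label == 1 else neg).update(tokens)
    n_pos = sum(1 for label in labels if label == 1)
    n_neg = len(labels) - n_pos
    return {
        tok: math.log((pos[tok] + alpha) / (n_pos + 2 * alpha))
        - math.log((neg[tok] + alpha) / (n_neg + 2 * alpha))
        for tok in set(pos) | set(neg)
    }


def corpus_score(text, log_odds):
    """Average log-odds over a tweet's tokens; unseen tokens contribute 0."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return sum(log_odds.get(tok, 0.0) for tok in tokens) / len(tokens)


def has_number(text):
    """Hypothetical handcrafted feature: 1.0 if the tweet contains a digit."""
    return 1.0 if re.search(r"\d", text) else 0.0


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def ensemble_prob(p_transformer, text, log_odds, weights=(0.6, 0.3, 0.1)):
    """Weighted soft vote; weights are illustrative and sum to 1,
    so the result stays in [0, 1]."""
    w_model, w_corpus, w_feat = weights
    return (w_model * p_transformer
            + w_corpus * sigmoid(corpus_score(text, log_odds))
            + w_feat * has_number(text))


# Usage on toy data (not the shared-task data set).
train_texts = ["5 new cases confirmed in city", "stay safe everyone",
               "death toll rises to 12", "thinking of you all"]
train_labels = [1, 0, 1, 0]
log_odds = token_log_odds(train_texts, train_labels)
p = ensemble_prob(0.7, "3 new deaths confirmed", log_odds)  # 0.7: stand-in RoBERTa output
label = "INFORMATIVE" if p >= 0.5 else "UNINFORMATIVE"
```

In practice the weights would be tuned on a validation split rather than fixed by hand, and the extra features could instead be fed to a learned meta-classifier (stacking) in place of a fixed weighted vote.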
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020) |
| Place of Publication | Online |
| Publisher | Association for Computational Linguistics |
| Pages | 352-358 |
| Number of pages | 7 |
| DOIs | |
| Publication status | Published - 1 Nov 2020 |
Fingerprint
Dive into the research topics of 'CXP949 at WNUT-2020 Task 2: Extracting Informative COVID-19 Tweets - RoBERTa Ensembles and The Continued Relevance of Handcrafted Features'. Together they form a unique fingerprint.