UoB at ProfNER 2021: Data Augmentation for Classification Using Machine Translation

Frances Adriana Laureano De Leon, Harish Tayyar Madabushi, Mark Lee

Research output: Chapter in Book/Report/Conference proceeding > Chapter in a published conference proceeding

Abstract

This paper describes the participation of the UoB-NLP team in ProfNER-ST shared subtask 7a, which aimed to detect mentions of professions in social media text. Our team experimented with two methods of improving the performance of pre-trained models: data augmentation through translation, and merging inputs in multiple languages. The best-performing model on the test data was mBERT fine-tuned on data augmented using back-translation. However, the improvement is minor, possibly because multilingual pre-trained models such as mBERT already have access to the kind of information provided by back-translation and bilingual data.
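
The two methods named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `[SEP]` separator, the Spanish/English language pair, and the toy word-level "translators" are all assumptions made for illustration. A real pipeline would plug a machine translation system into the `to_pivot` and `from_pivot` callables.

```python
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, int]          # (text, label); 1 = mentions a profession
Translator = Callable[[str], str]  # hypothetical stand-in for an MT system


def back_translate(text: str, to_pivot: Translator, from_pivot: Translator) -> str:
    """Translate into a pivot language and back to obtain a paraphrase."""
    return from_pivot(to_pivot(text))


def augment(data: List[Example], to_pivot: Translator,
            from_pivot: Translator) -> List[Example]:
    """Append back-translated paraphrases, keeping each original label."""
    augmented = list(data)
    seen = {text for text, _ in data}
    for text, label in data:
        paraphrase = back_translate(text, to_pivot, from_pivot)
        if paraphrase not in seen:  # skip round trips that change nothing
            seen.add(paraphrase)
            augmented.append((paraphrase, label))
    return augmented


def merge_languages(text: str, to_pivot: Translator, sep: str = " [SEP] ") -> str:
    """Join the original text and its translation into one bilingual input."""
    return text + sep + to_pivot(text)


def word_translator(table: Dict[str, str]) -> Translator:
    """Toy word-by-word translator; unknown words pass through unchanged."""
    return lambda text: " ".join(table.get(word, word) for word in text.split())


# Toy dictionaries (assumed data, for illustration only).
toy_es_en = word_translator({"el": "the", "doctor": "doctor", "médico": "doctor",
                             "trabaja": "works", "hoy": "today"})
toy_en_es = word_translator({"the": "el", "doctor": "médico",
                             "works": "trabaja", "today": "hoy"})

train = [("el doctor trabaja hoy", 1), ("hoy llueve", 0)]
augmented_train = augment(train, toy_es_en, toy_en_es)
bilingual_input = merge_languages("el doctor trabaja hoy", toy_es_en)
```

In this sketch the round trip turns "doctor" into "médico", so the first example gains a paraphrase with the same label, while the second example comes back unchanged and is not duplicated. The augmented or merged text would then be passed to the classifier being fine-tuned.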
Original language: English
Title of host publication: Proceedings of the Sixth Social Media Mining for Health (SMM4H) Workshop and Shared Task
Place of Publication: Mexico City, Mexico
Publisher: Association for Computational Linguistics
Pages: 115-117
Number of pages: 3
Publication status: Published - 1 Jun 2021
