BERT Embeddings for Automatic Readability Assessment

Research output: Chapter in Book/Report/Conference proceeding > Chapter in a published conference proceeding

2 Citations (SciVal)

Abstract

Automatic readability assessment (ARA) is the task of evaluating how easy or difficult a text document is for a target audience. One of the many open problems in the field is making models trained for the task effective even for low-resource languages. In this study, we propose an alternative way of combining the information-rich embeddings of BERT models with handcrafted linguistic features for readability assessment. Results show that the proposed method outperforms classical approaches on English and Filipino datasets, obtaining as much as a 12.4% increase in F1 performance. We also show that the general information encoded in BERT embeddings can serve as a substitute feature set for low-resource languages like Filipino, where few semantic and syntactic NLP tools exist to explicitly extract feature values for the task.
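
The combined method described in the abstract can be pictured as joining a document embedding and a handful of handcrafted features into one vector for a downstream classifier. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the function names, the three surface features, and the zero-filled 768-dimensional placeholder (standing in for a BERT-base document embedding) are all hypothetical, and the abstract does not specify which features or classifier were used.

```python
import re
from typing import List


def handcrafted_features(text: str) -> List[float]:
    """Compute three illustrative surface features (hypothetical choice).

    Returns [average sentence length in words,
             average word length in characters,
             type-token ratio].
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    if not words or not sentences:
        return [0.0, 0.0, 0.0]
    avg_sentence_len = len(words) / len(sentences)
    avg_word_len = sum(len(w) for w in words) / len(words)
    type_token_ratio = len(set(words)) / len(words)
    return [avg_sentence_len, avg_word_len, type_token_ratio]


def combine_features(embedding: List[float], text: str) -> List[float]:
    """Concatenate a document embedding with the handcrafted features."""
    return list(embedding) + handcrafted_features(text)


# Assumption: a zero vector stands in for a real BERT embedding here.
# In practice it would come from a pretrained model, which this sketch
# deliberately does not load.
placeholder_embedding = [0.0] * 768

sample_text = "The cat sat. The dog ran."
combined = combine_features(placeholder_embedding, sample_text)
print(len(combined))
```

The combined vector could then be passed to any standard classifier. Dropping the handcrafted part and keeping only the embedding corresponds to the substitute-feature setting the abstract mentions for languages with few NLP tools.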

Original language: English
Title of host publication: International Conference Recent Advances in Natural Language Processing, RANLP 2021
Subtitle of host publication: Deep Learning for Natural Language Processing Methods and Applications - Proceedings
Editors: Galia Angelova, Maria Kunilovskaya, Ruslan Mitkov, Ivelina Nikolova-Koleva
Publisher: Incoma Ltd
Pages: 611-618
Number of pages: 8
ISBN (Electronic): 9789544520724
Publication status: Published - 2021
Event: International Conference on Recent Advances in Natural Language Processing: Deep Learning for Natural Language Processing Methods and Applications, RANLP 2021 - Virtual, Online
Duration: 1 Sep 2021 to 3 Sep 2021

Publication series

Name: International Conference Recent Advances in Natural Language Processing, RANLP
ISSN (Print): 1313-8502

Conference

Conference: International Conference on Recent Advances in Natural Language Processing: Deep Learning for Natural Language Processing Methods and Applications, RANLP 2021
City: Virtual, Online
Period: 1/09/21 to 3/09/21

ASJC Scopus subject areas

  • Software
  • Computer Science Applications
  • Artificial Intelligence
  • Electrical and Electronic Engineering
