TY - GEN
T1 - BERT Embeddings for Automatic Readability Assessment
AU - Imperial, Joseph Marvin
N1 - Publisher Copyright: © 2021 Incoma Ltd. All rights reserved.
PY - 2021
Y1 - 2021
N2 - Automatic readability assessment (ARA) is the task of evaluating the level of ease or difficulty of text documents for a target audience. For researchers, one of the many open problems in the field is to make such models trained for the task show efficacy even for low-resource languages. In this study, we propose an alternative way of utilizing the information-rich embeddings of BERT models with handcrafted linguistic features through a combined method for readability assessment. Results show that the proposed method outperforms classical approaches in readability assessment using English and Filipino datasets, obtaining as much as a 12.4% increase in F1 performance. We also show that the general information encoded in BERT embeddings can be used as a substitute feature set for low-resource languages like Filipino with limited semantic and syntactic NLP tools to explicitly extract feature values for the task.
UR - http://www.scopus.com/inward/record.url?scp=85123596287&partnerID=8YFLogxK
U2 - 10.26615/978-954-452-072-4_069
DO - 10.26615/978-954-452-072-4_069
M3 - Chapter in a published conference proceeding
AN - SCOPUS:85123596287
T3 - International Conference Recent Advances in Natural Language Processing, RANLP
SP - 611
EP - 618
BT - International Conference Recent Advances in Natural Language Processing, RANLP 2021
A2 - Angelova, Galia
A2 - Kunilovskaya, Maria
A2 - Mitkov, Ruslan
A2 - Nikolova-Koleva, Ivelina
PB - Incoma Ltd
T2 - International Conference on Recent Advances in Natural Language Processing: Deep Learning for Natural Language Processing Methods and Applications, RANLP 2021
Y2 - 1 September 2021 through 3 September 2021
ER -