Abstract
Readability assessment is the process of identifying the level of ease or difficulty of a given text for its intended audience. Approaches have evolved from the use of arithmetic formulas to more complex pattern-recognizing models trained using machine learning algorithms. While these approaches provide competitive results, limited work has been done on quantitatively analyzing how linguistic variables affect model inference. In this work, we dissect machine learning-based readability assessment models in Filipino by performing global and local model interpretation to understand the contributions of varying linguistic features, and discuss their implications in the context of the Filipino language. Results show that a model trained on the top features identified by global interpretation outperforms models trained on features selected by Spearman correlation. Likewise, we empirically observe local feature-weight boundaries that discriminate reading difficulty at an extremely fine-grained level, and their corresponding effects when feature values are perturbed.
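As a rough illustration of the feature-selection comparison described above, the sketch below contrasts a classifier trained on the top-k features ranked by a global interpretation method against one trained on features ranked by Spearman correlation with the label. This is not the authors' pipeline: permutation importance stands in for whatever global interpretation method the paper uses, and the feature matrix, labels, and model are synthetic placeholders.

```python
# Hedged sketch: compare feature selection via global model interpretation
# (permutation importance used here as a stand-in) against selection via
# Spearman correlation with the readability label. Data are synthetic.
import numpy as np
from scipy.stats import spearmanr
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_samples, n_features, k = 500, 20, 5            # k = number of top features kept
X = rng.normal(size=(n_samples, n_features))     # placeholder linguistic feature matrix
y = rng.integers(0, 3, size=n_samples)           # placeholder grade-level labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Global interpretation route: fit a full model, rank features by permutation importance.
full_model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
imp = permutation_importance(full_model, X_te, y_te, n_repeats=10, random_state=0)
top_by_interpretation = np.argsort(imp.importances_mean)[::-1][:k]

# Correlation route: rank features by |Spearman rho| against the label.
rhos = np.array([abs(spearmanr(X_tr[:, j], y_tr)[0]) for j in range(n_features)])
top_by_spearman = np.argsort(rhos)[::-1][:k]

# Retrain on each feature subset and compare held-out accuracy.
for name, idx in [("interpretation", top_by_interpretation), ("spearman", top_by_spearman)]:
    clf = RandomForestClassifier(random_state=0).fit(X_tr[:, idx], y_tr)
    print(name, clf.score(X_te[:, idx], y_te))
```

With the synthetic data the two scores are not meaningful; the point is only the shape of the comparison: rank, subset, retrain, and evaluate each feature-selection route on the same held-out split.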
Original language | English |
---|---|
Place of Publication | Seattle, USA |
Publisher | Association for Computational Linguistics (ACL) |
Volume | Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022) |
Publication status | Published - 1 Oct 2021 |
Bibliographical note
Accepted for oral presentation at PACLIC 2021
Keywords
- cs.CL
- cs.LG