Abstract
Readability assessment is the process of identifying the level of ease or difficulty of a certain piece of text for its intended audience. Approaches have evolved from the use of arithmetic formulas to more complex pattern-recognizing models trained using machine learning algorithms. While these approaches provide competitive results, limited work has been done on quantitatively analyzing how linguistic variables affect model inference. In this work, we dissect machine learning-based readability assessment models in Filipino by performing global and local model interpretation to understand the contributions of varying linguistic features, and we discuss their implications in the context of the Filipino language. Results show that a model trained with the top features from global interpretation obtained higher performance than models using features selected by Spearman correlation. We also empirically observed local feature weight boundaries for discriminating reading difficulty at an extremely fine-grained level, along with the corresponding effects when feature values are perturbed.
Original language | English |
---|---|
Title of host publication | Proceedings of the 35th Pacific Asia Conference on Language, Information and Computation |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 1-10 |
Number of pages | 10 |
Publication status | Published - 7 Nov 2021 |
Event | 35th Pacific Asia Conference on Language, Information and Computation, PACLIC 2021 - Shanghai, China. Duration: 5 Nov 2021 → 7 Nov 2021 |
Conference
Conference | 35th Pacific Asia Conference on Language, Information and Computation, PACLIC 2021 |
---|---|
Country/Territory | China |
City | Shanghai |
Period | 5/11/21 → 7/11/21 |
Acknowledgements
The authors would like to thank the anonymous reviewers for their valuable feedback and Dr. Ani Almario of Adarna House for allowing us to use their children’s book dataset for this study. This work is also supported by the DOST National Research Council of the Philippines (NRCP).
ASJC Scopus subject areas
- Artificial Intelligence
- Human-Computer Interaction
- Linguistics and Language