Abstract
Proper identification of the difficulty level of materials prescribed as required reading in an educational setting is key to effective learning in children. Educators and publishers have relied on readability formulas to predict text readability. While such formulas abound for the English language, limited work has been done on automatic readability assessment for the Filipino language. In this study, we build upon previous work that used traditional (TRAD) and lexical (LEX) linguistic features by incorporating language model (LM) features for possible improvement in identifying the readability levels of Filipino storybooks. Results showed that combining LM predictors with TRAD and LEX features, forming a hybrid feature set, increased the performance of readability models trained using Logistic Regression and Support Vector Machines by up to approximately 25% to 32%. From the results of feature selection using Spearman correlation and Information Gain on the feature set, we found that traditional, numeric-based features such as word counts and polysyllable word counts still contribute to accurate identification of readability levels in Filipino, although not on their own. Future directions of the study include extracting more diverse feature sets, such as syntactic and morphological predictors.
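As a rough illustration of the kind of traditional features and correlation-based feature ranking the abstract mentions, the following minimal sketch computes a word count and a polysyllable word count for a text, plus a Spearman rank correlation between a feature and readability levels. It is not the authors' implementation. The function names (`trad_features`, `count_syllables`, `spearman`), the vowel-counting syllable heuristic, and the three-syllable threshold for "polysyllable" are all assumptions made for this example.

```python
import re

VOWELS = set("aeiou")


def count_syllables(word):
    # Assumption: each vowel letter is treated as one syllable nucleus.
    # This is a crude heuristic, not a full Filipino syllabifier.
    return sum(1 for ch in word.lower() if ch in VOWELS)


def trad_features(text, poly_threshold=3):
    # Tokenize into alphabetic words, keeping hyphenated forms together.
    words = re.findall(r"[^\W\d_]+(?:-[^\W\d_]+)*", text)
    # Assumption: a "polysyllable" word has at least poly_threshold syllables.
    poly = sum(1 for w in words if count_syllables(w) >= poly_threshold)
    return {"word_count": len(words), "polysyllable_count": poly}


def ranks(values):
    # Assign 1-based ranks, giving tied values the average of their ranks.
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    # Spearman correlation is the Pearson correlation of the rank vectors.
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sum((a - mean_x) ** 2 for a in rx) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in ry) ** 0.5
    # Return 0.0 when either vector is constant (correlation undefined).
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0
```

Under these assumptions, one would call `trad_features` on each storybook text, collect a given feature across the corpus, and pass it to `spearman` together with the corresponding readability levels to see how strongly the feature tracks difficulty.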
Original language | English |
---|---|
Title of host publication | 2020 International Conference on Asian Language Processing, IALP 2020 |
Editors | Yanfeng Lu, Minghui Dong, Lay-Ki Soon, Keng Hoon Gan |
Publisher | IEEE |
Pages | 175-180 |
Number of pages | 6 |
ISBN (Electronic) | 9781728176895 |
Publication status | Published - 4 Dec 2020 |
Event | 2020 International Conference on Asian Language Processing, IALP 2020 - Kuala Lumpur, Malaysia; Duration: 4 Dec 2020 → 6 Dec 2020 |
Publication series
Name | 2020 International Conference on Asian Language Processing, IALP 2020 |
---|---|
Conference
Conference | 2020 International Conference on Asian Language Processing, IALP 2020 |
---|---|
Country/Territory | Malaysia |
City | Kuala Lumpur |
Period | 4/12/20 → 6/12/20 |
Bibliographical note
Publisher Copyright: © 2020 IEEE.
Keywords
- Filipino
- linguistic features
- readability
ASJC Scopus subject areas
- Language and Linguistics
- Artificial Intelligence
- Signal Processing
- Linguistics and Language