Abstract
Reading is an essential part of children's learning. Identifying the proper readability level of reading materials will ensure effective comprehension. We present our efforts to develop a baseline model for automatically identifying the readability of children's and young adults' books written in Filipino using machine learning algorithms. For this study, we processed 258 picture books published by Adarna House Inc. In contrast to older readability formulas that rely on static attributes such as the number of words, sentences, and syllables, other textual features were explored. Count vectors, Term Frequency-Inverse Document Frequency (TF-IDF), n-grams, and character-level n-grams were extracted to train models using three major machine learning algorithms: Multinomial Naïve Bayes, Random Forest, and K-Nearest Neighbors. A combination of K-Nearest Neighbors and Random Forest via a voting-based classification mechanism resulted in the best-performing model, with a high average training accuracy and validation accuracy of 0.822 and 0.74, respectively. Analysis of the top 10 most useful features for each algorithm shows that they share a common trait in identifying readability levels: the use of Filipino stop words. The performance of other classifiers and features was also explored.
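The feature types and the voting step named in the abstract can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation. The function names, the toy Filipino sentences, and the grade labels are hypothetical. The smoothed inverse document frequency, ln((1 + N) / (1 + df)) + 1, is one common variant chosen here as an assumption. The abstract does not say which voting scheme was used, so simple majority (hard) voting is assumed, and the K-Nearest Neighbors and Random Forest classifiers themselves are omitted.

```python
import math
from collections import Counter


def word_ngrams(text, n=1):
    """Return overlapping word n-grams of a lowercased, whitespace-split text."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n=3):
    """Return overlapping character n-grams of a lowercased text."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def count_vector(tokens):
    """Count vector as a sparse mapping from token to raw frequency."""
    return Counter(tokens)


def tfidf(docs_tokens):
    """TF-IDF weights per document (assumed smoothed idf variant).

    weight(t, d) = count(t, d) * (ln((1 + N) / (1 + df(t))) + 1)
    """
    n_docs = len(docs_tokens)
    doc_freq = Counter()
    for tokens in docs_tokens:
        doc_freq.update(set(tokens))  # each document counts once per term
    vectors = []
    for tokens in docs_tokens:
        counts = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def majority_vote(predictions):
    """Hard voting: most frequent label wins; ties go to the label seen first."""
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical toy corpus; real inputs would be full book texts.
docs = ["ang bata ay masaya", "ang aso ay tumakbo sa parke"]
unigram_docs = [word_ngrams(doc, 1) for doc in docs]
weights = tfidf(unigram_docs)

# Hypothetical per-classifier outputs for one book, combined by voting.
combined_label = majority_vote(["Grade 1", "Grade 2", "Grade 1"])
```

In this sketch, words that occur in every document (such as the stop words "ang" and "ay") receive the lowest TF-IDF weight per occurrence, while raw count vectors keep their full frequency. Which representation serves readability classification better is an empirical question.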
Original language | English |
---|---|
Title of host publication | Proceedings of the 2019 International Conference on Asian Language Processing, IALP 2019 |
Editors | Man Lan, Yuanbin Wu, Minghui Dong, Yanfeng Lu, Yan Yang |
Publisher | IEEE |
Pages | 413-418 |
Number of pages | 6 |
ISBN (Electronic) | 9781728150147 |
DOIs | |
Publication status | Published - Nov 2019 |
Event | 23rd International Conference on Asian Language Processing, IALP 2019 - Shanghai, China; Duration: 15 Nov 2019 → 17 Nov 2019 |
Publication series
Name | Proceedings of the 2019 International Conference on Asian Language Processing, IALP 2019 |
---|---|
Conference
Conference | 23rd International Conference on Asian Language Processing, IALP 2019 |
---|---|
Country/Territory | China |
City | Shanghai |
Period | 15/11/19 → 17/11/19 |
Bibliographical note
Publisher Copyright: © 2019 IEEE.
Keywords
- classification
- Filipino
- machine learning
- readability
- storybook
ASJC Scopus subject areas
- Artificial Intelligence
- Linguistics and Language
- Computer Vision and Pattern Recognition
- Signal Processing