Developing a machine learning-based grade level classifier for Filipino children's literature

Joseph Marvin Imperial, Rachel Edita Roxas, Erica Mae Campos, Jemelee Oandasan, Reyniel Caraballo, Ferry Winsley Sabdani, Ani Rosa Almaroi

Research output: Chapter or section in a book/report/conference proceeding › Chapter in a published conference proceeding

7 Citations (SciVal)

Abstract

Reading is an essential part of children's learning. Identifying the proper readability level of reading materials will ensure effective comprehension. We present our efforts to develop a baseline model for automatically identifying the readability of children's and young adult books written in Filipino using machine learning algorithms. For this study, we processed 258 picture books published by Adarna House Inc. In contrast to older readability formulas that rely on static attributes such as the number of words, sentences, and syllables, other textual features were explored. Count vectors, Term Frequency-Inverse Document Frequency (TF-IDF), n-grams, and character-level n-grams were extracted to train models using three major machine learning algorithms: Multinomial Naïve Bayes, Random Forest, and K-Nearest Neighbors. A combination of K-Nearest Neighbors and Random Forest via a voting-based classification mechanism resulted in the best-performing model, with a high average training accuracy and validation accuracy of 0.822 and 0.74, respectively. Analysis of the top 10 most useful features for each algorithm shows that they share a common trait in identifying readability levels: the use of Filipino stop words. The performance of other classifiers and features was also explored.
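The feature extraction and voting steps named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the function names (`char_ngrams`, `tfidf`, `hard_vote`), the unsmoothed TF-IDF variant, the tie-breaking rule, and the toy phrases and labels are all assumptions made for illustration.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of a lowercased string."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def tfidf(docs, n=3):
    """Compute per-document TF-IDF weights over character n-grams.

    Assumed variant: tf = count / n-grams in the document,
    idf = log(N / df), with no smoothing.
    """
    tokenized = [char_ngrams(doc, n) for doc in docs]
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each n-gram once per document
    n_docs = len(docs)
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            gram: (count / total) * math.log(n_docs / doc_freq[gram])
            for gram, count in counts.items()
        })
    return weights


def hard_vote(predictions):
    """Return the majority label; ties go to the label seen first."""
    return Counter(predictions).most_common(1)[0][0]


# Toy input invented for illustration only (not from the Adarna House corpus).
toy_docs = ["ang bata", "ang aso"]
toy_weights = tfidf(toy_docs)

# Hypothetical per-classifier predictions for a single book.
toy_label = hard_vote(["grade1", "grade2", "grade2"])
```

In this variant an n-gram that occurs in every document receives a weight of zero, so the weighting favours n-grams that separate documents. With only two voters, as in the K-Nearest Neighbors and Random Forest combination, disagreements need an explicit tie-breaking rule; the first-seen rule used here is an arbitrary choice.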

Original language: English
Title of host publication: Proceedings of the 2019 International Conference on Asian Language Processing, IALP 2019
Editors: Man Lan, Yuanbin Wu, Minghui Dong, Yanfeng Lu, Yan Yang
Publisher: IEEE
Pages: 413-418
Number of pages: 6
ISBN (Electronic): 9781728150147
Publication status: Published - Nov 2019
Event: 23rd International Conference on Asian Language Processing, IALP 2019 - Shanghai, China
Duration: 15 Nov 2019 to 17 Nov 2019

Publication series

Name: Proceedings of the 2019 International Conference on Asian Language Processing, IALP 2019

Conference

Conference: 23rd International Conference on Asian Language Processing, IALP 2019
Country/Territory: China
City: Shanghai
Period: 15/11/19 to 17/11/19

Bibliographical note

Publisher Copyright:
© 2019 IEEE.

Keywords

  • classification
  • Filipino
  • machine learning
  • readability
  • storybook

ASJC Scopus subject areas

  • Artificial Intelligence
  • Linguistics and Language
  • Computer Vision and Pattern Recognition
  • Signal Processing
