Exploring Hybrid Linguistic Feature Sets to Measure Filipino Text Readability

Research output: Chapter in Book/Report/Conference proceeding › Chapter in a published conference proceeding

5 Citations (SciVal)

Abstract

Proper identification of the difficulty level of materials prescribed as required readings in an educational setting is key to effective learning in children. Educators and publishers have relied on readability formulas to predict text readability. While these formulas abound for the English language, limited work has been done on automatic readability assessment for the Filipino language. In this study, we build upon previous work that used traditional (TRAD) and lexical (LEX) linguistic features by incorporating language model (LM) features for possible improvement in identifying the readability levels of Filipino storybooks. Results showed that combining LM predictors with TRAD and LEX, forming a hybrid feature set, increased the performance of readability models trained using Logistic Regression and Support Vector Machines by up to approximately 25% to 32%. From the results of feature selection using Spearman correlation and Information Gain on the feature set, we found that traditional, numeric-based features such as word counts and polysyllable word counts still contribute to accurate identification of readability levels in Filipino, although not by themselves. Future directions of the study include extracting more diverse feature sets such as syntactic and morphological predictors.
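As a rough illustration of the kind of traditional (TRAD) features and Spearman-based feature ranking the abstract mentions, the following is a minimal sketch and not the authors' implementation. The function names (`count_syllables`, `trad_features`, `spearman`), the vowel-counting syllable heuristic, and the three-syllable threshold for a polysyllable word are all assumptions made for this example.

```python
import re

VOWELS = set("aeiou")


def count_syllables(word):
    # Assumption: each vowel letter is treated as one syllable nucleus.
    # This is only a rough heuristic for illustration.
    return sum(1 for ch in word.lower() if ch in VOWELS)


def trad_features(text):
    # Hypothetical TRAD feature extractor: surface counts only.
    words = re.findall(r"[A-Za-z\-']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Assumption: a polysyllable word has three or more syllables.
    polysyllables = sum(1 for w in words if count_syllables(w) >= 3)
    return {
        "word_count": len(words),
        "sentence_count": len(sentences),
        "polysyllable_count": polysyllables,
    }


def ranks(values):
    # Assign 1-based ranks, giving tied values their average rank.
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    # Spearman correlation = Pearson correlation computed on the ranks.
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sum((a - mean_x) ** 2 for a in rx) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in ry) ** 0.5
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant feature carries no rank information
    return cov / (sd_x * sd_y)
```

In use, one would compute `trad_features` for each text, collect a single feature (for example `word_count`) across the corpus, and pass it to `spearman` together with the numeric readability levels to see how strongly that feature tracks difficulty.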

Original language: English
Title of host publication: 2020 International Conference on Asian Language Processing, IALP 2020
Editors: Yanfeng Lu, Minghui Dong, Lay-Ki Soon, Keng Hoon Gan
Publisher: IEEE
Pages: 175-180
Number of pages: 6
ISBN (Electronic): 9781728176895
DOIs
Publication status: Published - 4 Dec 2020
Event: 2020 International Conference on Asian Language Processing, IALP 2020 - Kuala Lumpur, Malaysia
Duration: 4 Dec 2020 to 6 Dec 2020

Publication series

Name: 2020 International Conference on Asian Language Processing, IALP 2020

Conference

Conference: 2020 International Conference on Asian Language Processing, IALP 2020
Country/Territory: Malaysia
City: Kuala Lumpur
Period: 4/12/20 to 6/12/20

Keywords

  • filipino
  • linguistic features
  • readability

ASJC Scopus subject areas

  • Language and Linguistics
  • Artificial Intelligence
  • Signal Processing
  • Linguistics and Language
