A Baseline Readability Model for Cebuano

Lloyd Lois Antonie Reyes, Michael Antonio Ibañez, Ranz Sapinit, Mohammed Hussien, Joseph Marvin Imperial

Research output: Chapter in Book/Report/Conference proceeding; Chapter in a published conference proceeding

Abstract

In this study, we developed the first baseline readability model for the Cebuano language. Cebuano is the second most-used native language in the Philippines, with about 27.5 million speakers. As the baseline, we extracted traditional or surface-based features, syllable patterns based on Cebuano’s documented orthography, and neural embeddings from the multilingual BERT model. Results show that combining the first two handcrafted linguistic feature sets gave the best performance when used to train an optimized Random Forest model, reaching approximately 87% across all metrics. The feature sets and algorithm used are also similar to those in previous results on readability assessment for the Filipino language, showing the potential for cross-lingual application.
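To make the handcrafted feature sets mentioned in the abstract more concrete, here is a minimal sketch of how surface-based features and consonant-vowel patterns might be computed from raw text. It is an illustration only, not the authors' implementation. The function names (`cv_pattern`, `surface_features`, `pattern_densities`), the five-letter vowel set, the vowel-count syllable approximation, and the word-level (rather than syllable-level) patterns are all assumptions made for this example.

```python
import re
from collections import Counter

# Assumption: a simplified vowel inventory; the paper's orthography rules may differ.
VOWELS = set("aeiou")


def cv_pattern(word):
    """Map each letter of a word to C (consonant) or V (vowel)."""
    return "".join(
        "V" if ch in VOWELS else "C" for ch in word.lower() if ch.isalpha()
    )


def surface_features(text):
    """Compute a few traditional, surface-based readability features."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[^\W\d_]+", text)
    n_words, n_sents = len(words), len(sentences)
    # Assumption: one syllable per vowel letter, a rough approximation.
    syllables = sum(sum(ch in VOWELS for ch in w.lower()) for w in words)
    return {
        "word_count": n_words,
        "sentence_count": n_sents,
        "avg_sentence_length": n_words / n_sents if n_sents else 0.0,
        "avg_word_length": sum(len(w) for w in words) / n_words if n_words else 0.0,
        "avg_syllables_per_word": syllables / n_words if n_words else 0.0,
    }


def pattern_densities(text):
    """Relative frequency of each word-level C/V pattern in the text."""
    words = re.findall(r"[^\W\d_]+", text)
    counts = Counter(cv_pattern(w) for w in words)
    total = len(words)
    return {p: c / total for p, c in counts.items()} if total else {}


if __name__ == "__main__":
    sample = "Maayong buntag. Kumusta ka?"
    print(surface_features(sample))
    print(pattern_densities(sample))
```

Feature dictionaries like these could then be turned into vectors and passed to a classifier such as a Random Forest; that step is left out here to keep the sketch dependency-free.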

Original language: English
Title of host publication: BEA 2022 - 17th Workshop on Innovative Use of NLP for Building Educational Applications, Proceedings
Editors: Ekaterina Kochmar, Jill Burstein, Andrea Horbach, Ronja Laarmann-Quante, Nitin Madnani, Anais Tack, Victoria Yaneva, Zheng Yuan, Torsten Zesch
Publisher: Association for Computational Linguistics (ACL)
Pages: 27-32
Number of pages: 6
ISBN (Electronic): 9781955917834
Publication status: Published - 2022
Event: 17th Workshop on Innovative Use of NLP for Building Educational Applications, BEA 2022 - Seattle, United States
Duration: 15 Jul 2022 → …

Publication series

Name: BEA 2022 - 17th Workshop on Innovative Use of NLP for Building Educational Applications, Proceedings

Conference

Conference: 17th Workshop on Innovative Use of NLP for Building Educational Applications, BEA 2022
Country/Territory: United States
City: Seattle
Period: 15/07/22 → …

ASJC Scopus subject areas

  • Language and Linguistics
  • Software
  • Linguistics and Language
