Abstract
In this study, we developed the first baseline readability model for the Cebuano language. Cebuano is the second most-used native language in the Philippines, with about 27.5 million speakers. As the baseline, we extracted traditional or surface-based features, syllable patterns based on Cebuano's documented orthography, and neural embeddings from the multilingual BERT model. Results show that the first two feature sets, both handcrafted linguistic features, obtained the best performance when used to train an optimized Random Forest model, scoring approximately 87% across all metrics. The best-performing feature sets and algorithm are also similar to those reported in previous readability assessment work for the Filipino language, suggesting potential for cross-lingual application.
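To make the two handcrafted feature families concrete, here is a minimal sketch of surface-based features and syllable-pattern frequencies for a single text. It is illustrative only and does not reproduce the paper's exact feature definitions. Several assumptions are made here: the function names (`cv_units`, `syllable_patterns`, `extract_features`) and the `PATTERNS` inventory are hypothetical; "ng" is treated as a single consonant following Cebuano orthography; and syllables are split with a simple heuristic (a single intervocalic consonant starts the next syllable, while with two or more the first closes the previous syllable). The mBERT embeddings and Random Forest training are not shown.

```python
import re
from collections import Counter

VOWELS = set("aeiou")
# Hypothetical syllable-pattern inventory for illustration; not the paper's exact list.
PATTERNS = ("V", "CV", "VC", "CVC", "CCV", "CCVC", "CVCC", "CCVCC")


def cv_units(word):
    """Return the C/V skeleton of a lowercase word, treating 'ng' as one consonant."""
    units = []
    i = 0
    while i < len(word):
        if word.startswith("ng", i):
            units.append("C")
            i += 2
        else:
            units.append("V" if word[i] in VOWELS else "C")
            i += 1
    return units


def syllable_patterns(word):
    """Split a word into C/V syllable patterns using a simple heuristic (an assumption)."""
    units = cv_units(word)
    vowel_idx = [i for i, u in enumerate(units) if u == "V"]
    if not vowel_idx:
        return []  # no vowel nucleus, so the word contributes no syllables
    bounds = [0]
    for left, right in zip(vowel_idx, vowel_idx[1:]):
        gap = right - left - 1  # number of consonants between two nuclei
        # One consonant becomes the next onset; with two or more,
        # the first one closes the previous syllable as a coda.
        bounds.append(left + 1 + (1 if gap >= 2 else 0))
    bounds.append(len(units))
    return ["".join(units[a:b]) for a, b in zip(bounds, bounds[1:])]


def extract_features(text):
    """Compute surface-based counts/averages plus syllable-pattern proportions."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Hyphenated words are split into their parts by this simple tokenizer.
    words = re.findall(r"[a-zñ]+", text.lower())
    syllables = [p for w in words for p in syllable_patterns(w)]
    n_words, n_syll = len(words), len(syllables)
    feats = {
        "word_count": n_words,
        "sentence_count": len(sentences),
        "avg_word_length": sum(map(len, words)) / n_words if n_words else 0.0,
        "avg_sentence_length": n_words / max(len(sentences), 1),
        "avg_syllables_per_word": n_syll / n_words if n_words else 0.0,
    }
    counts = Counter(syllables)
    for pat in PATTERNS:
        # Share of all syllables in the text that follow this pattern.
        feats[f"syll_{pat}"] = counts[pat] / n_syll if n_syll else 0.0
    return feats
```

For example, `syllable_patterns("balay")` yields `["CV", "CVC"]` (ba.lay). In a full pipeline, one such feature vector per text would be stacked into a matrix and passed to a classifier such as a Random Forest.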
Original language | English |
---|---|
Title of host publication | BEA 2022 - 17th Workshop on Innovative Use of NLP for Building Educational Applications, Proceedings |
Editors | Ekaterina Kochmar, Jill Burstein, Andrea Horbach, Ronja Laarmann-Quante, Nitin Madnani, Anais Tack, Victoria Yaneva, Zheng Yuan, Torsten Zesch |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 27-32 |
Number of pages | 6 |
ISBN (Electronic) | 9781955917834 |
DOIs | |
Publication status | Published - 31 Jul 2022 |
Event | 17th Workshop on Innovative Use of NLP for Building Educational Applications, BEA 2022 - Seattle, United States. Duration: 15 Jul 2022 → … |
Publication series
Name | BEA 2022 - 17th Workshop on Innovative Use of NLP for Building Educational Applications, Proceedings |
---|---|
Conference
Conference | 17th Workshop on Innovative Use of NLP for Building Educational Applications, BEA 2022 |
---|---|
Country/Territory | United States |
City | Seattle |
Period | 15/07/22 → … |
Bibliographical note
Funding Information: The authors would like to thank the anonymous reviewers for their valuable feedback. This project is supported by the Google AI TensorFlow Faculty Grant awarded to Joseph Marvin Imperial.
Publisher Copyright:
© 2022 Association for Computational Linguistics.
ASJC Scopus subject areas
- Language and Linguistics
- Software
- Linguistics and Language