TY - GEN
T1 - A Baseline Readability Model for Cebuano
AU - Antonie Reyes, Lloyd Lois
AU - Ibañez, Michael Antonio
AU - Sapinit, Ranz
AU - Hussien, Mohammed
AU - Imperial, Joseph Marvin
N1 - Funding Information:
The authors would like to thank the anonymous reviewers for their valuable feedback. This project is supported by the Google AI TensorFlow Faculty Grant awarded to Joseph Marvin Imperial.
Publisher Copyright:
© 2022 Association for Computational Linguistics.
PY - 2022
Y1 - 2022
N2 - In this study, we developed the first baseline readability model for the Cebuano language. Cebuano is the second most-used native language in the Philippines, with about 27.5 million speakers. As the baseline, we extracted traditional or surface-based features, syllable patterns based on Cebuano’s documented orthography, and neural embeddings from the multilingual BERT model. Results show that using the first two sets of handcrafted linguistic features with an optimized Random Forest model obtained the best performance, with approximately 87% across all metrics. The feature sets and algorithm used are also consistent with previous results in readability assessment for the Filipino language, showing potential for cross-lingual application.
AB - In this study, we developed the first baseline readability model for the Cebuano language. Cebuano is the second most-used native language in the Philippines, with about 27.5 million speakers. As the baseline, we extracted traditional or surface-based features, syllable patterns based on Cebuano’s documented orthography, and neural embeddings from the multilingual BERT model. Results show that using the first two sets of handcrafted linguistic features with an optimized Random Forest model obtained the best performance, with approximately 87% across all metrics. The feature sets and algorithm used are also consistent with previous results in readability assessment for the Filipino language, showing potential for cross-lingual application.
UR - http://www.scopus.com/inward/record.url?scp=85138351520&partnerID=8YFLogxK
M3 - Chapter in a published conference proceeding
AN - SCOPUS:85138351520
T3 - BEA 2022 - 17th Workshop on Innovative Use of NLP for Building Educational Applications, Proceedings
SP - 27
EP - 32
BT - BEA 2022 - 17th Workshop on Innovative Use of NLP for Building Educational Applications, Proceedings
A2 - Kochmar, Ekaterina
A2 - Burstein, Jill
A2 - Horbach, Andrea
A2 - Laarmann-Quante, Ronja
A2 - Madnani, Nitin
A2 - Tack, Anaïs
A2 - Yaneva, Victoria
A2 - Yuan, Zheng
A2 - Zesch, Torsten
PB - Association for Computational Linguistics (ACL)
T2 - 17th Workshop on Innovative Use of NLP for Building Educational Applications, BEA 2022
Y2 - 15 July 2022
ER -