Abstract
Readability metrics and standards such as the Flesch-Kincaid Grade Level (FKGL) and the Common European Framework of Reference for Languages (CEFR) exist to guide teachers and educators in assessing the complexity of educational materials before administering them for classroom use. In this study, we select a diverse set of open and closed-source instruction-tuned language models and investigate their performance in writing story completions and in simplifying narratives (tasks that teachers perform) using standard-guided prompts that control text readability. Our extensive findings provide empirical evidence that globally recognized models like ChatGPT may be less effective and may require more refined prompts for these generative tasks than open-source models such as BLOOMZ and FlanT5, which have shown promising results.
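As background for the FKGL metric named in the abstract: it is the standard formula 0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59. The sketch below is a minimal, illustrative Python implementation of that formula, not the authors' code; the vowel-group syllable counter is a naive assumption, and dictionary-based libraries such as textstat are more accurate in practice.

```python
import re


def count_syllables(word: str) -> int:
    """Naive syllable estimate: count runs of consecutive vowels (at least 1)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def fkgl(text: str) -> float:
    """Flesch-Kincaid Grade Level:
    0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words))
            - 15.59)


# Very simple text can legitimately score below grade 0.
print(round(fkgl("The cat sat on the mat. It was happy."), 2))
```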
Original language | English |
---|---|
Title of host publication | Proceedings of the Third Workshop on Natural Language Generation, Evaluation, and Metrics (GEM) |
Place of Publication | Singapore |
Publisher | Association for Computational Linguistics |
Pages | 205–223 |
Number of pages | 19 |
Publication status | Published - 31 Dec 2023 |
Acknowledgements
We thank the anonymous reviewers for their constructive feedback on this work. We also thank Mark Townsend for his assistance with configuring the experiments on the Hex GPU cloud of the Department of Computer Science at the University of Bath.
Funding
JMI is supported by the UKRI Centre for Doctoral Training in Accountable, Responsible and Transparent AI (ART-AI) [EP/S023437/1] of the University of Bath and the Study Grant Program of National University, Philippines.
Funders | Funder number |
---|---|
UKRI Centre for Doctoral Training in Accountable, Responsible and Transparent Artificial Intelligence | EP/S023437/1 |
National University, Philippines | |
Keywords
- cs.CL