Flesch or Fumble? Evaluating Readability Standard Alignment of Instruction-Tuned Language Models

Research output: Chapter or section in a book/report/conference proceeding › Chapter in a published conference proceeding

Abstract

Readability metrics and standards such as the Flesch-Kincaid Grade Level (FKGL) and the Common European Framework of Reference for Languages (CEFR) exist to guide teachers and educators in properly assessing the complexity of educational materials before administering them for classroom use. In this study, we select a diverse set of open and closed-source instruction-tuned language models and investigate their performance in writing story completions and simplifying narratives, tasks that teachers perform, using standard-guided prompts controlling text readability. Our extensive findings provide empirical proof of how globally recognized models like ChatGPT may be considered less effective and may require more refined prompts for these generative tasks compared to open-source models such as BLOOMZ and FlanT5, which have shown promising results.
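
For readers unfamiliar with FKGL, the following is a minimal sketch of how the score is computed from the standard formula, 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. It is not the implementation used in the paper. The helper names `count_syllables` and `fkgl` are hypothetical, and the vowel-group syllable counter is a rough assumption; dedicated readability libraries use more careful tokenization and syllable rules.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption, not a linguistic rule):
    count runs of consecutive vowels, with a minimum of one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def fkgl(text: str) -> float:
    """Flesch-Kincaid Grade Level using the standard coefficients."""
    # Naive sentence split on terminal punctuation.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words are runs of letters, optionally with one internal apostrophe.
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        0.39 * len(words) / len(sentences)
        + 11.8 * syllables / len(words)
        - 15.59
    )


if __name__ == "__main__":
    simple = "The cat sat. The dog ran."
    harder = "Readability standards guide educators in assessing complexity."
    print(f"simple: {fkgl(simple):.2f}")
    print(f"harder: {fkgl(harder):.2f}")
```

Higher scores indicate text that needs more years of schooling to read comfortably; very short, monosyllabic text can score below zero.
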
Original language: English
Title of host publication: Proceedings of the Third Workshop on Natural Language Generation, Evaluation, and Metrics (GEM)
Place of Publication: Singapore
Publisher: Association for Computational Linguistics
Pages: 205–223
Number of pages: 19
Publication status: Published - 31 Dec 2023

Acknowledgements

We thank the anonymous reviewers for their constructive feedback on this work. We also thank Mark Townsend for his assistance with configuring the experiments on the Hex GPU cloud of the Department of Computer Science at the University of Bath.

Funding

JMI is supported by the UKRI Centre for Doctoral Training in Accountable, Responsible and Transparent AI (ART-AI) [EP/S023437/1] at the University of Bath and the Study Grant Program of National University, Philippines.

Funder: UKRI Centre for Doctoral Training in Accountable, Responsible and Transparent Artificial Intelligence; Funder number: EP/S023437/1
Funder: National University, Philippines

    Keywords

    • cs.CL
