Uniform Complexity for Text Generation

Research output: Chapter in a published conference proceeding

Abstract

Large language models (LLMs) have shown promising results in a wide array of generative NLP tasks, such as summarization and machine translation. In the context of narrative generation, however, existing models still do not capture the factors that contribute to producing consistent text. For instance, a piece of text or a story should be uniformly readable throughout, and this form of complexity should be controllable. Thus, if the complexity of an input text prompt is rated at a first-grade reading level on the Flesch Reading Ease test, then the generated text continuing the plot should also fall within this range of complexity. With this in mind, we introduce Uniform Complexity for Text Generation (UCTG), a new benchmark that challenges generative models to observe uniform linguistic properties with respect to their prompts. We experiment with over 150 linguistically and cognitively motivated features for evaluating the complexity of texts written by humans and generative models. Our results show that models such as GPT-2 struggle to preserve the complexity of their input prompts in the texts they generate, even when fine-tuned on professionally written texts.
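To make the uniformity criterion concrete, the sketch below compares the Flesch Reading Ease (FRE) score of a prompt against that of a model continuation. This is a minimal illustration, not the paper's evaluation code: the syllable counter is a rough vowel-group heuristic, and the prompt/continuation strings are hypothetical examples.

```python
import re

def count_syllables(word: str) -> int:
    # Rough heuristic: count contiguous vowel groups; every word gets at least one.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text: str) -> float:
    # FRE = 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / len(sentences)) - 84.6 * (syllables / len(words))

# Hypothetical prompt/continuation pair; a large score gap signals non-uniform complexity.
prompt = "The cat sat on the mat. It was warm and soft."
continuation = "Thereafter, the feline reposed contentedly upon the aforementioned textile."
print(f"prompt FRE:       {flesch_reading_ease(prompt):.1f}")        # simple text scores high
print(f"continuation FRE: {flesch_reading_ease(continuation):.1f}")  # complex text scores low
```

Higher FRE means easier text; a continuation whose score diverges sharply from the prompt's score violates the uniform-complexity criterion that UCTG measures.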

Original language: English
Title of host publication: Findings of the Association for Computational Linguistics
Subtitle of host publication: EMNLP 2023
Publisher: Association for Computational Linguistics (ACL)
Pages: 12025-12046
Number of pages: 22
ISBN (Electronic): 9798891760615
Publication status: Published - 10 Dec 2023
Event: 2023 Findings of the Association for Computational Linguistics: EMNLP 2023 - Singapore, Singapore
Duration: 6 Dec 2023 - 10 Dec 2023

Publication series

Name: Findings of the Association for Computational Linguistics: EMNLP 2023

Conference

Conference: 2023 Findings of the Association for Computational Linguistics: EMNLP 2023
Country/Territory: Singapore
City: Singapore
Period: 6/12/23 - 10/12/23

Funding

We thank the anonymous reviewers for their constructive feedback and the ACs, SACs, and PCs for their appreciation of this work. We also thank Alexandra DeLucia and Ekaterina Kochmar for their valuable feedback on the initial version of this work, and Mark Townsend for his assistance with configuring the experiments on the Hex GPU cloud of the Department of Computer Science at the University of Bath. JMI is supported by the UKRI Centre for Doctoral Training in Accountable, Responsible and Transparent AI (ART-AI) [EP/S023437/1] of the University of Bath, the NU Research Faculty Program (Project ID: 2021F-2T-01-MLA-CCIT), and the Study Grant Program of National University Philippines.

Funders (funder number):
• NU Research Faculty Program (2021F-2T-01-MLA-CCIT)
• National University, Philippines
• UK Research and Innovation (EP/S023437/1)
• University of Bath

ASJC Scopus subject areas

• Computational Theory and Mathematics
• Computer Science Applications
• Information Systems
• Language and Linguistics
• Linguistics and Language
