Fine-Tuning on Diverse Reasoning Chains Drives Within-Inference CoT Refinement in LLMs

Haritz Puerto, Tilek Chubakov, Xiaodan Zhu, Harish Tayyar Madabushi, Iryna Gurevych

Research output: Chapter or section in a book/report/conference proceedingChapter in a published conference proceeding

1 Citation (SciVal)

Abstract

Requiring a large language model (LLM) to generate intermediary reasoning steps, known as Chain of Thought (CoT), has been shown to be an effective way of boosting performance. Previous approaches have focused on generating multiple independent CoTs, combining them through ensembling or other post-hoc strategies to enhance reasoning. In this work, we introduce a novel approach where LLMs are fine-tuned to generate a sequence of Diverse Chains of Thought (DCoT) within a single inference step, which is fundamentally different from prior work that primarily operate on parallel CoT generations. DCoT allows LLMs to gain the ability to perform within-inference refinement of reasoning chains without requiring external feedback. Through a rigorous set of experiments spanning a wide range of tasks that require various reasoning types, we show that fine-tuning on DCoT improves performance over the CoT baseline across model families and scales (1.3B to 70B). These improvements are particularly impactful for tasks with a large result state space, such as those involving numeric answers. Our work is also significant because both quantitative analyses and manual evaluations reveal the observed gains stem from the models' ability to refine an initial reasoning chain by generating a second, improved chain within the same inference step, demonstrating previously elusive self-improvement. Our code and data are publicly available.

Original languageEnglish
Title of host publicationProceedings of the Annual Meeting of the Association for Computational Linguistics, 2025
EditorsWanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
PublisherAssociation for Computational Linguistics (ACL)
Pages3789-3808
Number of pages20
ISBN (Electronic)9798891762510
Publication statusPublished - 31 Dec 2025
Event63rd Annual Meeting of the Association for Computational Linguistics, ACL 2025 - Vienna, Austria
Duration: 27 Jul 20251 Aug 2025

Publication series

NameProceedings of the Annual Meeting of the Association for Computational Linguistics
Volume1
ISSN (Print)0736-587X

Conference

Conference63rd Annual Meeting of the Association for Computational Linguistics, ACL 2025
Country/TerritoryAustria
CityVienna
Period27/07/251/08/25

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language
  • Computer Science Applications

Fingerprint

Dive into the research topics of 'Fine-Tuning on Diverse Reasoning Chains Drives Within-Inference CoT Refinement in LLMs'. Together they form a unique fingerprint.

Cite this