Are Emergent Abilities in Large Language Models just In-Context Learning?

Sheng Lu, Irina Bigoulaeva, Rachneet Singh Sachdeva, Harish Tayyar Madabushi, Iryna Gurevych

Research output: Chapter or section in a book/report/conference proceeding › Chapter in a published conference proceeding

Abstract

Large language models, comprising billions of parameters and pre-trained on extensive web-scale corpora, have been claimed to acquire certain capabilities without having been specifically trained on them. These capabilities, referred to as “emergent abilities,” have been a driving force in discussions regarding the potentials and risks of language models. A key challenge in evaluating emergent abilities is that they are confounded by model competencies that arise through alternative prompting techniques, including in-context learning, which is the ability of models to complete a task based on a few examples. We present a novel theory that explains emergent abilities, taking into account their potential confounding factors, and rigorously substantiate this theory through over 1000 experiments. Our findings suggest that purported emergent abilities are not truly emergent, but result from a combination of in-context learning, model memory, and linguistic knowledge. Our work is a foundational step in explaining language model performance, providing a template for their efficient use and clarifying the paradox of their ability to excel in some instances while faltering in others. Thus, we demonstrate that their capabilities should not be overestimated.
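As a point of reference for the abstract's definition of in-context learning, the sketch below contrasts a zero-shot prompt with a few-shot (in-context) prompt for a toy sentiment task. The task, demonstrations, and helper functions are illustrative assumptions and are not taken from the paper's experimental setup.

```python
# Hypothetical illustration of zero-shot vs. few-shot (in-context) prompting.
# The sentiment task and demonstrations are made up for illustration only and
# do not reproduce the prompts or tasks evaluated in the paper.

def build_zero_shot_prompt(text: str) -> str:
    """Ask the model to solve the task with no demonstrations in the prompt."""
    return (
        "Classify the sentiment of the following review as Positive or Negative.\n"
        f"Review: {text}\nSentiment:"
    )

def build_few_shot_prompt(text: str, demonstrations: list[tuple[str, str]]) -> str:
    """Prepend labelled demonstrations so the model can infer the task in context."""
    demo_block = "\n".join(
        f"Review: {review}\nSentiment: {label}" for review, label in demonstrations
    )
    return (
        "Classify the sentiment of the following reviews as Positive or Negative.\n"
        f"{demo_block}\nReview: {text}\nSentiment:"
    )

if __name__ == "__main__":
    demos = [
        ("The plot was gripping from start to finish.", "Positive"),
        ("I walked out halfway through.", "Negative"),
    ]
    print(build_zero_shot_prompt("A beautifully shot but hollow film."))
    print()
    print(build_few_shot_prompt("A beautifully shot but hollow film.", demos))
```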

Original language: English
Title of host publication: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics
Editors: Lun-Wei Ku, Andre F. T. Martins, Vivek Srikumar
Place of publication: Bangkok, Thailand
Publisher: Association for Computational Linguistics
Pages: 5098–5139
Number of pages: 42
Volume: 1
Edition: Long Papers
ISBN (Electronic): 9798891760943
ISBN (Print): 9798891760943
DOIs
Publication status: Published - 31 Aug 2024

Publication series

Name: Proceedings of the Annual Meeting of the Association for Computational Linguistics
Volume: 1
ISSN (Print): 0736-587X

Funding

This work has been funded by the LOEWE Distinguished Chair “Ubiquitous Knowledge Processing”, LOEWE initiative, Hesse, Germany (Grant Number: LOEWE/4a//519/05/00.002(0002)/81). This research has also been funded by the German Federal Ministry of Education and Research and the Hessian Ministry of Higher Education, Research, Science and the Arts within their joint support of the National Research Center for Applied Cybersecurity ATHENE. We would also like to acknowledge the Early Career Research grant from the University of Bath. This work would not have been possible without the generous grant from the Microsoft Accelerate Foundation Models Academic Research fund, which allowed us to experiment extensively with the Azure OpenAI service.
