Towards Solving Fuzzy Tasks with Human Feedback: A Retrospective of the MineRL BASALT 2022 Competition

Stephanie Milani, Anssi Kanervisto, Karolis Ramanauskas, Sander Schulhoff, Brandon Houghton, Sharada Mohanty, Byron Galbraith, Ke Chen, Yan Song, Tianze Zhou, Bingquan Yu, He Liu, Kai Guan, Yujing Hu, Tangjie Lv, Federico Malato, Florian Leopold, Amogh Raut, Ville Hautamäki, Andrew Melnik, Shu Ishida, João F. Henriques, Robert Klassert, Walter Laurito, Lucas Cazzonelli, Cedric Kulbach, Nicholas Popovic, Marvin Schweizer, Ellen Novoseller, Vinicius G. Goecks, Nicholas Waytowich, David Watkins, Josh Miller, Rohin Shah

Research output: Contribution to journal / Conference article / Peer-review

Abstract

To facilitate research in the direction of fine-tuning foundation models from human feedback, we held the MineRL BASALT Competition on Fine-Tuning from Human Feedback at NeurIPS 2022. The BASALT challenge asks teams to develop algorithms that solve tasks with hard-to-specify reward functions in Minecraft. Through this competition, we aimed to promote the development of algorithms that use human feedback as a channel for learning the desired behavior. We describe the competition and provide an overview of the top solutions. We conclude by discussing the impact of the competition and future directions for improvement.

Original language: English
Pages (from-to): 171-188
Number of pages: 18
Journal: Proceedings of Machine Learning Research
Volume: 220
Publication status: Published - 9 Dec 2022
Event: 36th Annual Conference on Neural Information Processing Systems, NeurIPS 2022 - Virtual, Online, United States
Duration: 28 Nov 2022 to 9 Dec 2022

Funding

Running this competition was only possible with the help of many people and organizations. FTX Future Fund, Microsoft, Encultured AI, and AI Journal provided financial support. We thank our amazing advisory board: Fei Fang, Kianté Brantley, Andrew Critch, Sam Devlin, and Oriol Vinyals for their advice and guidance. We thank Skylar Anastasia Ekamper and Martin Andrews for supporting other participants of the competition. We thank Matthew Rahtz for providing detailed feedback on a draft of this paper. Finally, we thank AIcrowd for their help and the MTurk workers for their efforts in evaluating submissions.

Funders: Microsoft

    Keywords

    • fine-tuning
    • imitation learning
    • Learning from humans
    • preference learning
    • reinforcement learning from human feedback
    • reward modeling

    ASJC Scopus subject areas

    • Artificial Intelligence
    • Software
    • Control and Systems Engineering
    • Statistics and Probability
