Abstract
To facilitate research on fine-tuning foundation models from human feedback, we held the MineRL BASALT Competition on Fine-Tuning from Human Feedback at NeurIPS 2022. The BASALT challenge asks teams to develop algorithms that solve tasks with hard-to-specify reward functions in Minecraft. Through this competition, we aimed to promote the development of algorithms that use human feedback as a channel for learning the desired behavior. We describe the competition and provide an overview of the top solutions. We conclude by discussing the impact of the competition and future directions for improvement.
Original language | English |
---|---|
Pages (from-to) | 171-188 |
Number of pages | 18 |
Journal | Proceedings of Machine Learning Research |
Volume | 220 |
Publication status | Published - 9 Dec 2022 |
Event | 36th Annual Conference on Neural Information Processing Systems, NeurIPS 2022 - Virtual, Online, United States. Duration: 28 Nov 2022 → 9 Dec 2022 |
Funding
Running this competition was only possible with the help of many people and organizations. FTX Future Fund, Microsoft, Encultured AI, and AI Journal provided financial support. We thank our amazing advisory board: Fei Fang, Kianté Brantley, Andrew Critch, Sam Devlin, and Oriol Vinyals for their advice and guidance. We thank Skylar Anastasia Ekamper and Martin Andrews for supporting other participants of the competition. We thank Matthew Rahtz for providing detailed feedback on a draft of this paper. Finally, we thank AIcrowd for their help and the MTurk workers for their efforts in evaluating submissions.
Funders | Funder number |
---|---|
Microsoft | |
Keywords
- fine-tuning
- imitation learning
- learning from humans
- preference learning
- reinforcement learning from human feedback
- reward modeling
ASJC Scopus subject areas
- Artificial Intelligence
- Software
- Control and Systems Engineering
- Statistics and Probability