Abstract
Continual reinforcement learning poses a major challenge due to the tendency of agents to experience catastrophic forgetting when learning sequential tasks. In this paper, we introduce a modularity-based approach, called Hierarchical Orchestra of Policies (HOP), designed to mitigate catastrophic forgetting in lifelong reinforcement learning. HOP dynamically forms a hierarchy of policies based on a similarity metric between the current observations and previously encountered observations in successful tasks. Unlike other state-of-the-art methods, HOP does not require task labelling, allowing for robust adaptation in environments where boundaries between tasks are ambiguous. Our experiments, conducted across multiple tasks in a procedurally generated suite of environments, demonstrate that HOP significantly outperforms baseline methods in retaining knowledge across tasks and performs comparably to state-of-the-art transfer methods that require task labelling. Moreover, HOP achieves this without compromising performance when tasks remain constant, highlighting its versatility.
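The abstract does not say which similarity metric HOP uses or how the hierarchy is assembled, so the following is only an illustrative sketch of the general idea of activating stored policies by observation similarity. It assumes cosine similarity and a fixed threshold; the names `select_policies`, `checkpoints`, and `threshold` are hypothetical and are not taken from the paper.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two 1-D vectors (0.0 if either has zero norm)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def select_policies(observation, checkpoints, threshold=0.9):
    """Return indices of stored policies whose remembered observations
    resemble the current observation.

    ASSUMPTION: `checkpoints[i]` is a list of observations saved alongside
    stored policy i. The metric and the threshold are placeholders, not the
    paper's actual choices.
    """
    active = []
    for idx, stored_obs in enumerate(checkpoints):
        best = max(cosine_similarity(observation, s) for s in stored_obs)
        if best >= threshold:
            active.append(idx)
    return active


# Two hypothetical stored policies, each with a few remembered observations.
checkpoints = [
    [np.array([1.0, 0.0]), np.array([0.9, 0.1])],
    [np.array([0.0, 1.0])],
]
active = select_policies(np.array([1.0, 0.05]), checkpoints)
```

In this toy setting, only the first stored policy is activated for an observation close to `[1, 0]`. Because selection depends on the observation alone, no task label is needed, which is the property the abstract emphasises.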
| Original language | English |
|---|---|
| Number of pages | 1 |
| Publication status | Published - 9 Oct 2024 |
| Event | Intrinsically-Motivated and Open-Ended Learning Workshop @ NeurIPS 2024, Vancouver, Canada. Duration: 15 Dec 2024 → … https://imol-workshop.github.io/ |
Workshop
| Workshop | Intrinsically-Motivated and Open-Ended Learning Workshop |
|---|---|
| Abbreviated title | NeurIPS 2024 Workshop IMOL |
| Country/Territory | Canada |
| City | Vancouver |
| Period | 15/12/24 → … |
| Internet address | https://imol-workshop.github.io/ |
Keywords
- Continual Learning
- Reinforcement Learning
- Mitigating Catastrophic Forgetting
- Hierarchical Reinforcement Learning
- Lifelong Learning