Abstract
Continual reinforcement learning poses a major challenge due to the tendency of agents to experience catastrophic forgetting when learning sequential tasks. In this paper, we introduce a modularity-based approach, called Hierarchical Orchestra of Policies (HOP), designed to mitigate catastrophic forgetting in lifelong reinforcement learning. HOP dynamically forms a hierarchy of policies based on a similarity metric between the current observations and previously encountered observations in successful tasks. Unlike other state-of-the-art methods, HOP does not require task labelling, allowing for robust adaptation in environments where boundaries between tasks are ambiguous. Our experiments, conducted across multiple tasks in a procedurally generated suite of environments, demonstrate that HOP significantly outperforms baseline methods in retaining knowledge across tasks and performs comparably to state-of-the-art transfer methods that require task labelling. Moreover, HOP achieves this without compromising performance when tasks remain constant, highlighting its versatility.
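The abstract does not give implementation details, but the core mechanism it describes can be illustrated with a short, hypothetical sketch: route the current observation to the stored policy whose observations from previously successful episodes are most similar, and spawn a new policy when nothing matches. The class and method names (`PolicyOrchestra`, `select_policy`, `record_success`), the cosine-similarity metric, the threshold value, and the assumption of flat observation vectors are all illustrative choices, not details taken from the paper.

```python
import numpy as np

class PolicyOrchestra:
    """Minimal sketch (not the paper's implementation): a library of
    policies, each associated with observations from episodes it solved.
    New observations are routed to the most similar stored policy; a
    fresh policy is added when nothing is similar enough."""

    def __init__(self, similarity_threshold=0.9):
        self.policies = []  # list of (policy, reference_observations)
        self.similarity_threshold = similarity_threshold

    @staticmethod
    def _similarity(obs, reference_obs):
        # Cosine similarity between the current observation vector and
        # the mean of a policy's stored observations (assumed metric;
        # observations are assumed to be flat feature vectors).
        ref = reference_obs.mean(axis=0)
        denom = np.linalg.norm(obs) * np.linalg.norm(ref) + 1e-8
        return float(obs @ ref / denom)

    def select_policy(self, obs, new_policy_factory):
        """Return the index of the policy that should act on `obs`."""
        best_idx, best_sim = None, -1.0
        for i, (_, ref_obs) in enumerate(self.policies):
            sim = self._similarity(obs, ref_obs)
            if sim > best_sim:
                best_idx, best_sim = i, sim
        if best_idx is None or best_sim < self.similarity_threshold:
            # No sufficiently similar policy: add a new one. Note that no
            # task label is needed, matching the task-agnostic setting
            # described in the abstract.
            self.policies.append((new_policy_factory(), obs[None, :]))
            best_idx = len(self.policies) - 1
        return best_idx

    def record_success(self, policy_idx, episode_observations):
        # After a successful episode, remember its observations as
        # references for the policy that produced it.
        policy, ref_obs = self.policies[policy_idx]
        self.policies[policy_idx] = (
            policy, np.vstack([ref_obs, episode_observations]))
```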
Original language | English |
---|---|
Number of pages | 1 |
Publication status | Published - 9 Oct 2024 |
Event | Intrinsically-Motivated and Open-Ended Learning Workshop @ NeurIPS 2024 - Vancouver, United States. Duration: 15 Dec 2024 → … https://imol-workshop.github.io/ |
Workshop
Workshop | Intrinsically-Motivated and Open-Ended Learning Workshop |
---|---|
Abbreviated title | NeurIPS 2024 Workshop IMOL |
Country/Territory | United States |
City | Vancouver |
Period | 15/12/24 → … |
Internet address | https://imol-workshop.github.io/ |
Keywords
- Continual Learning
- Reinforcement Learning
- Mitigating Catastrophic Forgetting
- Hierarchical Reinforcement Learning
- Lifelong Learning