Abstract

Continual reinforcement learning poses a major challenge due to the tendency of agents to experience catastrophic forgetting when learning sequential tasks. In this paper, we introduce a modularity-based approach, called Hierarchical Orchestra of Policies (HOP), designed to mitigate catastrophic forgetting in lifelong reinforcement learning. HOP dynamically forms a hierarchy of policies based on a similarity metric between the current observations and previously encountered observations in successful tasks. Unlike other state-of-the-art methods, HOP does not require task labelling, allowing for robust adaptation in environments where boundaries between tasks are ambiguous. Our experiments, conducted across multiple tasks in a procedurally generated suite of environments, demonstrate that HOP significantly outperforms baseline methods in retaining knowledge across tasks and performs comparably to state-of-the-art transfer methods that require task labelling. Moreover, HOP achieves this without compromising performance when tasks remain constant, highlighting its versatility.
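The abstract describes HOP routing the current observation to existing policies via a similarity metric against observations from previously successful tasks. The sketch below illustrates only that high-level idea; the class and method names, the cosine-similarity choice, the spawn threshold, and the placeholder policies are assumptions for illustration and are not the paper's actual implementation.

    import numpy as np


    class HierarchicalOrchestraSketch:
        """Illustrative sketch: route observations to policy modules by
        similarity to observations stored from previously successful
        episodes. All names and the cosine-similarity/threshold choices
        are assumptions, not the method as published."""

        def __init__(self, obs_dim, spawn_threshold=0.8):
            self.obs_dim = obs_dim
            self.spawn_threshold = spawn_threshold
            self.policy_memories = []  # per-policy buffers of successful observations
            self.policies = []         # per-policy parameter placeholders

        def _similarity(self, obs, memory):
            # Cosine similarity between the current observation and each stored
            # observation; a module's score is its best match.
            if memory.shape[0] == 0:
                return -np.inf
            obs = obs / (np.linalg.norm(obs) + 1e-8)
            mem = memory / (np.linalg.norm(memory, axis=1, keepdims=True) + 1e-8)
            return float(np.max(mem @ obs))

        def select_policy(self, obs):
            """Return the index of the most similar policy, spawning a new one
            if no stored module matches the observation well enough."""
            if self.policies:
                scores = [self._similarity(obs, m) for m in self.policy_memories]
                best = int(np.argmax(scores))
                if scores[best] >= self.spawn_threshold:
                    return best
            # No sufficiently similar module: add a fresh placeholder policy
            # and an empty observation buffer for it.
            self.policies.append(np.random.randn(self.obs_dim))
            self.policy_memories.append(np.empty((0, self.obs_dim)))
            return len(self.policies) - 1

        def record_success(self, policy_idx, observations):
            """After a successful episode, remember its observations so that
            future similar observations route back to the same policy."""
            obs_arr = np.asarray(observations, dtype=float)
            self.policy_memories[policy_idx] = np.vstack(
                [self.policy_memories[policy_idx], obs_arr]
            )

Because selection depends only on observation similarity, a scheme like this needs no task labels, which is consistent with the abstract's claim that HOP handles ambiguous task boundaries; the specific routing and hierarchy details should be taken from the paper itself.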
Original language: English
Number of pages: 1
Publication status: Published - 9 Oct 2024
Event: Intrinsically-Motivated and Open-Ended Learning Workshop @NeurIPS2024 - Vancouver, United States
Duration: 15 Dec 2024 → …
https://imol-workshop.github.io/

Workshop

Workshop: Intrinsically-Motivated and Open-Ended Learning Workshop
Abbreviated title: NeurIPS 2024 Workshop IMOL
Country/Territory: United States
City: Vancouver
Period: 15/12/24 → …
Internet address: https://imol-workshop.github.io/

Keywords

  • Continual Learning
  • Reinforcement Learning
  • Mitigating Catastrophic Forgetting
  • Hierarchical Reinforcement Learning
  • Lifelong Learning
