Abstract

Multi-task reinforcement learning (MTRL) holds promise for building general-purpose agents that generalize across a variety of tasks. However, MTRL remains susceptible to conflicts between tasks. A primary reason is that a universal policy struggles to balance short-term, dense learning signals across tasks, e.g., the distinct reward functions of different tasks. In social cognitive theory, internalized future goals, as a form of cognitive representation, can effectively mitigate potential short-term conflicts in multi-task settings. Building on this benefit of future goals, we propose a novel and general framework called Task-Specific Action Correction (TSAC), which approaches the problem from the goal perspective and is orthogonal to previous MTRL methods. Specifically, to avoid myopia, TSAC introduces goal-oriented sparse rewards and decomposes policy learning into two separate policies: a shared policy (SP) and an action correction policy (ACP). The SP outputs a short-term action guided by dense rewards. To alleviate the conflicts that arise when the SP focuses excessively on the details of specific tasks, the ACP incorporates goal-oriented sparse rewards, enabling the agent to adopt a long-term perspective, output a correction action, and generalize across tasks. Finally, the actions output by the SP and the ACP are combined by an action correction function to form the final action that interacts with the environment. Extensive experiments on Meta-World and on multi-task StarCraft II multi-agent scenarios demonstrate that TSAC outperforms existing state-of-the-art methods, achieving significant improvements in sample efficiency, generalization, and effective action execution across tasks.
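
The abstract does not specify the form of the action correction function, so the following is only a minimal sketch of how an SP action and an ACP correction might be combined. It assumes an additive correction clipped to the action bounds, and it assumes that the ACP sees the SP action; both are illustrative assumptions, not the method described in the paper. The names `shared_policy`, `action_correction_policy`, `correct_action`, and `act` are hypothetical, and the two policies are toy stand-ins rather than learned networks.

```python
from typing import List, Sequence


def shared_policy(observation: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for the SP: proposes a short-term action."""
    return [0.5 * x for x in observation]


def action_correction_policy(
    observation: Sequence[float], sp_action: Sequence[float]
) -> List[float]:
    """Hypothetical stand-in for the ACP: proposes a correction.

    ASSUMPTION: the ACP is conditioned on the SP action; the abstract
    does not state what the ACP takes as input.
    """
    return [-0.25 * a for a in sp_action]


def correct_action(
    sp_action: Sequence[float],
    acp_action: Sequence[float],
    low: float = -1.0,
    high: float = 1.0,
) -> List[float]:
    """Combine the two actions into the final action.

    ASSUMPTION: additive correction followed by clipping to [low, high].
    The actual action correction function may differ.
    """
    if len(sp_action) != len(acp_action):
        raise ValueError("SP and ACP actions must have the same dimension")
    return [min(high, max(low, a + c)) for a, c in zip(sp_action, acp_action)]


def act(observation: Sequence[float]) -> List[float]:
    """Produce the final action that would be sent to the environment."""
    sp_action = shared_policy(observation)
    acp_action = action_correction_policy(observation, sp_action)
    return correct_action(sp_action, acp_action)
```

In this sketch, calling `act([1.0, -4.0])` runs the SP, then the ACP, then the combination step. In a real system the SP would be trained on dense rewards and the ACP on goal-oriented sparse rewards, as the abstract describes.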

Original language: English
Journal: IEEE Transactions on Cognitive and Developmental Systems
Early online date: 19 Feb 2025
Publication status: E-pub ahead of print - 19 Feb 2025

Keywords

  • Future goals
  • Generalization
  • Lagrangian method
  • Multi-task reinforcement learning

ASJC Scopus subject areas

  • Software
  • Artificial Intelligence

Title: Efficient Multi-Task Reinforcement Learning via Task-Specific Action Correction