Abstract
Multi-task reinforcement learning (MTRL) holds potential for building general-purpose agents, enabling them to generalize across a variety of tasks. However, MTRL may still be susceptible to conflicts between tasks. A primary reason for this problem is that a universal policy struggles to balance short-term, dense learning signals across various tasks, e.g., distinct reward functions in reinforcement learning. In social cognitive theory, internalized future goals, as a form of cognitive representation, can effectively mitigate potential short-term conflicts in multi-task settings. Considering the benefits of future goals, we propose a novel and general framework called Task-Specific Action Correction (TSAC), which approaches MTRL from the goal perspective and is orthogonal to previous MTRL methods. Specifically, to avoid myopia, TSAC introduces goal-oriented sparse rewards and decomposes policy learning into two separate policies: a shared policy (SP) and an action correction policy (ACP). The SP outputs an action from a short-term perspective, guided by dense rewards. To alleviate conflicts resulting from the SP's excessive focus on task-specific details, the ACP incorporates goal-oriented sparse rewards, enabling the agent to adopt a long-term perspective, output a correction action, and generalize across tasks. Finally, the actions output by the SP and the ACP are combined by an action correction function to form the final action that interacts with the environment. Extensive experiments on Meta-World and multi-task StarCraft II multi-agent scenarios demonstrate that TSAC outperforms existing state-of-the-art methods, achieving significant improvements in sample efficiency, generalization, and effective action execution across tasks.
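To make the two-policy decomposition concrete, the following is a minimal, hypothetical sketch of how an SP action and an ACP correction might be combined, and how a dense shaped reward differs from a goal-oriented sparse one. The abstract does not specify the form of the action correction function, the policy inputs, or the reward definitions, so the additive-and-clip correction, the idea that the ACP sees the SP's action, the `[-1, 1]` action range, and every name below (`action_correction`, `dense_reward`, `sparse_goal_reward`, `TSACAgentSketch`) are illustrative assumptions, not the authors' method.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Action = List[float]


def clip(x: float, lo: float = -1.0, hi: float = 1.0) -> float:
    """Clamp a scalar to [lo, hi]; the [-1, 1] range is an assumption."""
    return max(lo, min(hi, x))


def action_correction(shared_action: Sequence[float],
                      correction: Sequence[float],
                      scale: float = 1.0) -> Action:
    """Hypothetical correction function: add the scaled ACP output to the
    SP action element-wise, then clip to the assumed action bounds."""
    if len(shared_action) != len(correction):
        raise ValueError("SP action and ACP correction must have equal length")
    return [clip(a + scale * c) for a, c in zip(shared_action, correction)]


def dense_reward(distance_to_goal: float) -> float:
    """Assumed dense, short-term signal: negative distance to the goal."""
    return -distance_to_goal


def sparse_goal_reward(distance_to_goal: float, tol: float = 0.05) -> float:
    """Assumed goal-oriented sparse signal: 1 only when the goal is reached."""
    return 1.0 if distance_to_goal <= tol else 0.0


@dataclass
class TSACAgentSketch:
    """Hypothetical agent holding a shared policy (trained on dense rewards)
    and an action correction policy (trained on sparse goal rewards)."""
    shared_policy: Callable[[Sequence[float], int], Action]
    correction_policy: Callable[[Sequence[float], int, Action], Action]

    def act(self, obs: Sequence[float], task_id: int) -> Action:
        a_sp = self.shared_policy(obs, task_id)               # short-term action
        a_cp = self.correction_policy(obs, task_id, a_sp)     # long-term correction
        return action_correction(a_sp, a_cp)                  # final action
```

For example, with a toy SP returning `[0.5, -0.5]` and a toy ACP returning `[0.7, 0.2]`, `act` yields `[1.0, -0.3]` (up to floating-point rounding): the first component is clipped at the assumed upper bound. Keeping the correction additive means the ACP only has to learn a residual on top of the SP, which is one plausible reading of "correction"; the actual function in TSAC may differ.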
Original language | English |
---|---|
Journal | IEEE Transactions on Cognitive and Developmental Systems |
Early online date | 19 Feb 2025 |
Publication status | E-pub ahead of print - 19 Feb 2025 |
Keywords
- Future goals
- Generalization
- Lagrangian method
- Multi-task reinforcement learning
ASJC Scopus subject areas
- Software
- Artificial Intelligence