Abstract
The Cross-Entropy Method (CEM) is a gradient-free direct policy search method that offers greater training stability than gradient-based methods and low sensitivity to hyper-parameter tuning. CEM resembles population-based evolutionary methods but, rather than maintaining a population, it maintains a distribution over candidate solutions (policies, in our case). Usually, a natural exponential family distribution such as a multivariate Gaussian is used to parameterize the policy distribution. Using a multivariate Gaussian limits the quality of CEM policies, as the search becomes confined to a less representative subspace. We address this drawback by using an adversarially-trained hypernetwork, enabling a richer and more complex representation of the policy distribution. To achieve better training stability and faster convergence, we use a multivariate Gaussian CEM policy to guide the adversarial training process. Experiments demonstrate that our approach outperforms state-of-the-art CEM-based methods by 15.8% in terms of reward while achieving faster convergence. Results also show that our approach is less sensitive to hyper-parameters than other deep-RL methods such as REINFORCE, DDPG and DQN.
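For context, a minimal sketch of the vanilla Gaussian-parameterized CEM loop that the abstract builds on (the `evaluate` callback, `cem_policy_search` name, and all hyper-parameter defaults here are illustrative, not taken from the paper):

```python
import numpy as np

def cem_policy_search(evaluate, dim, n_iter=50, pop_size=64, elite_frac=0.125):
    """Vanilla CEM with a diagonal multivariate Gaussian over policy parameters.

    `evaluate` maps a flat parameter vector to an episodic return; `dim` is
    the number of policy parameters. All names and defaults are illustrative.
    """
    mean, std = np.zeros(dim), np.ones(dim)
    n_elite = max(1, int(pop_size * elite_frac))
    for _ in range(n_iter):
        # Sample candidate policies from the current distribution.
        candidates = mean + std * np.random.randn(pop_size, dim)
        returns = np.array([evaluate(c) for c in candidates])
        # Keep the top-performing (elite) candidates ...
        elites = candidates[np.argsort(returns)[-n_elite:]]
        # ... and refit the Gaussian to them (the cross-entropy update).
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return mean
```

For example, `cem_policy_search(lambda w: -np.sum(w**2), dim=5)` drives the mean toward the zero vector. The paper's contribution replaces this fixed Gaussian with an adversarially-trained hypernetwork so the search distribution is not confined to a Gaussian subspace.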
| Original language | English |
| --- | --- |
| Title of host publication | AAMAS 2021 |
| Publisher | Underline Science |
| Pages | 1296-1304 |
| Number of pages | 8 |
| Volume | 2021 |
| DOIs | |
| Publication status | Published - 4 May 2021 |
Bibliographical note
20th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2021); Conference date: 03-05-2021 Through 07-05-2021

Keywords
- Cross-Entropy Method
- Generative Adversarial Networks
- Hypernetworks
- Reinforcement Learning