Learning Complex Policy Distribution with CEM Guided Adversarial Hypernetwork

Frans Oliehoek, Shi Yuan Tang, Jie Zhang

Research output: Chapter or section in a book/report/conference proceeding › Chapter in a published conference proceeding

Abstract

The Cross-Entropy Method (CEM) is a gradient-free direct policy search method that offers greater stability and is insensitive to hyper-parameter tuning. CEM resembles population-based evolutionary methods, but rather than maintaining a population, it maintains a distribution over candidate solutions (policies in our case). Usually, a natural exponential family distribution such as a multivariate Gaussian is used to parameterize the policy distribution. Using a multivariate Gaussian limits the quality of CEM policies, as the search becomes confined to a less representative subspace. We address this drawback by using an adversarially trained hypernetwork, enabling a richer and more complex representation of the policy distribution. To achieve better training stability and faster convergence, we use a multivariate Gaussian CEM policy to guide our adversarial training process. Experiments demonstrate that our approach outperforms state-of-the-art CEM-based methods by 15.8% in terms of rewards while achieving faster convergence. Results also show that our approach is less sensitive to hyper-parameters than other deep-RL methods such as REINFORCE, DDPG and DQN.
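
For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of plain Gaussian CEM, not of the adversarial hypernetwork the paper proposes. It assumes a diagonal Gaussian over a flat parameter vector, and the names `cem_search` and `toy_return`, the default settings, and the toy objective are all hypothetical stand-ins chosen for illustration; in a reinforcement learning setting the score function would be an episode return.

```python
import random
import statistics


def cem_search(score, dim, iterations=50, population=64,
               elite_frac=0.2, init_std=2.0, seed=0):
    """Gradient-free search using a diagonal Gaussian over candidates.

    All names and defaults here are illustrative assumptions.
    """
    rng = random.Random(seed)
    mean = [0.0] * dim
    std = [init_std] * dim
    n_elite = max(2, int(population * elite_frac))

    for _ in range(iterations):
        # Sample candidate solutions from the current distribution.
        candidates = [
            [rng.gauss(m, s) for m, s in zip(mean, std)]
            for _ in range(population)
        ]
        # Keep the highest-scoring candidates (the elites).
        elites = sorted(candidates, key=score, reverse=True)[:n_elite]
        # Refit the Gaussian to the elites, one dimension at a time.
        columns = list(zip(*elites))
        mean = [statistics.fmean(col) for col in columns]
        # A small floor keeps the distribution from collapsing entirely.
        std = [statistics.pstdev(col) + 1e-3 for col in columns]

    return mean


def toy_return(theta):
    # Hypothetical stand-in for an episode return; maximal at (1, -2).
    return -((theta[0] - 1.0) ** 2 + (theta[1] + 2.0) ** 2)


best = cem_search(toy_return, dim=2)
print(best)
```

Because the sampling distribution here is a single Gaussian, every candidate is drawn from one unimodal region of parameter space, which is the restriction the abstract aims to lift by replacing the Gaussian with a learned generator.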
Original language: English
Title of host publication: AAMAS 2021
Publisher: Underline Science
Pages: 1296-1304
Number of pages: 8
Volume: 2021
Publication status: Published - 4 May 2021

Bibliographical note

Tenth International Joint Conference on Autonomous Agents and Multi-Agent Systems (AAMAS 2011) ; Conference date: 02-05-2011 Through 06-05-2011

Keywords

  • Cross-Entropy Method
  • Generative Adversarial Networks
  • Hypernetworks
  • Reinforcement Learning
