An intrinsic reward mechanism for efficient exploration

Özgür Şimşek, Andrew G. Barto

Research output: Chapter in Book/Report/Conference proceeding › Chapter

39 Citations (Scopus)


How should a reinforcement learning agent act if its sole purpose is to efficiently learn an optimal policy for later use? In other words, how should it explore, to be able to exploit later? We formulate this problem as a Markov Decision Process by explicitly modeling the internal state of the agent and propose a principled heuristic for its solution. We present experimental results in a number of domains, also exploring the algorithm’s use for learning a policy for a skill given its reward function—an important but neglected component of skill discovery.
Original language: English
Title of host publication: Proceedings of the Twenty-Third International Conference on Machine Learning (ICML 2006): Pittsburgh, Pennsylvania, USA, June 25-29, 2006
Editors: William W. Cohen, Andrew Moore
Publisher: Association for Computing Machinery
Number of pages: 8
Publication status: Published - 2006

Publication series

Name: ACM International Conference Proceedings
