Publication Date

2006

Abstract

How should a reinforcement learning agent act if its sole purpose is to efficiently learn an optimal policy for later use? In other words, how should it explore, to be able to exploit later? We formulate this problem as a Markov Decision Process by explicitly modeling the internal state of the agent and propose a principled heuristic for its solution. We present experimental results in a number of domains, also exploring the algorithm’s use for learning a policy for a skill given its reward function—an important but neglected component of skill discovery.

Comments

This paper was harvested from CiteSeer

Recommended Citation

Şimşek, Özgür and Barto, Andrew G., "An Intrinsic Reward Mechanism for Efficient Exploration" (2006). Computer Science Department Faculty Publication Series. 4.
Retrieved from https://scholarworks.umass.edu/cs_faculty_pubs/4

Included in

Computer Sciences Commons

ScholarWorks@UMass Amherst

Computer Science Department Faculty Publication Series

An Intrinsic Reward Mechanism for Efficient Exploration
