Loading...
Thumbnail Image
Publication

ADAPTIVE STEP-SIZES FOR REINFORCEMENT LEARNING

Citations
Altmetric:
Abstract
The central theme motivating this dissertation is the desire to develop reinforcement learning algorithms that “just work” regardless of the domain in which they are applied. The largest impediment to this goal is the sensitivity of reinforcement learning algorithms to the step-size parameter used to rescale incremental updates. Adaptive step-size algorithms attempt to reduce this sensitivity or eliminate the step-size parameter entirely by automatically adjusting the step size throughout the learning process. Such algorithms provide an alternative to the standard “guess-and-check” methods used to find parameters known as parameter tuning. However, the problems with parameter tuning are currently masked by the way experiments are conducted and presented. In this dissertation we seek algorithms that perform well over a broad subset of reinforcement learning problems with minimal parameter tuning. To accomplish this we begin by addressing the limitations of current empirical methods in reinforcement learning and propose improvements with benefits far outside the area of adaptive step-sizes. In order to study adaptive step-sizes in reinforcement learning we show that the general form of the adaptive step-size problem is a combination of two dissociable problems (adaptive scalar step-size and update whitening). We then derive new parameter-free adaptive scalar step-size algorithms for the reinforcement learning algorithm Sarsa(λ) and use our improved empirical methods to conduct a thorough experimental study of step-size algorithms in reinforcement learning. Our adaptive algorithms (VES and PARL2) both eliminate the need for a tunable step-size parameter and perform at least as well as Sarsa(λ) with an optimized step-size value. We conclude by developing natural temporal difference algorithms that provide an approximate solution to the update whitening problem and improve performance over their non-natural counterparts.
Type
dissertation
Date
2014
Publisher
License
License