
Policy Gradient Methods: Analysis, Misconceptions, and Improvements

Abstract
Policy gradient methods are a class of reinforcement learning algorithms that optimize a parametric policy by maximizing an objective function that directly measures the performance of the policy. Despite being used in many high-profile applications of reinforcement learning, the conventional use of policy gradient methods in practice deviates from existing theory. This thesis presents a comprehensive mathematical analysis of policy gradient methods, uncovering misconceptions and suggesting novel solutions to improve their performance. We first demonstrate that the update rule used by most policy gradient methods does not correspond to the gradient of any objective function due to the way the discount factor is applied, leading to suboptimal convergence. Subsequently, we show that even when this is taken into account, existing policy gradient algorithms are suboptimal in that they fail to eliminate several sources of variance. To address the first issue, we show that by gradually increasing the discount factor at a particular rate, we can restore the optimal convergence of policy gradient methods. To further address the issue of high variance, we propose a new value function called the posterior value function. This function leverages additional information from later in trajectories that was previously thought to introduce bias. With this function, we construct a new stochastic estimator that eliminates several sources of variance present in most policy gradient methods.
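To make the first claim concrete, the mismatch the abstract refers to is between the gradient of the discounted objective and the update most implementations actually apply. A minimal sketch in standard notation (policy \(\pi_\theta\), discount factor \(\gamma\), action-value function \(Q^{\pi_\theta}\); the notation is assumed here, not taken from the thesis) is:

```latex
% Gradient of the discounted objective J(\theta) = E[ \sum_t \gamma^t r_t ]:
% each term carries a \gamma^t weight from the discounted state distribution.
\nabla_\theta J(\theta)
  = \mathbb{E}_{\tau \sim \pi_\theta}\!\left[
      \sum_{t=0}^{\infty} \gamma^{t}\,
      \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, Q^{\pi_\theta}(s_t, a_t)
    \right]

% Update used by most practical policy gradient implementations:
% the \gamma^t weight on each term is dropped, so the resulting
% expression is not the gradient of any objective function.
\hat{g}
  = \mathbb{E}_{\tau \sim \pi_\theta}\!\left[
      \sum_{t=0}^{\infty}
      \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, Q^{\pi_\theta}(s_t, a_t)
    \right]
```

The thesis's proposed remedies, including the schedule for increasing the discount factor and the construction of the posterior value function, are developed in the dissertation itself and are not reproduced here.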
Type
dissertation
Date
2024
License
http://creativecommons.org/licenses/by/4.0/