Hierarchical reinforcement learning in continuous state and multi-agent environments
This dissertation investigates the use of hierarchy and abstraction as a means of solving complex sequential decision-making problems, such as those with continuous state and/or continuous action spaces and domains with multiple cooperative agents. It develops several novel extensions to hierarchical reinforcement learning (HRL) and designs algorithms appropriate for such problems.

It has been shown that the average reward optimality criterion is more natural than the more commonly used discounted criterion for continuing tasks. This thesis investigates two formulations of HRL based on the average reward semi-Markov decision process (SMDP) model, in both discrete and continuous time. These formulations correspond to the two notions of optimality explored in previous work on HRL: hierarchical optimality and recursive optimality. Novel discrete-time and continuous-time algorithms, termed hierarchically optimal average reward RL (HAR) and recursively optimal average reward RL (RAR), are presented; they learn hierarchically and recursively optimal average reward policies, respectively. Two automated guided vehicle (AGV) scheduling problems serve as experimental testbeds for an empirical study of the proposed algorithms.

Policy gradient reinforcement learning (PGRL) methods have several advantages over the more traditional value-function RL algorithms in solving problems with continuous state spaces, but they suffer from slow convergence. This thesis defines a family of hierarchical policy gradient RL (HPGRL) algorithms for scaling PGRL methods to high-dimensional domains.

Finally, this thesis examines the use of HRL to accelerate policy learning in cooperative multi-agent tasks. Hierarchy speeds up learning in multi-agent domains by making it possible to learn coordination skills at the level of subtasks rather than primitive actions.
Subtask-level coordination allows agents to cooperate more effectively, since they are not distracted by low-level details. A framework for hierarchical multi-agent RL is developed, and an algorithm called Cooperative HRL is presented that solves cooperative multi-agent problems more efficiently. (Abstract shortened by UMI.)
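The HAR, RAR, HPGRL, and Cooperative HRL algorithms themselves are not reproduced in this abstract. As a rough illustration of the average-reward criterion the thesis builds on, the following is a minimal sketch of a flat, R-learning-style average-reward update on a hypothetical two-state MDP; the toy dynamics and all names here are illustrative assumptions, not the dissertation's algorithms.

```python
import random

ACTIONS = (0, 1)

def step(state, action):
    """Toy two-state MDP (hypothetical): action 1 toggles the state and
    pays a reward equal to the index of the state being left."""
    if action == 1:
        return 1 - state, float(state)
    return state, 0.0  # action 0: stay put, no reward

def r_learning(steps=30000, alpha=0.1, beta=0.01, eps=0.1, seed=0):
    """Flat average-reward RL: learn a bias function Q together with rho,
    the average reward per step, instead of discounting future rewards."""
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in (0, 1) for a in ACTIONS}
    rho = 0.0  # running estimate of the gain (average reward per step)
    s = 0
    for _ in range(steps):
        greedy = max(ACTIONS, key=lambda a: Q[(s, a)])
        a = greedy if rng.random() > eps else rng.choice(ACTIONS)
        s2, r = step(s, a)
        best_next = max(Q[(s2, a2)] for a2 in ACTIONS)
        td = r - rho + best_next - Q[(s, a)]  # average-reward TD error
        Q[(s, a)] += alpha * td
        if a == greedy:
            rho += beta * td  # adjust the gain only on greedy steps
        s = s2
    return Q, rho
```

Under these toy dynamics the best policy keeps toggling between the two states, collecting reward 1 on every other step, so the learned gain rho should approach 0.5. The thesis's HAR and RAR algorithms extend this kind of average-reward learning to hierarchies of SMDP subtasks.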
Ghavamzadeh, Mohammad, "Hierarchical reinforcement learning in continuous state and multi-agent environments" (2005). Doctoral Dissertations Available from Proquest. AAI3193906.