Author ORCID Identifier

https://orcid.org/0000-0003-1565-6718

Access Type

Open Access Dissertation

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Degree Program

Computer Science

Year Degree Awarded

2020

Month Degree Awarded

May

First Advisor

Philip S. Thomas

Second Advisor

Erik Learned-Miller

Third Advisor

Shlomo Zilberstein

Fourth Advisor

Melinda D. Dyar

Subject Categories

Artificial Intelligence and Robotics

Abstract

In this dissertation we develop techniques that leverage prior knowledge to improve the learning speed of existing reinforcement learning (RL) algorithms. RL systems can be expensive to train, which limits their applicability when a large number of agents need to be trained to solve a large number of tasks, a situation that often arises in industry but is largely ignored in the RL literature. We develop three methods that leverage the experience obtained from solving a small number of tasks to improve an agent's ability to learn on new tasks it might face in the future.

First, we propose using compression algorithms to identify macros that are likely to be generated by an optimal policy. Because compression techniques identify sequences that occur frequently, they can be used to find action patterns that are often required to solve a task. Second, we address some limitations of the first method by formalizing an optimization problem that allows an agent to learn a set of options appropriate for the tasks. Specifically, we propose an objective analogous to compression: minimizing the number of decisions the agent must make to generate the observed optimal behavior. This technique also addresses a question that is often ignored in the options literature: how many options are needed? Finally, we show that prior experience can also be leveraged to address the exploration-exploitation dilemma, a central problem in RL. We propose a framework in which a small number of tasks are used to train a meta-agent on how to explore; once trained, any agent facing a new task can query the meta-agent for the action it should take to explore.

We show empirically that, when facing a large number of tasks, leveraging prior experience can be an effective way to improve existing reinforcement learning techniques. At present, the application of RL in industrial settings remains limited, in part because training large-scale systems is costly and time-consuming. We hope this work provides guidance for future work and inspires new research into exploiting existing knowledge to make RL a practical alternative for tackling large-scale real-world problems.
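As a rough illustration of the first idea, the sketch below is not the dissertation's actual algorithm; the trajectory format, n-gram lengths, and cutoff are assumptions made for the example. It counts frequently repeated action subsequences across trajectories from solved tasks and keeps the most common ones as candidate macro-actions, in the spirit of compression-based macro discovery.

    # Illustrative only: count action n-grams across trajectories from solved
    # tasks and keep the most frequent as candidate macro-actions.
    from collections import Counter

    def candidate_macros(action_sequences, min_len=2, max_len=4, top_k=10):
        counts = Counter()
        for actions in action_sequences:      # one action list per solved task
            for n in range(min_len, max_len + 1):
                for i in range(len(actions) - n + 1):
                    counts[tuple(actions[i:i + n])] += 1
        return [macro for macro, _ in counts.most_common(top_k)]

    # Hypothetical demonstration data from two training tasks.
    demos = [
        ["up", "up", "right", "right", "pick", "up", "up", "right", "right"],
        ["up", "up", "right", "right", "pick", "left", "left", "down"],
    ]
    print(candidate_macros(demos, top_k=3))
    # e.g. [('up', 'up'), ('up', 'right'), ('right', 'right')]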

DOI

https://doi.org/10.7275/q4mw-sh77

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
