Off-campus UMass Amherst users: To download dissertations, please use the following link to log into our proxy server with your UMass Amherst user name and password.
Non-UMass Amherst users, please click the view more button below to purchase a copy of this dissertation from Proquest.
(Some titles may also be available free of charge in our Open Access Dissertation Collection, so please check there first.)
Temporal abstraction in reinforcement learning
Decision making usually involves choosing among different courses of action over a broad range of time scales. For instance, a person planning a trip to a distant location makes high-level decisions regarding what means of transportation to use, but also chooses low-level actions, such as the movements for getting into a car. The problem of picking an appropriate time scale for reasoning and learning has been explored in artificial intelligence, control theory and robotics. In this dissertation we develop a framework that allows novel solutions to this problem, in the context of Markov Decision Processes (MDPs) and reinforcement learning. In this dissertation, we present a general framework for prediction, control and learning at multiple temporal scales. In this framework, temporally extended actions are represented by a way of behaving (a policy) together with a termination condition. An action represented in this way is called an option. Options can be easily incorporated in MDPs, allowing an agent to use existing controllers, heuristics for picking actions, or learned courses of action. The effects of behaving according to an option can be predicted using multi-time models, learned by interacting with the environment. In this dissertation we develop multi-time models, and we illustrate the way in which they can be used to produce plans of behavior very quickly, using classical dynamic programming or reinforcement learning techniques. The most interesting feature of our framework is that it allows an agent to work simultaneously with high-level and low-level temporal representations. The interplay of these levels can be exploited in order to learn and plan more efficiently and more accurately. We develop new algorithms that take advantage of this structure to improve the quality of plans, and to learn in parallel about the effects of many different options.
Precup, Doina, "Temporal abstraction in reinforcement learning" (2000). Doctoral Dissertations Available from Proquest. AAI9978540.