Learning to solve Markovian decision processes

Satinder Pal Singh, University of Massachusetts Amherst

Abstract

This dissertation is about building learning control architectures for agents embedded in finite, stationary, and Markovian environments. Such architectures give embedded agents the ability to autonomously improve the efficiency with which they achieve goals. Machine learning researchers have developed reinforcement learning (RL) algorithms based on dynamic programming (DP) that use the agent's experience in its environment to improve its decision policy incrementally. This is achieved by adapting an evaluation function in such a way that the decision policy that is "greedy" with respect to it improves with experience. This dissertation focuses on finite, stationary, and Markovian environments for two reasons: this restriction permits the development and use of a strong theory of RL, and many challenging real-world RL tasks fall into this category.

This dissertation establishes a novel connection between stochastic approximation theory and RL that provides a uniform framework for understanding all the RL algorithms proposed to date. It also highlights a dimension that clearly separates RL research from prior work on DP. Two other theoretical results showing how approximations affect performance in RL provide partial justification for the use of compact function approximators in RL. In addition, a new family of "soft" DP algorithms is presented; these algorithms converge to solutions that are more robust than those found by classical DP algorithms.

Despite this theoretical progress, conventional RL architectures scale too poorly to be practical for many real-world problems. This dissertation studies two aspects of the scaling issue: the need to accelerate RL, and the need to build RL architectures that can learn to solve multiple tasks. It presents three RL architectures, CQ-L, H-DYNA, and BB-RL, that accelerate learning by facilitating transfer of training from simple to complex tasks. Each architecture uses a different method to achieve transfer of training: CQ-L uses the evaluation functions for simple tasks as building blocks to construct the evaluation function for complex tasks, H-DYNA uses the evaluation functions for simple tasks to build an abstract environment model, and BB-RL uses the decision policies found for the simple tasks as the primitive actions for the complex tasks. A mixture of theoretical and empirical results is presented to support the new RL architectures developed in this dissertation.
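To make the adaptation of the evaluation function described in the abstract concrete, the following is a minimal illustrative sketch of a tabular, DP-based RL update (a Q-learning-style rule) for a small finite Markovian environment; it is not code from the dissertation, and the environment, state and action counts, and learning parameters are assumptions chosen only for illustration.

    # Illustrative sketch only: tabular Q-learning on a small, assumed finite MDP.
    # All names and parameters here are hypothetical, not from the dissertation.
    import random

    n_states, n_actions = 5, 2                            # assumed small finite MDP
    Q = [[0.0] * n_actions for _ in range(n_states)]      # the evaluation function
    alpha, gamma, epsilon = 0.1, 0.9, 0.1                 # assumed learning parameters

    def step(state, action):
        # Hypothetical environment: random transitions, reward 1 on reaching the last state.
        next_state = random.randrange(n_states)
        reward = 1.0 if next_state == n_states - 1 else 0.0
        return next_state, reward

    state = 0
    for _ in range(10000):
        # Act with the policy that is "greedy" with respect to the current
        # evaluation function, plus a little exploration.
        if random.random() < epsilon:
            action = random.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda a: Q[state][a])
        next_state, reward = step(state, action)
        # Incremental DP-style update: move Q toward the one-step lookahead target,
        # so the greedy policy improves with experience.
        target = reward + gamma * max(Q[next_state])
        Q[state][action] += alpha * (target - Q[state][action])
        state = next_state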

Subject Area

Computer science|Artificial intelligence

Recommended Citation

Singh, Satinder Pal, "Learning to solve Markovian decision processes" (1994). Doctoral Dissertations Available from Proquest. AAI9420690.
https://scholarworks.umass.edu/dissertations/AAI9420690
