Off-campus UMass Amherst users: To download dissertations, please use the following link to log into our proxy server with your UMass Amherst user name and password.

Non-UMass Amherst users, please click the view more button below to purchase a copy of this dissertation from Proquest.

(Some titles may also be available free of charge in our Open Access Dissertation Collection, so please check there first.)

Finite-memory control of partially observable systems

Eric Anton Hansen, University of Massachusetts Amherst

Abstract

A partially observable Markov decision process (POMDP) is a model of planning and control that enables reasoning about actions with stochastic effects and observations that provide imperfect information. It has applications in diverse fields that include artificial intelligence, operations research and optimal control, although computational difficulties have limited its use. This thesis presents new dynamic-programming and heuristic-search algorithms for solving infinite-horizon POMDPs. These algorithms represent a plan or policy as a finite-state controller and exploit this representation to improve the efficiency of problem solving. One contribution of this thesis is an improved policy-iteration algorithm that searches in a policy space of finite-state controllers. It is based on a new interpretation of the dynamic-programming operator for POMDPs as the transformation of a finite-state controller into an improved finite-state controller. Empirically, it outperforms value iteration in solving infinite-horizon POMDPs. Dynamic-programming algorithms such as policy iteration and value iteration compute an optimal policy for every possible starting state. An advantage of heuristic search is that it can focus computation on finding an optimal policy for a single starting state. However, it has not been used before to find solutions with loops, that is, solutions that take the form of finite-state controllers. A second contribution of this thesis is to show how to generalize heuristic search to find policies that take this more general form. Three algorithms that use heuristic search to solve POMDPs are presented. Two solve special cases of the POMDP problem. The third solves the general POMDP problem by iteratively improving a finite-state controller in the same way as policy iteration, but focuses computation where it is most likely to improve the value of the controller for a starting state.

Subject Area

Computer science|Operations research

Recommended Citation

Hansen, Eric Anton, "Finite-memory control of partially observable systems" (1998). Doctoral Dissertations Available from Proquest. AAI9909170.
https://scholarworks.umass.edu/dissertations/AAI9909170

Share

COinS