
Author ORCID Identifier


Document Type

Open Access Dissertation

Degree Name

Doctor of Philosophy (PhD)

Degree Program

Computer Science

Year Degree Awarded


Month Degree Awarded


First Advisor

Shlomo Zilberstein

Subject Categories

Artificial Intelligence and Robotics


Abstract

Planning, namely the ability of an autonomous agent to make decisions leading towards a certain goal, is one of the fundamental components of intelligent behavior. In the face of uncertainty, this problem is typically modeled as a Markov Decision Process (MDP). The MDP framework is highly expressive and has been used in a variety of applications, such as mobile robot control, flow assignment in heterogeneous networks, software optimization in mobile phones, and aircraft collision avoidance. However, its wide adoption in real-world scenarios is still impaired by the complexity of solving large MDPs. Developing effective ways to overcome this complexity barrier is a challenging research problem.

This thesis focuses on the development of scalable and robust MDP solution approaches that partially explore the state space of an MDP. The main contribution is a series of mathematical and algorithmic techniques for selecting the parts of the state space that are most critical for effective planning, with the ultimate goal of maximizing performance under bounded resources. The proposed approaches work along two distinct axes: i) constructing reduced MDP models that are computationally easier to solve, but whose policies still yield near-optimal performance when applied to the original model, and ii) using sampling-based exploration that is biased towards states for which additional computation can be more productive, in a well-defined sense.

The first part of the thesis addresses the model reduction component, introducing an MDP reduction framework that generalizes popular solution approaches based on determinization. In particular, the framework encompasses a spectrum of MDP reductions differing along two dimensions: i) the number of outcomes per state-action pair that are fully accounted for, and ii) the number of occurrences of the remaining, exceptional outcomes that are planned for in advance.
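As an illustration of the first dimension only (a hypothetical sketch, not code from the dissertation), the snippet below reduces an MDP's transition function by keeping the k most likely outcomes of each state-action pair and renormalizing their probabilities; k = 1 corresponds to most-likely-outcome determinization.

```python
# Illustrative sketch: truncate each state-action pair's outcome set to its
# k most probable successors, renormalizing so probabilities still sum to 1.
# The encoding (dicts keyed by (state, action)) is my own assumption.

def reduce_transitions(transitions, k):
    """transitions: dict mapping (state, action) -> {next_state: probability}.
    Returns a reduced transition function with at most k outcomes per pair."""
    reduced = {}
    for sa, outcomes in transitions.items():
        # Keep the k highest-probability outcomes of this state-action pair.
        top = sorted(outcomes.items(), key=lambda kv: kv[1], reverse=True)[:k]
        total = sum(p for _, p in top)
        reduced[sa] = {s: p / total for s, p in top}
    return reduced

# Example: a 'move' action that usually succeeds but may slip or fail.
T = {("s0", "move"): {"s1": 0.8, "s0": 0.15, "s_err": 0.05}}
print(reduce_transitions(T, 1))  # {('s0', 'move'): {'s1': 1.0}}
print(reduce_transitions(T, 2))  # keeps s1 and s0, renormalized
```

The second dimension, bounding how many exceptional outcomes are planned for in advance, is not captured by this sketch; it would require tracking an exception counter in the reduced model's state.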
An important insight resulting from this work is that the choice of reduction is crucial for achieving good performance, an issue that remains under-explored by the planning community, even for determinization-based planners.

The second part of the thesis presents a sampling-based approach that does not require modifying the MDP model. The key idea is to avoid computation in states whose estimated optimal values are more likely to be correct, and instead direct it towards states whose values (which are closely related to policy quality) can be improved the most. The proposed approach represents a novel algorithmic framework that generalizes MDP algorithms based on labeling, a widely used technique in state-of-the-art planners. The framework can be leveraged to create a variety of MDP solvers with different trade-offs between computational complexity and policy quality, and its application to a range of standard MDP benchmarks results in state-of-the-art performance.
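To make the labeling idea concrete, here is a minimal sketch (my own simplified encoding, not the thesis's algorithm) of the Bellman residual test that labeling-based solvers such as LRTDP use to decide which states no longer need computation: a state whose value would barely change under a backup is treated as converged, and effort is directed elsewhere.

```python
# Illustrative sketch: the Bellman residual of a state measures how much a
# single backup would change its value estimate in a cost-minimizing MDP.
# Labeling-based solvers skip states whose residual stays below a threshold.

def bellman_residual(V, state, actions, transitions, cost):
    """How much a Bellman backup would change V[state]."""
    q_values = [
        cost(state, a) + sum(p * V[s2] for s2, p in transitions[(state, a)].items())
        for a in actions(state)
    ]
    return abs(V[state] - min(q_values))

# Tiny example: action "go" moves s0 to the goal g with cost 1.
V = {"s0": 0.0, "g": 0.0}
T = {("s0", "go"): {"g": 1.0}}
res = bellman_residual(V, "s0", lambda s: ["go"], T, lambda s, a: 1.0)
print(res)  # 1.0: V["s0"] is one backup away from correct, so s0 is not yet solved
```

In a full labeling scheme the test is applied to a state together with all states reachable under the current greedy policy before the state is marked solved; the sketch shows only the per-state residual check.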