Off-campus UMass Amherst users: To download campus access dissertations, please use the following link to log into our proxy server with your UMass Amherst user name and password.

Non-UMass Amherst users: Please talk to your librarian about requesting this dissertation through interlibrary loan.

Dissertations that have an embargo placed on them will not be available to anyone until the embargo expires.

Author ORCID Identifier



Open Access Dissertation

Document Type


Degree Name

Doctor of Philosophy (PhD)

Degree Program

Computer Science

Year Degree Awarded


Month Degree Awarded


First Advisor

Andrew G. Barto

Second Advisor

Sridhar Mahadevan

Third Advisor

Roderic A. Grupen

Fourth Advisor

Neil E. Berthier

Subject Categories

Computer Engineering | Controls and Control Theory


The acquisition of hierarchies of reusable skills is one of the distinguishing characteristics of human intelligence, and the learning of such hierarchies is an important open problem in computational reinforcement learning (RL). In humans, these skills are learned during a substantial developmental period in which individuals are intrinsically motivated to explore their environment and learn about the effects of their actions. The skills learned during this period of exploration are then reused to great effect later in life to solve many unfamiliar problems very quickly. This thesis presents novel methods for achieving such developmental acquisition of skill hierarchies in artificial agents by rewarding them for using their current skill set to better understand the effects of their actions on unfamiliar parts of their environment, which in turn leads to the formation of new skills and further exploration, in a life-long process of hierarchical exploration and skill learning.

In particular, we present algorithms for intrinsically motivated hierarchical exploration of Markov Decision Processes (MDPs) and finite factored MDPs (FMDPs). These methods integrate existing research on temporal abstraction in MDPs, intrinsically motivated RL, hierarchical decomposition of finite FMDPs, Bayesian network structure learning, and information theory to achieve long-term, incremental acquisition of skill hierarchies in these environments. Moreover, we show that the skill hierarchies learned in this fashion afford an agent the ability to solve novel tasks in its environment much more quickly than solving them from scratch.

To apply these techniques to environments with representational properties that differ from traditional MDPs and finite FMDPs requires methods for incrementally learning transition models of environments with such representations. Taking a step in this direction, we also present novel methods for incremental model learning in two other types of environments. The first is an algorithm for online, incremental structure learning of transition functions for FMDPs with continuous-valued state and action variables. The second is an algorithm for learning the parameters of a predictive state representation, which serves as a model of partially observable dynamical systems with continuous-valued observations and actions. These techniques serve as a prerequisite to future work applying intrinsically motivated skill learning to these types of environments.