Off-campus UMass Amherst users: To download campus access dissertations, please use the following link to log into our proxy server with your UMass Amherst user name and password.

Non-UMass Amherst users: Please talk to your librarian about requesting this dissertation through interlibrary loan.

Dissertations that have an embargo placed on them will not be available to anyone until the embargo expires.

Author ORCID Identifier



Open Access Dissertation

Document Type


Degree Name

Doctor of Philosophy (PhD)

Degree Program

Computer Science

Year Degree Awarded


Month Degree Awarded


First Advisor

Sridhar Mahadevan

Second Advisor

Andrew G. Barto

Third Advisor

Shlomo Zilberstein

Fourth Advisor

Weibo Gong

Subject Categories

Artificial Intelligence and Robotics


This thesis presents a general framework for first-order temporal difference learning algorithms with an in-depth theoretical analysis. The main contribution of the thesis is the development and design of a family of first-order regularized temporal-difference (TD) algorithms using stochastic approximation and stochastic optimization. To scale up TD algorithms to large-scale problems, we use first-order optimization to explore regularized TD methods using linear value function approximation. Previous regularized TD methods often use matrix inversion, which requires cubic time and quadratic memory complexity. We propose two algorithms, sparse-Q and RO-TD, for on-policy and off-policy learning, respectively. These two algorithms exhibit linear computational complexity per-step, and their asymptotic convergence guarantee and error bound analysis are given using stochastic optimization and stochastic approximation. The second major contribution of the thesis is the establishment of a unified general framework for stochastic-gradient-based temporal-difference learning algorithms that use proximal gradient methods. The primal-dual saddle-point formulation is introduced, and state-of-the-art stochastic gradient solvers, such as mirror descent and extragradient are used to design several novel RL algorithms. Theoretical analysis is given, including regularization, acceleration analysis and finite-sample analysis, along with detailed empirical experiments to demonstrate the effectiveness of the proposed algorithms.