Exploiting Structures in Interactive Decision Making

Abstract
In this thesis, we study several problems in interactive decision making. Interactive decision making plays an important role in applications such as online advertising and autonomous driving; two classical problems in this area are multi-armed bandits and reinforcement learning. Here, and more broadly, the central challenge is the \emph{exploration-exploitation} tradeoff, whereby the agent must decide whether to explore uncertain actions that may yield high reward or to stick with actions already known to be good. Resolving this tradeoff is particularly difficult in settings with large or continuous state and action spaces. In reinforcement learning, function approximation is the prevalent structural tool for managing large state and action spaces, but misspecification of the function class can have a detrimental effect on statistical performance. These structured settings are the focus of this thesis.

First, we study the combinatorial pure exploration problem in the multi-armed bandit framework. In this problem, we are given $K$ distributions and a collection of subsets $\mathcal{V} \subset 2^{[K]}$ of these distributions, and we would like to find the subset $v \in \mathcal{V}$ with the largest mean while collecting, in a sequential fashion, as few samples from the distributions as possible. We develop new algorithms with strong statistical and computational guarantees by leveraging precise concentration-of-measure arguments and a reduction to linear programming.

Second, we study reinforcement learning in continuous state and action spaces endowed with a metric. We provide a refined analysis of a variant of the algorithm of Sinclair, Banerjee, and Yu (2019) and show that its regret scales with the \emph{zooming dimension} of the instance. Our results are the first provably adaptive guarantees for reinforcement learning in metric spaces.

Finally, we study the more fundamental problem of \emph{distribution shift}, in which training and deployment conditions for a machine learning model differ. We examine the effect of distribution shift in the presence of model misspecification, focusing on $L_{\infty}$-misspecified regression and \emph{adversarial covariate shift}, where the regression target remains fixed while the covariate distribution changes arbitrarily. We develop a new algorithm, inspired by robust optimization techniques, that avoids misspecification amplification while still attaining optimal statistical rates. As applications, we use this regression procedure to obtain new guarantees for offline and online reinforcement learning with misspecification and to establish new separations between previously studied structural conditions and notions of coverage.
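For concreteness, the identification objective in the combinatorial pure exploration problem above can be stated formally. The following is a minimal sketch that takes the value of a subset to be the sum of its arms' means; this aggregation is an assumption (a standard choice in the combinatorial pure exploration literature), since the abstract does not spell out how the mean of a subset is defined. Writing $\mu_i$ for the mean of the $i$-th distribution,
\[
v^{\star} \in \operatorname*{arg\,max}_{v \in \mathcal{V}} \mu(v), \qquad \mu(v) = \sum_{i \in v} \mu_i,
\]
and an algorithm must identify $v^{\star}$ with probability at least $1 - \delta$ while adaptively drawing as few samples as possible from the $K$ distributions.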
Type
Dissertation (Open Access)
Date
2024-09
License
Attribution 4.0 International
http://creativecommons.org/licenses/by/4.0/