Off-campus UMass Amherst users: To download campus access dissertations, please use the following link to log into our proxy server with your UMass Amherst user name and password.

Non-UMass Amherst users: Please talk to your librarian about requesting this dissertation through interlibrary loan.

Dissertations that have an embargo placed on them will not be available to anyone until the embargo expires.

Author ORCID Identifier


Open Access Dissertation

Document Type


Degree Name

Doctor of Philosophy (PhD)

Degree Program

Computer Science

Year Degree Awarded


Month Degree Awarded


First Advisor

Philip Thomas

Second Advisor

Yuriy Brun

Third Advisor

Daniel Sheldon

Fourth Advisor

Melinda D. Dyar

Subject Categories

Artificial Intelligence and Robotics


As increasingly sensitive decision making problems become automated using models trained by machine learning algorithms, it is important for machine learning researchers to design training algorithms that provide assurance that the models they produce will be well behaved. While significant progress has been made toward designing safe machine learning algorithms, there are several obstacles that prevent these strategies from being useful in practice. In this defense, I will highlight two of these challenges, and provide methods and results demonstrating that they can be overcome.

First, for many applications, the user must be able to easily specify general and potentially complex definitions of unsafe behavior. While most existing safe machine learning algorithms make strong assumptions about how unsafe behavior is defined, I will describe a flexible interface that allows the user to specify their definitions in a straightforward way at training time, and that is general enough to enforce a wide range of commonly used definitions.

Second, users often require guarantees to hold even when a trained model is deployed into an environment that differs from the training environment. In these settings, the safety guarantees provided by existing methods are no longer valid when the environment changes, presenting significant risk. I will consider two instances of this problem. In the first instance, I will provide algorithms with safety guarantees that hold when the differences between the training and deployment environments are caused by a change in the probability of encountering certain classes of observations. These algorithms are particularly useful in social applications, where the distribution of protected attributes, such as race or sex, may change over time. Next, I will provide algorithms with safety guarantees that hold in more general settings, in which the differences between the training and deployment environments are more challenging to describe. In both settings, I will present experiments showing that the guarantees provided by these algorithms are valid in practice, even when these changes are made antagonistically.


Creative Commons License

Creative Commons Attribution 4.0 License
This work is licensed under a Creative Commons Attribution 4.0 License.