Author ORCID Identifier


Open Access Dissertation

Document Type


Degree Name

Doctor of Philosophy (PhD)

Degree Program

Computer Science

Year Degree Awarded


Month Degree Awarded


First Advisor

David D. Jensen

Second Advisor

J. Eliot B. Moss

Subject Categories

Other Computer Sciences


Experimentation increasingly drives everyday decisions in modern life, as it is considered by some to be the gold standard for determining cause and effect within any system. Digital experimentation has expanded the scope and frequency of experiments, which range in complexity from classic A/B tests to contextual bandit experiments that share features with reinforcement learning. Although there exists a large body of prior work on estimating treatment effects using experiments, this prior work did not anticipate the new challenges and opportunities introduced by digital experimentation. Novel errors and threats to validity arise at the intersection of software and experimentation, especially when experimentation is in service of understanding human behavior or autonomous black-box agents. We present several novel tools for automating aspects of the experimentation-analysis pipeline. We propose new methods for evaluating online field experimentation, automatically generating corresponding analyses of treatment effects. We then draw the connection between software testing and experimental design and apply software testing techniques to a kind of autonomous agent, a deep reinforcement learning agent, to demonstrate the need for novel testing paradigms when a software stack uses learned components that may have emergent behavior. We show how our system may be used to evaluate claims made about the behavior of autonomous agents and find that some claims do not hold up under test. Finally, we show how to produce explanations of the behavior of black-box software-defined agents interacting with white-box environments via automated experimentation. We show how an automated system can be used for exploratory data analysis, with a human in the loop, to investigate a large space of possible counterfactual explanations.


Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.