Author ORCID Identifier
N/A
Access Type
Open Access Dissertation
Document Type
dissertation
Degree Name
Doctor of Philosophy (PhD)
Degree Program
Mathematics
Year Degree Awarded
2017
Month Degree Awarded
May
First Advisor
Markos A. Katsoulakis
Second Advisor
Luc Rey-Bellet
Third Advisor
Patrick Flaherty
Fourth Advisor
Arya Mazumdar
Subject Categories
Numerical Analysis and Computation | Statistical Models
Abstract
The ever-increasing complexity of the models used in predictive modeling and data science, and their use for prediction and inference, has made the development of tools for uncertainty quantification and model selection especially important. In this work, we seek to understand the various trade-offs associated with the simulation of stochastic systems. Some trade-offs are computational, e.g., the execution time of an algorithm versus the accuracy of the simulation. Others are analytical: whether or not we are able to find tractable substitutes for quantities of interest, e.g., distributions and ergodic averages. The first two chapters of this thesis study the long-time behavior of parallel lattice Kinetic Monte Carlo (PL-KMC) algorithms for interacting particle systems. We introduce the relative entropy rate (RER) as a measure of long-time loss of information and illustrate that it is a computable a posteriori quantity. The RER can act as an information criterion (IC), discriminating between different parameter choices for the schemes and allowing comparisons at equilibrium. We make explicit how the RER scales with the time step and the size of the system, and show that it captures details about the connectivity of the original process. Another feature of long-time behavior is time-reversibility, which some physical systems naturally exhibit. Unfortunately, due to the domain and time discretization, PL-KMC cannot conserve this property. To quantify the loss of reversibility, we introduce the entropy production rate (EPR) as an IC for comparisons between different schemes. We show that the EPR shares many of the properties of the RER and can be estimated efficiently from data. The last chapter discusses uncertainty quantification for model bias. By connecting a recently derived goal-oriented divergence and concentration bounds, we define new divergences that provide computable bounds for model bias.
The new bounds scale appropriately with data and become progressively more accurate depending on available information about the models and the quantities of interest. We discuss how the bounds allow us to bypass computationally expensive Monte Carlo sampling or specialized methods, e.g., Multilevel Monte Carlo.
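As a rough illustration of why a relative entropy rate can be computed a posteriori, the sketch below evaluates it for two finite-state, discrete-time Markov chains and then estimates the same number from a single simulated path. This is a toy setting chosen for illustration, not the PL-KMC schemes studied in the dissertation. The function names, the two-state matrices, and the assumption that Q is positive wherever P is positive are all hypothetical choices made here.

```python
import numpy as np


def stationary_distribution(P):
    """Stationary distribution mu of an irreducible row-stochastic matrix P (mu @ P = mu)."""
    n = P.shape[0]
    # Stack the balance equations mu (P - I) = 0 with the normalization sum(mu) = 1
    # and solve the resulting overdetermined system in the least-squares sense.
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    mu, *_ = np.linalg.lstsq(A, b, rcond=None)
    return mu


def relative_entropy_rate(P, Q):
    """Exact RER of chain P with respect to chain Q:
    sum_x mu(x) sum_y P(x, y) log(P(x, y) / Q(x, y)), with mu stationary for P.
    Assumes Q(x, y) > 0 wherever P(x, y) > 0 (an assumption of this sketch).
    """
    mu = stationary_distribution(P)
    ratio = np.ones_like(P, dtype=float)
    mask = P > 0  # terms with P(x, y) = 0 contribute nothing to the sum
    ratio[mask] = P[mask] / Q[mask]
    return float(np.sum(mu[:, None] * P * np.log(ratio)))


def sample_path(P, n_steps, rng, x0=0):
    """Simulate n_steps transitions of the chain with transition matrix P."""
    path = np.empty(n_steps + 1, dtype=int)
    path[0] = x0
    for t in range(n_steps):
        path[t + 1] = rng.choice(P.shape[0], p=P[path[t]])
    return path


def estimate_rer_from_path(path, P, Q):
    """Path-based estimate: time average of log(P / Q) over the observed transitions."""
    x, y = path[:-1], path[1:]
    return float(np.mean(np.log(P[x, y] / Q[x, y])))


if __name__ == "__main__":
    # Hypothetical two-state example; the numbers carry no physical meaning.
    P = np.array([[0.9, 0.1], [0.2, 0.8]])
    Q = np.array([[0.8, 0.2], [0.3, 0.7]])
    rng = np.random.default_rng(0)
    path = sample_path(P, 50_000, rng)
    print("exact RER:    ", relative_entropy_rate(P, Q))
    print("path estimate:", estimate_rer_from_path(path, P, Q))
```

The path-based estimator needs only the observed transitions and the two transition rules, which is the sense in which such a quantity can be evaluated from simulation output rather than from the (typically unavailable) stationary distribution.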
DOI
https://doi.org/10.7275/9996991.0
Recommended Citation
Gourgoulias, Kostantinos, "Information Metrics for Predictive Modeling and Machine Learning" (2017). Doctoral Dissertations. 1006.
https://doi.org/10.7275/9996991.0
https://scholarworks.umass.edu/dissertations_2/1006