Campus-Only Access for Five (5) Years
Doctor of Philosophy (PhD)
Markos A. Katsoulakis
Probability | Statistical Models
Probabilistic models are widely used to understand and predict complex phenomena in many fields, including the life sciences, finance, social networks, and chemistry. Despite their power and promise, models built from data inevitably carry errors and uncertainty. In this work, we seek to understand and control the model bias that arises in model building and whose size indicates how reliable a model's predictions are. The goal of the dissertation is to develop information inequalities that quantify model bias, using mathematical tools from uncertainty quantification, information theory, robust optimization, and approximate inference.
In the first chapter, we show that classical information inequalities, such as the Pinsker inequality and other inequalities based on the Hellinger distance, the χ² divergence, and the Rényi divergence, perform poorly when used to control quantities of interest for systems with many degrees of freedom and/or in long-time regimes. On the other hand, we demonstrate the only available scalable information bounds for quantities of interest of high-dimensional probabilistic models, which were derived by P. Dupuis, K. Chowdhary, and their collaborators via the Gibbs variational formula for the Kullback-Leibler divergence. The scalability of these inequalities allows us to obtain uncertainty quantification bounds for quantities of interest in the limit of many degrees of freedom and/or in long-time regimes, and to address model-form uncertainty, i.e., to compare different extended models and their corresponding quantities of interest. We demonstrate some of these properties by deriving robust uncertainty quantification bounds for phase diagrams in statistical mechanics models.
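For orientation, the bound that follows from the Gibbs variational formula can be sketched as below. The notation is an assumption made here for illustration: P is a baseline model, Q an alternative model absolutely continuous with respect to P, f a quantity of interest, and R(Q‖P) the Kullback-Leibler divergence of Q with respect to P. The dissertation's own statement and notation may differ.

```latex
% Gibbs variational formula for the Kullback-Leibler divergence:
\log \mathbb{E}_P\!\left[e^{g}\right]
  = \sup_{Q \ll P} \left\{ \mathbb{E}_Q[g] - R(Q \,\|\, P) \right\}.

% Apply it to g = c\,(f - \mathbb{E}_P[f]) with c > 0, divide by c,
% and optimise over c:
\mathbb{E}_Q[f] - \mathbb{E}_P[f]
  \le \inf_{c > 0} \left\{
        \frac{1}{c} \log \mathbb{E}_P\!\left[e^{c\,(f - \mathbb{E}_P[f])}\right]
        + \frac{1}{c}\, R(Q \,\|\, P)
      \right\}.

% The matching lower bound follows by replacing c with -c:
\mathbb{E}_Q[f] - \mathbb{E}_P[f]
  \ge \sup_{c > 0} \left\{
        -\frac{1}{c} \log \mathbb{E}_P\!\left[e^{-c\,(f - \mathbb{E}_P[f])}\right]
        - \frac{1}{c}\, R(Q \,\|\, P)
      \right\}.
```

Heuristically, when f is a sum over N weakly dependent components, both the cumulant generating function and the divergence typically grow linearly in N, so the bound on f/N stays of order one. A Pinsker-type bound instead involves the square root of a divergence that grows with N.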
In the second chapter, we derive tight and computable bounds on the bias of statistical estimators, or more generally of quantities of interest, when they are evaluated on a baseline model P rather than on the typically unknown true model Q. Our proposed method combines the scalable information inequality derived by P. Dupuis, K. Chowdhary, and their collaborators with classical concentration inequalities (such as Bennett's and the Hoeffding-Azuma inequalities). Our bounds are expressed in terms of the Kullback-Leibler divergence of model Q with respect to P and the moment generating function of the statistical estimator under P. Furthermore, concentration inequalities, i.e., bounds on moment generating functions, provide tight and computationally inexpensive model bias bounds for quantities of interest. Finally, they allow us to derive rigorous confidence bands for statistical estimators that account for model bias and are valid for an arbitrary amount of data. As an application, we implement the derived bounds for a high-dimensional Markov random field model as well as a data-driven model for the lifetime of lithium batteries.
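The way a Kullback-Leibler divergence and a moment generating function combine into a bias bound can be illustrated with a toy calculation. The sketch below is not taken from the dissertation: it assumes P = Bernoulli(0.3), Q = Bernoulli(0.4), and f(x) = x, and it swaps in Hoeffding's lemma for a single bounded variable as a simpler stand-in for the Bennett and Hoeffding-Azuma inequalities named above. All function names are hypothetical.

```python
import math


def kl_bernoulli(q, p):
    """Kullback-Leibler divergence R(Q||P) for Q = Bernoulli(q), P = Bernoulli(p)."""
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))


def centered_log_mgf_bernoulli(c, p):
    """log E_P[exp(c (f - E_P[f]))] for f(x) = x under P = Bernoulli(p)."""
    return math.log(1 - p + p * math.exp(c)) - c * p


def bias_bound(log_mgf, kl, c_grid):
    """Upper bound on E_Q[f] - E_P[f]: minimise (log_mgf(c) + kl) / c over c > 0.

    Every grid point gives a valid bound, so a coarse grid only loosens the result.
    """
    return min((log_mgf(c) + kl) / c for c in c_grid)


def hoeffding_bias_bound(a, b, kl):
    """Closed-form bound when f takes values in [a, b].

    Hoeffding's lemma gives log_mgf(c) <= c^2 (b - a)^2 / 8; minimising
    c (b - a)^2 / 8 + kl / c over c > 0 yields (b - a) * sqrt(kl / 2).
    """
    return (b - a) * math.sqrt(kl / 2)


# Toy models (assumed for illustration only).
p, q = 0.3, 0.4
kl = kl_bernoulli(q, p)
c_grid = [0.01 * k for k in range(1, 501)]  # c in (0, 5]

exact = bias_bound(lambda c: centered_log_mgf_bernoulli(c, p), kl, c_grid)
hoeff = hoeffding_bias_bound(0.0, 1.0, kl)

print(f"true bias E_Q[f] - E_P[f]   : {q - p:.4f}")
print(f"bound with exact MGF        : {exact:.4f}")
print(f"bound with Hoeffding's lemma: {hoeff:.4f}")
```

In this toy case the exact-MGF bound nearly coincides with the true bias, because Q is an exponential tilt of P along f. The Hoeffding version is looser but needs only the range of f and the divergence, which is what makes concentration-based bounds inexpensive to evaluate.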
The last chapter discusses uncertainty quantification bounds for risk-sensitive functionals. Risk-sensitive functionals play an important role in economics and in control. They also provide a proper way to measure and analyze the impact of rare events. However, evaluating risk-sensitive functionals is not possible when the models are intractable or not fully known. For this part of the thesis, we develop uncertainty and sensitivity bounds for risk-sensitive functionals based on an extended Gibbs variational formula involving the Rényi divergence. These bounds are scalable for high-dimensional probabilistic models and tight with respect to particular observables or families of measures. Based on the derived bounds, we also propose new Cramér-Rao bounds for the sensitivity analysis of risk-sensitive functionals and relate them to the goal-oriented divergence of the score function.
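A bound with this general shape can be obtained from Hölder's inequality alone. The derivation below is that elementary argument, offered as a sketch rather than as the extended variational formula used in the thesis. The notation is assumed here: g is an observable, 0 < β < γ are risk-sensitivity parameters, Q is absolutely continuous with respect to P, and D_α denotes the Rényi divergence of order α in its standard normalisation.

```latex
% Renyi divergence of order \alpha > 1 (standard normalisation):
D_\alpha(Q \,\|\, P)
  = \frac{1}{\alpha - 1}
    \log \mathbb{E}_P\!\left[\left(\frac{dQ}{dP}\right)^{\alpha}\right].

% Hoelder's inequality with exponents p = \gamma/\beta and
% q = \gamma/(\gamma - \beta), for 0 < \beta < \gamma:
\mathbb{E}_Q\!\left[e^{\beta g}\right]
  = \mathbb{E}_P\!\left[e^{\beta g}\,\frac{dQ}{dP}\right]
  \le \left(\mathbb{E}_P\!\left[e^{\gamma g}\right]\right)^{\beta/\gamma}
      \left(\mathbb{E}_P\!\left[\left(\frac{dQ}{dP}\right)^{q}\right]\right)^{1/q}.

% Take logarithms, divide by \beta, and use q - 1 = \beta/(\gamma - \beta):
\frac{1}{\beta} \log \mathbb{E}_Q\!\left[e^{\beta g}\right]
  \le \frac{1}{\gamma} \log \mathbb{E}_P\!\left[e^{\gamma g}\right]
      + \frac{1}{\gamma}\, D_{\gamma/(\gamma - \beta)}(Q \,\|\, P).
```

The risk-sensitive functional under the unknown model Q is thus controlled by the same functional under the baseline P at a larger risk parameter γ, plus a Rényi divergence term.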
Wang, Jie, "SCALABLE UNCERTAINTY QUANTIFICATION BOUNDS FOR PREDICTIVE MODELING" (2019). Doctoral Dissertations. 1776.