Author ORCID Identifier

https://orcid.org/0000-0002-0941-6702

AccessType

Open Access Dissertation

Document Type

dissertation

Degree Name

Doctor of Philosophy (PhD)

Degree Program

Mathematics

Month Degree Awarded

May

First Advisor

Daeyoung Kim

Subject Categories

Applied Statistics | Categorical Data Analysis | Data Science | Multivariate Analysis | Statistical Methodology | Statistical Models | Statistical Theory

Abstract

In statistical modeling, descriptive modeling plays an essential role: it accelerates the formulation of plausible hypotheses for subsequent explanatory modeling and facilitates the selection of candidate variables for subsequent predictive modeling. For multivariate categorical data analysis in particular, descriptive modeling methods are desirable for uncovering and compactly summarizing the potential association structure among multiple categorical variables. However, many classical methods for this setting either rely on strong parametric model assumptions or become infeasible when the data dimension is high. To address these limitations, we propose a model-free descriptive modeling method that delineates and quantifies the association structure between an ordinal dependent variable and a set of categorical independent variables in a multi-dimensional contingency table.

The proposed method consists of four components: subcopula score, subcopula regression, subcopula regression based association measure and its (sequential/non-sequential) decompositions. The subcopula score is a data-dependent scoring method for an ordinal variable reflecting the ordered nature of its categories. The subcopula regression leverages the subcopula scores to identify the association structure between the ordinal dependent variable and a set of categorical independent variables. The subcopula regression based association measure exploits the subcopula regression to quantify the strength of the association structure in a model-free manner. The sequential and non-sequential decompositions of the proposed association measure evaluate the contribution of the subsets of independent variables to the overall association in various forms such as marginal, conditional, interactive and correlative association.
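
The abstract does not give formulas for these components, so the following Python sketch is only an illustration under stated assumptions, not the dissertation's definitions. It assumes that an ordinal category is scored by the midpoint of its cumulative marginal proportions (a ridit-type, data-dependent score), that the regression is the conditional mean score for each combination of independent variables, and that the association measure is the share of score variance explained by that conditional mean. The function names and the layout convention (dependent variable on the last axis) are hypothetical.

```python
import numpy as np

def midpoint_scores(marginal):
    """Assumed scoring: midpoint of the cumulative proportions of each ordered category."""
    p = np.asarray(marginal, dtype=float)
    p = p / p.sum()
    upper = np.cumsum(p)          # cumulative proportion up to and including the category
    lower = upper - p             # cumulative proportion strictly below the category
    return (lower + upper) / 2.0

def regression_and_association(table):
    """table: contingency counts with the ordinal dependent variable on the LAST axis."""
    counts = np.asarray(table, dtype=float)
    n_y = counts.shape[-1]
    flat = counts.reshape(-1, n_y)            # one row per combination of independent variables
    joint = flat / flat.sum()
    p_x = joint.sum(axis=1)                   # marginal of the independent-variable combinations
    p_y = joint.sum(axis=0)                   # marginal of the ordinal dependent variable
    scores = midpoint_scores(p_y)
    keep = p_x > 0                            # skip empty combinations
    cond = joint[keep] / p_x[keep, None]      # conditional distribution of Y given each combination
    reg = cond @ scores                       # assumed regression: conditional mean score
    mean = p_y @ scores
    var_total = p_y @ (scores - mean) ** 2
    var_explained = p_x[keep] @ (reg - mean) ** 2
    return reg, var_explained / var_total     # assumed measure: explained share of score variance
```

Under these assumptions the measure is 0 when the dependent variable is independent of the other variables and 1 when each combination of independent variables determines a single category of the dependent variable.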

We first study the theoretical properties of the subcopula score, subcopula regression, subcopula regression based association measure and its (sequential/non-sequential) decompositions. Next, we develop statistical inference for the proposed method, including point estimation, (asymptotic/bootstrap) confidence intervals and permutation-based hypothesis testing. We then examine the finite-sample properties of the proposed overall, marginal and conditional association measures in multi-dimensional contingency tables. Finally, we demonstrate the potential use of the proposed method in real-world applications.
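
As a generic illustration of permutation-based testing of no association, the sketch below shuffles the ordinal response across observations and recomputes a statistic each time. The statistic is an assumed stand-in (the share of midpoint-score variance explained by group means), not the estimator developed in the dissertation. The function names, the integer coding of categories, and the default of 999 permutations are all hypothetical choices.

```python
import numpy as np

def explained_score_variance(x, y, n_x, n_y):
    """Assumed statistic: share of midpoint-score variance explained by group means.

    x, y: integer-coded arrays (0..n_x-1 and 0..n_y-1), one entry per observation.
    """
    p_y = np.bincount(y, minlength=n_y) / len(y)
    upper = np.cumsum(p_y)
    scores = (2 * upper - p_y) / 2.0          # midpoint of cumulative proportions
    s = scores[y]
    total = s.var()
    if total == 0:
        return 0.0
    group_sum = np.bincount(x, weights=s, minlength=n_x)
    group_n = np.bincount(x, minlength=n_x)
    nonempty = group_n > 0
    group_mean = group_sum[nonempty] / group_n[nonempty]
    between = np.sum(group_n[nonempty] * (group_mean - s.mean()) ** 2) / len(y)
    return between / total

def permutation_p_value(x, y, n_x, n_y, n_perm=999, seed=0):
    """Permutation p-value for the null hypothesis of no association between x and y."""
    rng = np.random.default_rng(seed)
    observed = explained_score_variance(x, y, n_x, n_y)
    exceed = 0
    for _ in range(n_perm):
        # Shuffling y breaks any association while preserving both marginals.
        stat = explained_score_variance(x, rng.permutation(y), n_x, n_y)
        exceed += stat >= observed
    # Add-one correction keeps the p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)
```

With several independent variables, `x` could be the integer code of each observed combination of their categories; that encoding is likewise an assumption of this sketch.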

DOI

https://doi.org/10.7275/22254792.0

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
