Model-Free Descriptive Modeling for Multivariate Categorical Data with An Ordinal Dependent Variable

Abstract
In statistical modeling, descriptive modeling plays an essential role: it accelerates the formulation of plausible hypotheses for subsequent explanatory modeling and facilitates the selection of candidate variables for subsequent predictive modeling. In multivariate categorical data analysis in particular, descriptive modeling methods are desirable for uncovering and compactly summarizing the association structure among multiple categorical variables. However, many classical methods for this setting either rely on strong parametric model assumptions or become infeasible as the dimension of the data grows. To address this, we propose a model-free descriptive modeling method that delineates and quantifies the association structure between an ordinal dependent variable and a set of categorical independent variables in a multi-dimensional contingency table. The proposed method consists of four components: the subcopula score, the subcopula regression, the subcopula regression based association measure, and its (sequential/non-sequential) decompositions. The subcopula score is a data-dependent scoring method for an ordinal variable that reflects the ordered nature of its categories. The subcopula regression leverages the subcopula scores to identify the association structure between the ordinal dependent variable and a set of categorical independent variables. The subcopula regression based association measure exploits the subcopula regression to quantify the strength of that association structure in a model-free manner. The sequential and non-sequential decompositions of the proposed association measure evaluate the contribution of subsets of the independent variables to the overall association in various forms, such as marginal, conditional, interactive, and correlative association.
We first study the theoretical properties of the subcopula score, the subcopula regression, the subcopula regression based association measure, and its (sequential/non-sequential) decompositions. Next, we develop statistical inference for the proposed method, including point estimation, (asymptotic/bootstrap) confidence intervals, and permutation-based hypothesis testing. We then examine the finite-sample properties of the proposed overall, marginal, and conditional association measures in multi-dimensional contingency tables. Finally, we demonstrate the potential use of the proposed method in real-world applications.
Type
dissertation
Date
2021-05
License
http://creativecommons.org/licenses/by/4.0/