Flaherty, Patrick
Albert, Gitanjali
2024-12-04
2024-09
10.7275/55133
https://hdl.handle.net/20.500.14394/55133

We consider the problem of developing interpretable and computationally efficient matrix decomposition methods for matrices whose entries have bounded support. Such matrices arise in large-scale DNA methylation studies, where the data are bounded by the unit interval. We present a family of decomposition strategies for (0,1)-bounded data based on the Doubly Non-Central Beta (DNCB) distribution. Our three factorization approaches are based on the CP and Tucker decompositions. Using an augment-and-marginalize approach, we derive computationally efficient sampling algorithms to solve for the latent factors. We evaluate the performance of our methods using the criteria of predictability, computability, and stability. Empirical results show that our two methods based on the DNCB distribution perform as well as or better than the state of the art in terms of held-out prediction and computational complexity, and are significantly more stable under changes in hyperparameters. Inspired by advances in DNA sequencing technology, we develop a method based on the Conditional DNCB distribution, which allows for the incorporation of additional data collected by modern sequencing methods. This model yields similar guarantees on stability and tractability, while its density demonstrates interesting properties for measuring predictive capability.
The improved stability of our models based on the Tucker decomposition gives higher confidence in the results of applications where the constituent factors are used to generate and test scientific hypotheses, such as DNA methylation analysis of cancer samples.

Attribution-NonCommercial 4.0 International
http://creativecommons.org/licenses/by-nc/4.0/
dimensionality reduction, matrix factorization, bounded data
Stable Dimensionality Reduction for Bound-Support Matrix Data
Dissertation (Open Access)