Document Type

Open Access Dissertation

Degree Name

Doctor of Philosophy (PhD)

Degree Program

Computer Science

Year Degree Awarded

Spring 2014

First Advisor

Allen R Hanson

Second Advisor

Erik G Learned-Miller

Subject Categories

Artificial Intelligence and Robotics

Abstract

Joint alignment is the process of transforming instances in a data set to make them more similar based on a pre-defined measure of joint similarity. This process has great utility and applicability in many scientific disciplines, including radiology, psychology, linguistics, vision, and biology. Most alignment algorithms suffer from two shortcomings. First, they typically fail when presented with complex data sets arising from multiple modalities, such as a data set of normal and abnormal heart signals. Second, they require hand-picking appropriate feature representations for each data set, which may be time-consuming, ineffective, or outside practitioners' domain of expertise.

In this thesis we introduce alignment models that address both shortcomings. In the first part, we present an efficient curve alignment algorithm derived from the congealing framework that is effective on many synthetic and real data sets. We show that the byproducts of joint alignment, namely the aligned data and the transformation parameters, can dramatically improve classification performance. In the second part, we incorporate unsupervised feature learning based on convolutional restricted Boltzmann machines to learn a representation that is tuned to the statistics of the data set. We show how these features can be used to improve both alignment quality and classification performance. In the third part, we present a nonparametric Bayesian joint alignment and clustering model that handles data sets arising from multiple modes. We apply this model to synthetic, curve, and image data sets and show that simultaneous alignment and clustering performs significantly better than carrying out these operations sequentially. The model also has the added advantage that it lends itself easily to semi-supervised, online, and distributed implementations.

Overall, this thesis takes steps towards developing an unsupervised data processing pipeline that includes alignment, clustering, and feature learning. While clustering and feature learning serve as auxiliary information to improve alignment, they are also important byproducts in their own right. Furthermore, we present a software implementation of all the models described in this thesis. This will enable practitioners from different scientific disciplines to use our work, encourage contributions and extensions, and promote reproducible research.
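
As a rough illustration of the joint alignment idea described in the abstract, the sketch below aligns a small set of 1-D curves by integer circular shifts. It is not the thesis's algorithm or its software implementation: the function names (`joint_align_shifts`, `total_variance`), the restriction to circular shifts, and the use of summed pointwise variance as the joint similarity measure are all assumptions made for this example. The congealing framework itself is typically formulated with an entropy-based objective.

```python
import numpy as np

def total_variance(curves):
    """Sum of pointwise variances across curves (the assumed joint dissimilarity)."""
    return float(np.var(np.asarray(curves, dtype=float), axis=0).sum())

def joint_align_shifts(curves, max_shift=10, n_iters=5):
    """Coordinate descent over integer circular shifts (hypothetical helper).

    Each curve in turn is re-shifted to best match the mean of the other
    curves in their current alignment. With circular shifts, this never
    increases the total pointwise variance of the set.
    """
    curves = np.asarray(curves, dtype=float)
    n = len(curves)
    shifts = np.zeros(n, dtype=int)
    aligned = curves.copy()
    candidates = np.arange(-max_shift, max_shift + 1)
    for _ in range(n_iters):
        for i in range(n):
            # Mean of all other curves under their current shifts.
            others_mean = (aligned.sum(axis=0) - aligned[i]) / (n - 1)
            # Squared distance to that mean for every candidate shift.
            costs = [np.sum((np.roll(curves[i], s) - others_mean) ** 2)
                     for s in candidates]
            shifts[i] = candidates[int(np.argmin(costs))]
            aligned[i] = np.roll(curves[i], shifts[i])
    return aligned, shifts

# Synthetic data: one Gaussian bump observed under different shifts.
t = np.arange(100)
base = np.exp(-0.5 * ((t - 50) / 5.0) ** 2)
data = np.stack([np.roll(base, s) for s in (-8, -3, 0, 4, 7)])

aligned, shifts = joint_align_shifts(data)
print("variance before:", total_variance(data))
print("variance after: ", total_variance(aligned))
print("estimated shifts:", shifts)
```

The returned `aligned` array and `shifts` vector correspond loosely to the two byproducts the abstract mentions, the aligned data and the transformation parameters, either of which could be passed to a downstream classifier.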