Document Type

Open Access Dissertation

Degree Name

Doctor of Philosophy (PhD)

Degree Program

Computer Science

Year Degree Awarded

2018

Month Degree Awarded

May

First Advisor

Sridhar Mahadevan

Second Advisor

M. Darby Dyar

Subject Categories

Artificial Intelligence and Robotics

Abstract

Advances in scientific instrumentation technology have increased the speed of data acquisition and the precision of sampling, creating an abundance of high-dimensional data sets. The ability to combine these disparate data sets and to transfer information between them is critical to accurate scientific analysis. Many modern instruments record data across many thousands of channels, far more than the actual degrees of freedom in the sample data. This makes manifold learning, a class of methods that exploit the observation that high-dimensional data tend to lie on lower-dimensional manifolds, especially well suited to this transfer learning task.

Existing manifold-based transfer learning methods can align related data sets in differing feature representations, but their inherent single-manifold assumption causes them to fail in the presence of complex mixtures of manifolds. In this dissertation, a new class of transfer learning algorithms is developed for high-dimensional data sets that intrinsically lie on multiple low-dimensional manifolds. With a more realistic mixture-of-manifolds assumption, this class of algorithms allows for accurate and efficient transfer of information between data sets by aligning their complex underlying geometries.

The algorithms presented leverage corresponding samples between data sets and available label information, whether continuous or categorical. The two primary tasks are aligning mixtures of manifolds and heterogeneous domain adaptation of multi-manifold data sets. Linear, non-linear, and robust versions of the algorithm are described, as well as a method for actively selecting cross-data set correspondences. To show their practical effectiveness, the algorithms are evaluated on a number of synthetic and real-world domains, most notably the alignment of data recorded by spectroscopic instruments during space exploration, a new domain for transfer learning.
