Author ORCID Identifier

https://orcid.org/0000-0001-8770-8754

Document Type

Open Access Dissertation

Degree Name

Doctor of Philosophy (PhD)

Degree Program

Computer Science

Year Degree Awarded

2020

Month Degree Awarded

May

First Advisor

Erik Learned-Miller

Subject Categories

Artificial Intelligence and Robotics

Abstract

Deep neural networks (DNNs) have seen tremendous success in the past few years, advancing the state of the art in many AI areas by significant margins. Part of this success can be attributed to the wide adoption of convolutional filters. These filters can effectively capture the invariance in data, leading to faster training and more compact representations, and at the same time can leverage efficient parallel implementations on modern hardware. Since convolution operates on regularly structured grids, it is a particularly good fit for text and images, which have inherent rigid 1D or 2D structures. However, extending DNNs to 3D or higher-dimensional spaces is non-trivial, because data in such spaces often lack regular structure, and the curse of dimensionality can adversely impact performance in multiple ways.

In this dissertation, we present several new types of neural network operations and architectures for data in 3D and higher-dimensional spaces, and demonstrate how we can mitigate these issues while retaining the favorable properties of 2D convolutions. First, we investigate view-based representations for 3D shape recognition. We show that a collection of 2D views can be highly informative, and that standard 2D DNNs can be adapted with a simple pooling strategy to recognize objects based on their appearance from multiple viewing angles with unprecedented accuracy. Our next study makes a connection between 3D point cloud processing and sparse high-dimensional filtering. The resulting representation is highly efficient and flexible, and enables native 3D operations as well as joint 2D-3D reasoning. Finally, we show that high-dimensional filtering is also a powerful tool for content-adaptive image filtering. We demonstrate its utility in computer vision applications where preserving sharp details in the output is critical, including joint upsampling and semantic segmentation.
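
To make the pooling idea for multiple views concrete, the following is a minimal sketch and not the dissertation's implementation. It assumes each rendered view has already been mapped to a fixed-length feature vector by some 2D network, and it assumes element-wise max as the pooling operator; the abstract does not name the operator. The function name `view_pool` and the feature values are hypothetical.

```python
from typing import List, Sequence


def view_pool(view_features: Sequence[Sequence[float]]) -> List[float]:
    """Combine per-view feature vectors into one shape descriptor.

    Assumption: element-wise max is used as the pooling operator.
    The result does not depend on the order in which views are given.
    """
    if not view_features:
        raise ValueError("need at least one view")
    dim = len(view_features[0])
    if any(len(f) != dim for f in view_features):
        raise ValueError("all views must share the same feature dimension")
    # zip(*...) groups the i-th feature of every view together.
    return [max(column) for column in zip(*view_features)]


# Hypothetical features for three rendered views of the same 3D shape.
views = [
    [0.1, 0.9, 0.3],
    [0.4, 0.2, 0.8],
    [0.7, 0.5, 0.6],
]
pooled = view_pool(views)
print(pooled)
```

Because the maximum is taken independently per feature, the pooled descriptor has the same length as a single view's features and can be passed to an ordinary classifier regardless of how many views were rendered.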

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
