Off-campus UMass Amherst users: To download campus access dissertations, please use the following link to log into our proxy server with your UMass Amherst user name and password.
Non-UMass Amherst users: Please talk to your librarian about requesting this dissertation through interlibrary loan.
Dissertations that have an embargo placed on them will not be available to anyone until the embargo expires.
Author ORCID Identifier
Open Access Dissertation
Doctor of Philosophy (PhD)
Year Degree Awarded
Month Degree Awarded
Artificial Intelligence and Robotics
Recent advances in computer vision are in part due to the widespread use of deep neural networks. However, training deep networks require enormous amounts of labeled data which can be a bottleneck. In this thesis, we propose several approaches to mitigate this in the context of modern deep networks and computer vision tasks.
While transfer learning is an effective strategy for natural image tasks where large labeled datasets such as ImageNet are available, it is less effective for distant domains such as medical images and 3D shapes. Chapter 2 focuses on transfer learning from natural image representations to other modalities. In many cases, cross-modal data can be generated using computer graphics techniques. By forcing the agreement of predictions across modalities, we show that the models are more robust to image degradation, such as lower resolution, grayscale, or line drawings instead of color images in high-resolution. Similarly, we show that 3D shape classifiers learned from multi-view images can be transferred to the models of voxel or point cloud representations.
Another line of work has focused on techniques for few-shot learning. In particular, meta-learning approaches explicitly aim to generalize representations by emphasizing transferability to novel tasks. In Chapter 3, we analyze how to improve these techniques by exploiting unlabeled data from related tasks. We show that combining unsupervised objectives with meta-learning objectives can boost the performance of novel tasks. However, we find that small amounts of domain-specific data can be more beneficial than large amounts of generic data.
While transfer learning, unsupervised learning, and few-shot learning have been studied in isolation, in practice, one often finds that transfer learning from large labeled datasets is more effective than others. This is partly due to a lack of evaluation on benchmarks that contains challenges such as class imbalance and domain mismatch. In Chapter 4, we explore the role of expert models in the context of semi-supervised learning on a realistic benchmark. Unlike existing semi-supervised benchmarks, our dataset is designed to expose some of the challenges encountered in a realistic setting, such as the fine-grained similarity between classes, significant class imbalance, and domain mismatch between the labeled and unlabeled data. We show that current semi-supervised methods are negatively affected by out-of-class data, and their performance pales compared to a transfer learning baseline. Last, we leverage the coarse labels from a large collection of images to improve semi-supervised learning. In Chapter 5, we show that incorporating hierarchical labels in the taxonomy improves state-of-the-art semi-supervised methods.
Su, Jong-Chyi, "Learning from Limited Labeled Data for Visual Recognition" (2021). Doctoral Dissertations. 2365.
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.