Open Access Dissertation

Degree Name

Doctor of Philosophy (PhD)

Degree Program

Computer Science

First Advisor

Erik Learned-Miller

Second Advisor

Subhransu Maji

Third Advisor

Liangliang Cao

Fourth Advisor

David Huber

Subject Categories

Artificial Intelligence and Robotics | Computer Sciences


The success of deep neural networks has resulted in computer vision systems that obtain high accuracy on a wide variety of tasks, such as image classification, object detection, and semantic segmentation. However, most state-of-the-art vision systems depend on large amounts of labeled training data, and collecting such annotations does not scale in the long run. This work focuses on improving existing models for visual object recognition and detection without relying on such large-scale human-annotated data. We first show how large numbers of hard examples (cases where an existing model makes a mistake) can be obtained automatically from unlabeled video sequences by exploiting temporal consistency cues in the output of a pre-trained object detector. When the network is re-trained to correct these examples, they can strongly influence its parameters, resulting in improved performance on several object detection tasks. Further, such hard examples from unlabeled videos can be used to address the problem of unsupervised domain adaptation. We focus on automatically adapting an existing object detector to a new domain with no labeled data, assuming that a large number of unlabeled videos are readily available. Our approach is evaluated on challenging face and pedestrian detection tasks involving large domain shifts, and it shows improved performance with minimal dependence on hyper-parameters. Finally, we address face recognition, which has achieved high accuracy by employing deep neural networks trained on massive labeled datasets. Further improvements through supervised learning would require significantly larger datasets and hence massive annotation efforts. We improve upon the performance of face recognition models trained on large-scale labeled datasets by using unlabeled faces as additional training data.
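The following sketch illustrates how temporal consistency in a detector's output can expose mistakes without labels. It flags two kinds of "flicker" across three consecutive frames:

* An isolated detection in frame t is treated as a likely false positive (a hard negative). Such a detection has no overlapping detection in frames t-1 or t+1.
* An object detected at overlapping locations in frames t-1 and t+1, but missed in frame t, is treated as a likely false negative (a hard positive).

This is a simplified heuristic assumed for illustration, not the exact procedure used in the dissertation. The `(x1, y1, x2, y2)` box format, the 0.5 IoU threshold, the midpoint interpolation of the missed box, and the function names are all hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def has_match(box: Box, candidates: List[Box], thresh: float) -> bool:
    """True if any candidate box overlaps `box` with IoU >= thresh."""
    return any(iou(box, c) >= thresh for c in candidates)


def mine_flickers(frames: List[List[Box]], thresh: float = 0.5):
    """Hypothetical flicker miner over per-frame detections.

    Returns (hard_negatives, hard_positives), each a list of (frame_index, box).
    """
    hard_neg, hard_pos = [], []
    for t in range(1, len(frames) - 1):
        prev, cur, nxt = frames[t - 1], frames[t], frames[t + 1]
        # Isolated detection: fires at t, but nothing overlaps it at t-1 or t+1.
        for box in cur:
            if not has_match(box, prev, thresh) and not has_match(box, nxt, thresh):
                hard_neg.append((t, box))
        # Missed detection: seen at t-1 and t+1 (overlapping each other), absent at t.
        for p in prev:
            for n in nxt:
                if iou(p, n) >= thresh:
                    # Assume the object lies midway between its two neighboring boxes.
                    mid = tuple((pc + nc) / 2 for pc, nc in zip(p, n))
                    if not has_match(mid, cur, thresh):
                        hard_pos.append((t, mid))
                    break  # use only the first matching box in t+1 for this p
    return hard_neg, hard_pos
```

Consider a detector that fires on the same face in frames 0 and 2 but misses it in frame 1, and that also fires once on background in frame 1. For that input, `mine_flickers` reports the missed face as a hard positive and the background box as a hard negative. Both are in frame 1, and either one could then be used for re-training.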
We present insights and recipes for training deep face recognition models with labeled and unlabeled data at scale, addressing real-world challenges such as overlapping identities between the labeled and unlabeled datasets, as well as label noise introduced by clustering errors.
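Overlapping identities arise when clusters of unlabeled faces depict people who already appear in the labeled set. The sketch below shows one generic heuristic for handling this, assumed here for illustration and not taken from the dissertation's recipe:

1. Compute a unit-norm centroid of face embeddings for each labeled identity and each unlabeled cluster.
2. Discard any cluster whose centroid is too similar (by cosine similarity) to some labeled identity.

The embeddings are assumed to come from a face recognition model trained on the labeled data. The 0.7 threshold and the function names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence

Vec = List[float]


def normalize(v: Sequence[float]) -> Vec:
    """Scale a vector to unit L2 norm (zero vectors are left unchanged)."""
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]


def centroids(feats: Sequence[Sequence[float]], labels: Sequence[int]) -> Dict[int, Vec]:
    """Unit-norm mean of the normalized embeddings for each label or cluster id."""
    groups: Dict[int, List[Vec]] = defaultdict(list)
    for f, y in zip(feats, labels):
        groups[y].append(normalize(f))
    return {y: normalize([sum(col) / len(fs) for col in zip(*fs)]) for y, fs in groups.items()}


def cosine(a: Vec, b: Vec) -> float:
    """Cosine similarity of two vectors that are already unit-norm."""
    return sum(x * y for x, y in zip(a, b))


def drop_overlapping_clusters(lab_feats, lab_labels, unl_feats, cluster_ids,
                              sim_thresh: float = 0.7) -> List[int]:
    """Hypothetical filter: keep only unlabeled clusters that look like new identities."""
    lab_c = centroids(lab_feats, lab_labels)
    unl_c = centroids(unl_feats, cluster_ids)
    keep = []
    for cid, c in unl_c.items():
        # Highest similarity between this cluster and any labeled identity.
        best = max(cosine(c, lc) for lc in lab_c.values())
        if best < sim_thresh:
            keep.append(cid)
    return keep
```

A fixed centroid threshold is the simplest possible choice. In practice the cutoff trades off two errors: discarding genuinely new identities, or keeping duplicates that act as label noise during training.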