Author ORCID Identifier


Open Access Dissertation

Document Type


Degree Name

Doctor of Philosophy (PhD)

Degree Program

Computer Science

Year Degree Awarded


Month Degree Awarded


First Advisor

Andrew McCallum

Subject Categories

Artificial Intelligence and Robotics


Humans show a remarkable ability to solve a wide range of problems accurately and efficiently, using a limited amount of computation and experience. Deep learning models, by stark contrast, can be trained to be highly accurate on a narrow task while being highly inefficient in the amount of compute and data required to reach that accuracy. Within natural language processing (NLP), recent breakthroughs in unsupervised pre-training have enabled reusable models that can be applied to many NLP tasks; however, learning new tasks is still inefficient. This has led to research on few-shot learning, where the goal is to generalize to new tasks from very few labeled instances. Meta-learning, or learning to learn, treats the learning process itself as a problem to be learned from data, with the goal of producing systems that generalize to new tasks efficiently. This has the potential to produce few-shot learners that can accurately solve a wide range of new tasks. However, meta-learning requires a distribution over tasks with relevant labeled data, which can be difficult to obtain, severely limiting the practical utility of meta-learning methods. In this dissertation, we develop methods to enable large-scale meta-learning from unlabeled text data and improve the few-shot generalization ability of NLP models. We contribute methods that synthesize tasks from unlabeled text, allowing for a large task distribution for meta-learning. Meta-learning from millions of such self-supervised tasks leads to rapid learning of new tasks, and it minimizes the train-test mismatch in few-shot learning by optimizing the pre-training directly for future fine-tuning with a few examples. Since real-world applications of NLP require learning diverse tasks with different numbers of classes, we first introduce an optimization-based meta-learning method that can learn from multiple NLP classification tasks with any number of classes.
We then leverage the proposed self-supervised approach to create meta-training tasks with diverse numbers of classes and meta-train models for few-shot learning on this task distribution. This approach improves representation learning, allows key hyper-parameters such as learning rates to be learned, can be combined with supervised tasks to regularize supervised meta-learning, and yields accurate few-shot learning on a diverse set of NLP classification tasks. We further explore the space of self-supervised tasks for meta-learning by considering important aspects such as task diversity, difficulty, type, domain, and curriculum, and we investigate how they affect meta-learning performance. Our analysis shows that all of these factors meaningfully alter the task distribution and that some induce significant improvements in the downstream few-shot accuracy of the meta-learned models. Our findings yield accurate and efficient meta-learning methods that improve few-shot generalization to diverse tasks and should enable many future applications of meta-learning in NLP, such as hyper-parameter optimization, continual learning, efficient learning, and learning in low-resource languages.
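
The abstract does not spell out how classification tasks are synthesized from unlabeled text. As a rough illustration only, and not necessarily the construction used in the dissertation, the sketch below assumes a cloze-style scheme: choose a few words, hide each one in sentences that contain it, and ask a classifier to predict which word was hidden. The function name `build_cloze_task`, its parameters, and the `[MASK]` placeholder are all hypothetical.

```python
import random
import re
from collections import defaultdict


def build_cloze_task(sentences, num_classes, examples_per_class, rng, mask_token="[MASK]"):
    """Build one N-way classification task from unlabeled sentences.

    Assumed scheme (illustrative, not taken from the dissertation): each class
    is a hidden word, and each example is a sentence in which every occurrence
    of that word is replaced by mask_token. Labels are arbitrary integers.
    """
    # Index lowercased sentences by the distinct alphabetic words they contain.
    by_word = defaultdict(list)
    for sentence in sentences:
        text = sentence.lower()
        for word in set(re.findall(r"\b[a-z]+\b", text)):
            by_word[word].append(text)

    # Only words that occur in enough sentences can fill a class.
    candidates = sorted(w for w, s in by_word.items() if len(s) >= examples_per_class)
    if len(candidates) < num_classes:
        raise ValueError("not enough frequent words for the requested number of classes")

    task = []
    for label, word in enumerate(rng.sample(candidates, num_classes)):
        pattern = re.compile(rf"\b{re.escape(word)}\b")
        for text in rng.sample(by_word[word], examples_per_class):
            task.append((pattern.sub(mask_token, text), label))

    rng.shuffle(task)  # avoid presenting examples grouped by class
    return task
```

Because `num_classes` is a free parameter, repeated calls with different values and different random seeds would produce a large pool of tasks with varying numbers of classes, which is the kind of task distribution the abstract describes.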


Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.