Author ORCID Identifier
https://orcid.org/0009-0005-6851-3093
Access Type
Open Access Dissertation
Document Type
Dissertation
Degree Name
Doctor of Philosophy (PhD)
Degree Program
Computer Science
Year Degree Awarded
2023
Month Degree Awarded
September
First Advisor
Mohit Iyyer
Second Advisor
Subhransu Maji
Third Advisor
Hamed Zamani
Fourth Advisor
Thang Luong
Fifth Advisor
Colin Raffel
Subject Categories
Artificial Intelligence and Robotics
Abstract
Substantial progress has been made in natural language processing (NLP) due to the advent of large language models (LLMs): deep neural networks with millions or billions of parameters pre-trained on large amounts of unlabeled data. However, these models share common weaknesses, including degraded performance in data-scarce scenarios and substantial computational resource requirements. This thesis develops methods to address these limitations and improve the applicability and performance of LLMs in resource-constrained settings with limited data and/or computational resources.
To address the need for labeled data in data-scarce scenarios, I present two methods, in Chapters 2 and 3, respectively. The first leverages beneficial relationships between NLP tasks for transfer learning, while the second combines data augmentation and self-training to boost few-shot learning performance, i.e., the ability to perform novel tasks from only a few labeled examples. Additionally, in Chapter 4, I introduce a novel parameter-efficient transfer learning approach that reuses a single frozen model for all tasks and learns only minimal task-specific parameters (soft/continuous prompts) to represent tasks and transfer knowledge. This method can match or outperform full fine-tuning of task-specific models (training the entire model on each task). In Chapter 5, I demonstrate the benefits of parameter-efficient transfer learning in a cross-lingual transfer setting. Finally, I conclude the thesis in Chapter 6 by outlining potential avenues for future research that aim to advance NLP through large-scale multi-task learning with multilingual and multimodal data.
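To illustrate the general idea behind the parameter-efficient approach described above (a frozen shared model plus a small set of trainable soft prompt parameters per task), the sketch below shows a minimal prompt-tuning setup in PyTorch. It is not the dissertation's implementation; the module, dimension, and prompt-length choices are illustrative assumptions.

# Minimal sketch of soft prompt tuning: a shared, frozen backbone plus a small
# trainable prompt matrix per task. Names and sizes here are assumptions for
# illustration, not the dissertation's exact code.
import torch
import torch.nn as nn

class SoftPromptModel(nn.Module):
    def __init__(self, frozen_lm: nn.Module, embed_dim: int, prompt_len: int = 20):
        super().__init__()
        self.frozen_lm = frozen_lm
        for p in self.frozen_lm.parameters():
            p.requires_grad = False  # the backbone is shared and never updated
        # The only task-specific, trainable parameters: prompt_len x embed_dim.
        self.soft_prompt = nn.Parameter(torch.randn(prompt_len, embed_dim) * 0.02)

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        # input_embeds: (batch, seq_len, embed_dim) token embeddings.
        batch = input_embeds.size(0)
        prompt = self.soft_prompt.unsqueeze(0).expand(batch, -1, -1)
        # Prepend the learned prompt and run the shared frozen model.
        return self.frozen_lm(torch.cat([prompt, input_embeds], dim=1))

Because only the prompt matrix receives gradients, a new task adds on the order of prompt_len x embed_dim parameters rather than a full copy of the model, which is what makes the approach attractive for transfer across many tasks.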
DOI
https://doi.org/10.7275/36003057
Recommended Citation
Vu, Tu, "Effective and Efficient Transfer Learning in the Era of Large Language Models" (2023). Doctoral Dissertations. 2914.
https://doi.org/10.7275/36003057
https://scholarworks.umass.edu/dissertations_2/2914
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.