A Composability-Based Transformer Pruning Framework

Abstract
This thesis addresses the challenge of deploying large Transformer models on resource-constrained edge devices. Because fine-tuning large Transformers is slow, the standard prune-then-fine-tune workflow becomes tedious and time-consuming when many pruning configurations must be evaluated. To remedy this, the research proposes a novel composability-based Transformer pruning framework that aims to significantly reduce the time required to fine-tune pruned models across various configurations while maintaining model performance. Unlike traditional approaches, this study explores the composability between Transformer pruning configurations, revealing opportunities for computational reuse. The proposed framework leverages this composability with techniques similar to knowledge distillation and automates the pruning and fine-tuning process. Experiments demonstrated that the framework shortens the time needed to fine-tune a model for a given pruning configuration, making it a practical tool for real-world deployment on edge devices. The outcome of this research is a novel method that offers a fresh perspective on Transformer model compression and a reference for future studies on the pruning and fine-tuning of Transformer networks.
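The abstract does not spell out the mechanics, but the core idea of composability can be illustrated: a pruned block that appears in several pruning configurations only needs to be fine-tuned once, after which it can be reused when assembling each configuration. The sketch below is a hypothetical illustration, not the thesis's implementation; the prune_layer routine, the (layer index, ratio) cache key, the MSE-based distillation loss, and all hyperparameters are assumptions made for the example.

```python
# Minimal sketch (not the thesis's code) of composability-based pruning:
# each pruned Transformer block is fine-tuned once against its unpruned
# counterpart (KD-style) and cached, then reused across configurations.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

def prune_layer(layer: nn.TransformerEncoderLayer, ratio: float) -> nn.Module:
    """Placeholder structured pruning: simply shrinks the FFN width."""
    d_model = layer.linear1.in_features
    d_ff = max(1, int(layer.linear1.out_features * (1.0 - ratio)))
    return nn.TransformerEncoderLayer(d_model, nhead=4, dim_feedforward=d_ff,
                                      batch_first=True)

def distill_block(teacher: nn.Module, student: nn.Module,
                  data: torch.Tensor, steps: int = 10) -> nn.Module:
    """Fine-tune one pruned block to mimic its unpruned counterpart."""
    opt = torch.optim.Adam(student.parameters(), lr=1e-3)
    for _ in range(steps):
        with torch.no_grad():
            target = teacher(data)
        loss = F.mse_loss(student(data), target)
        opt.zero_grad(); loss.backward(); opt.step()
    return student

# Cache of fine-tuned pruned blocks keyed by (layer index, pruning ratio):
# this is where composability pays off, since a block tuned once is reused
# in every configuration that contains it.
block_cache = {}

def compose_model(base_layers, config, data):
    layers = []
    for i, ratio in enumerate(config):
        key = (i, ratio)
        if key not in block_cache:
            student = prune_layer(base_layers[i], ratio)
            block_cache[key] = distill_block(base_layers[i], student, data)
        layers.append(copy.deepcopy(block_cache[key]))
    return nn.Sequential(*layers)

base = [nn.TransformerEncoderLayer(32, nhead=4, dim_feedforward=64,
                                   batch_first=True) for _ in range(4)]
x = torch.randn(8, 16, 32)  # (batch, seq_len, d_model)
model_a = compose_model(base, (0.5, 0.5, 0.25, 0.25), x)
model_b = compose_model(base, (0.5, 0.25, 0.25, 0.5), x)  # reuses cached blocks
```

In this toy example the second configuration shares three of its four blocks with the first, so composing it triggers only one additional fine-tuning pass; the thesis's framework presumably applies the same reuse at much larger scale, which is where the reported reduction in fine-tuning time would come from.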
Type
Thesis (Open Access)
Date
2023-09