Can ComPEFT Improve Communication Efficiency with Sparse Updates?

Original title: ComPEFT: Compression for Communicating Parameter Efficient Updates via Sparsification and Quantization

Authors: Prateek Yadav, Leshem Choshen, Colin Raffel, Mohit Bansal

This paper focuses on adapting language models to specialized tasks with parameter-efficient fine-tuning (PEFT). While PEFT methods efficiently produce specialized ‘expert’ models, the resulting checkpoints are still large enough to make retrieval over a network and efficient GPU serving difficult. ComPEFT addresses this by compressing fine-tuning residuals (task vectors) without any retraining, using sparsification followed by ternary quantization. This compression shrinks the updates by factors of 8x to 50x across model scales from 200M to 65B parameters. Notably, ComPEFT outperforms QLoRA by 4.16% while reducing storage size by up to 26x. The compressed ‘experts’ retain strong task performance, are cheaper to communicate and compose, and improve results when merged. The paper provides a thorough analysis, comparisons with other methods, and evaluations showing that ComPEFT is also effective at compressing full fine-tuning residuals; the authors release their code publicly.
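To make the sparsify-and-ternarize idea concrete, here is a minimal PyTorch sketch of what such a compression step could look like. The function names (`compeft_compress`, `compeft_decompress`), the `keep_ratio` knob, and the choice of scaling constant are illustrative assumptions for this sketch, not the paper’s exact implementation.

```python
import torch

def compeft_compress(task_vector: torch.Tensor, keep_ratio: float = 0.1):
    """Sketch: sparsify then ternarize a task vector (finetuned - base weights).

    keep_ratio is an assumed knob for the fraction of largest-magnitude
    entries to keep. Returns signs in {-1, 0, +1} plus one scaling scalar.
    """
    magnitudes = task_vector.abs()
    k = max(1, int(keep_ratio * task_vector.numel()))
    # Sparsification: keep only the top-k largest-magnitude entries.
    threshold = torch.topk(magnitudes, k).values.min()
    mask = magnitudes >= threshold
    # Ternary quantization: store only the sign of the kept entries.
    signs = torch.sign(task_vector) * mask
    # A single scalar rescales the signs (mean kept magnitude here;
    # the paper's exact choice of scale may differ).
    alpha = magnitudes[mask].mean()
    return signs.to(torch.int8), alpha

def compeft_decompress(base_weights: torch.Tensor, signs: torch.Tensor, alpha: torch.Tensor):
    """Reconstruct an expert by adding the rescaled ternary update to the base."""
    return base_weights + alpha * signs.to(base_weights.dtype)

# Usage example on toy weights.
base = torch.randn(1000)
finetuned = base + 0.01 * torch.randn(1000)
signs, alpha = compeft_compress(finetuned - base, keep_ratio=0.1)
approx_expert = compeft_decompress(base, signs, alpha)
```

The key communication savings in this sketch come from shipping only an int8 sign mask (further packable to ~1.6 bits per kept entry) and one scalar per tensor, rather than full-precision deltas.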

Original article: https://arxiv.org/abs/2311.13171