Original title: LM-Cocktail: Resilient Tuning of Language Models via Model Merging
Authors: Shitao Xiao, Zheng Liu, Peitian Zhang, Xingrun Xing
This article focuses on keeping fine-tuned language models performant across general tasks, not only the task they were tuned for. The authors introduce LM-Cocktail, a method that merges the fine-tuned model with the base model, or with models fine-tuned on other domains, via weighted averaging of their parameters. Despite its simplicity, LM-Cocktail proves highly effective: the merged model performs strongly on general tasks while still excelling in its targeted domain. Experiments with Llama and BGE models on popular benchmarks such as FLAN, MMLU, and MTEB confirm the method's efficacy. The authors release code and checkpoints for reference, offering a promising way to broaden a language model's applicability without sacrificing performance in specialized areas.
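The core operation described above is simply an element-wise weighted average of model parameters. The following is a minimal sketch of that idea in plain Python, using toy dictionaries in place of real model state dicts; the function name `merge_models` and the weight scheme are illustrative, not the authors' actual API.

```python
def merge_models(state_dicts, weights):
    """Weighted average of parameter dicts that share the same keys.

    Each state dict maps parameter names to values; real models would
    hold tensors, but scalars suffice to show the arithmetic.
    """
    assert abs(sum(weights) - 1.0) < 1e-6, "weights should sum to 1"
    merged = {}
    for key in state_dicts[0]:
        # Average the same parameter across all models, scaled by weight.
        merged[key] = sum(w * sd[key] for w, sd in zip(weights, state_dicts))
    return merged

# Toy example: merge a "fine-tuned" and a "base" model equally.
fine_tuned = {"layer.w": 2.0}
base = {"layer.w": 4.0}
cocktail = merge_models([fine_tuned, base], weights=[0.5, 0.5])
# cocktail["layer.w"] is 3.0, halfway between the two models.
```

In practice the same averaging would be applied tensor-by-tensor over the checkpoints of architecturally identical models, with weights chosen to balance target-domain and general-task performance.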
Original article: https://arxiv.org/abs/2311.13534