arXiv 2311.03099

Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch

By Le Yu, Bowen Yu, et al.

Published 2023-11-06

In this paper, we unveil that Language Models (LMs) can acquire new capabilities by assimilating parameters from homologous models without retraining or GPUs. We first introduce DARE to set most delta parameters (i.e., the disparity between fine-tuned and pre-trained parameters) to zeros without affecting the abilities of Supervised Fine-Tuning (SFT) LMs, which randomly Drops delta parameters with a ratio And REscal…
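The drop-and-rescale step described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `dare_delta`, the array shapes, and the drop ratio of 0.9 are hypothetical, and because the abstract is truncated here before it states the rescale factor, the sketch assumes the usual inverted-dropout factor 1 / (1 - p).

```python
import numpy as np


def dare_delta(delta, drop_rate, rng):
    """Randomly zero a fraction of delta parameters and rescale the rest.

    Assumption: survivors are rescaled by 1 / (1 - drop_rate), as in
    inverted dropout. `dare_delta` is a hypothetical name for illustration.
    """
    if not 0.0 <= drop_rate < 1.0:
        raise ValueError("drop_rate must be in [0, 1)")
    # Each entry is kept independently with probability 1 - drop_rate.
    keep = rng.random(delta.shape) >= drop_rate
    # Dropped entries become zero; kept entries are scaled up.
    return np.where(keep, delta / (1.0 - drop_rate), 0.0)


rng = np.random.default_rng(0)

# Toy stand-ins for one weight matrix of a pre-trained and a fine-tuned model.
pretrained = rng.normal(size=(1000, 64))
finetuned = pretrained + 0.01 * rng.normal(size=(1000, 64))

# Delta parameters: the difference between fine-tuned and pre-trained weights.
delta = finetuned - pretrained

sparse_delta = dare_delta(delta, drop_rate=0.9, rng=rng)

# Weights reconstructed from the pre-trained model plus the sparsified delta.
reconstructed = pretrained + sparse_delta

kept_fraction = np.mean(sparse_delta != 0)
print(f"fraction of delta parameters kept: {kept_fraction:.3f}")
```

Under the assumed rescale factor, each kept entry is multiplied by 1 / (1 - p) and survives with probability 1 - p, so the expected value of the sparsified delta equals the original delta.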