arXiv 2305.14314

QLoRA: Efficient Finetuning of Quantized LLMs

By Tim Dettmers, Artidoro Pagnoni, et al.

Published 2023-05-23

We present QLoRA, an efficient finetuning approach that reduces memory usage enough to finetune a 65B parameter model on a single 48GB GPU while preserving full 16-bit finetuning task performance. QLoRA backpropagates gradients through a frozen, 4-bit quantized pretrained language model into Low Rank Adapters (LoRA). Our best model family, which we name Guanaco, outperforms all previous openly released models on the…
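The mechanism described in the abstract can be illustrated with a toy sketch: one linear layer whose base weights are stored as frozen 4-bit codes, plus a trainable low-rank adapter that receives all the gradient updates. Everything in it is an assumption made for illustration rather than the paper's implementation. The simple absmax quantizer stands in for whatever 4-bit data type the authors use, and the names (`quantize_4bit`, `train_step`, `weight_memory_gb`), shapes, learning rate, and data are hypothetical. Because there is only one layer, the gradient never has to pass through the frozen weights here; in a deeper network it would flow through the dequantized weights of later layers.

```python
# Toy sketch: frozen 4-bit base weights plus trainable low-rank adapters.
# The absmax quantizer and every name below are illustrative assumptions.

def weight_memory_gb(n_params, bits):
    """Memory for the weights alone, in GB (10^9 bytes)."""
    return n_params * bits / 8 / 1e9

def quantize_4bit(row):
    """Absmax-quantize one row to integer codes in [-7, 7] plus a float scale."""
    scale = max(abs(v) for v in row) / 7 or 1.0
    return [max(-7, min(7, round(v / scale))) for v in row], scale

def dequantize(codes, scale):
    return [c * scale for c in codes]

def matvec(M, x):
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def forward(q_rows, A, B, x):
    """y = dequant(W_q) x + B (A x); returns y and the adapter hidden state h."""
    W = [dequantize(codes, scale) for codes, scale in q_rows]
    h = matvec(A, x)
    base, delta = matvec(W, x), matvec(B, h)
    return [b + d for b, d in zip(base, delta)], h

def loss_fn(y, target):
    return 0.5 * sum((a - t) ** 2 for a, t in zip(y, target))

def train_step(q_rows, A, B, x, target, lr):
    """One SGD step; only A and B are updated, the 4-bit codes stay fixed."""
    y, h = forward(q_rows, A, B, x)
    g_y = [a - t for a, t in zip(y, target)]          # dL/dy
    # dL/dh = B^T g_y, computed before B is modified
    g_h = [sum(B[i][k] * g_y[i] for i in range(len(B))) for k in range(len(A))]
    for i in range(len(B)):
        for k in range(len(A)):
            B[i][k] -= lr * g_y[i] * h[k]             # dL/dB = g_y h^T
    for k in range(len(A)):
        for j in range(len(x)):
            A[k][j] -= lr * g_h[k] * x[j]             # dL/dA = g_h x^T
    return loss_fn(y, target)                         # loss before the update

W_full = [[0.2, -0.1, 0.4, 0.05],
          [-0.3, 0.25, 0.1, -0.2],
          [0.15, 0.35, -0.05, 0.1]]
q_rows = [quantize_4bit(row) for row in W_full]       # frozen from here on
frozen_snapshot = [(list(codes), scale) for codes, scale in q_rows]

A = [[0.5, 0.0, 0.0, 0.5], [0.0, 0.5, 0.5, 0.0]]      # rank 2, input size 4
B = [[0.0, 0.0] for _ in range(3)]                    # zero init: adapter starts as a no-op
x, target = [1.0, -0.5, 0.25, 2.0], [1.0, 0.0, -1.0]

losses = [train_step(q_rows, A, B, x, target, lr=0.05) for _ in range(50)]
initial_loss, final_loss = losses[0], losses[-1]
print(f"loss: {initial_loss:.4f} -> {final_loss:.4f}")
print(f"65B weights: {weight_memory_gb(65e9, 16):.1f} GB at 16-bit, "
      f"{weight_memory_gb(65e9, 4):.1f} GB at 4-bit")
```

The last line gives a rough sense of the memory claim: 65B weights take about 130 GB at 16 bits but about 32.5 GB at 4 bits, which is below the 48GB figure quoted above. That count covers the weights only and ignores quantization scales, adapter parameters, activations, and optimizer state.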
