arXiv 2410.09982

Self-Data Distillation for Recovering Quality in Pruned Large Language Models

By Vithursan Thangarasa, Ganesh Venkatesh, et al.

Published 2024-10-13

Large language models have driven significant progress in natural language processing, but their deployment requires substantial compute and memory resources. As models scale, compression techniques become essential for balancing model quality with computational efficiency. Structured pruning, which removes less critical components of the model, is a promising strategy for reducing complexity. However, one-shot prun…
