arXiv 2410.09982

Self-Data Distillation for Recovering Quality in Pruned Large Language Models

By Vithursan Thangarasa, Ganesh Venkatesh, et al.

Published 2024-10-13

Wiki summary

Large language models have driven significant progress in natural language processing, but their deployment requires substantial compute and memory resources. As models scale, compression techniques become essential for balancing model quality with computational efficiency. Structured pruning, which removes less critical components of the model, is a promising strategy for reducing complexity. However, one-shot prun…