arXiv 2306.00978
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
By Ji Lin, Jiaming Tang, et al.
Published 2023-06-01
Large language models (LLMs) have transformed numerous AI applications. On-device LLM deployment is becoming increasingly important: running LLMs locally on edge devices can reduce cloud computing costs and protect users' privacy. However, the astronomical model size and limited hardware resources pose significant deployment challenges. We propose Activation-aware Weight Quantization (AWQ), a hardware-friendly approach f…