arXiv 2306.00978

AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

By Ji Lin, Jiaming Tang, et al.

Published 2023-06-01

Large language models (LLMs) have transformed numerous AI applications. On-device LLM is becoming increasingly important: running LLMs locally on edge devices can reduce the cloud computing cost and protect users' privacy. However, the astronomical model size and the limited hardware resources pose significant deployment challenges. We propose Activation-aware Weight Quantization (AWQ), a hardware-friendly approach f…