arXiv 2306.00978
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
By Ji Lin, Jiaming Tang, et al.
Published 2023-06-01
Large language models (LLMs) have transformed numerous AI applications. On-device LLMs are becoming increasingly important: running LLMs locally on edge devices can reduce cloud computing costs and protect users' privacy. However, the astronomical model size and limited hardware resources pose significant deployment challenges. We propose Activation-aware Weight Quantization (AWQ), a hardware-friendly approach f…