arXiv 2306.00978

AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

By Ji Lin, Jiaming Tang, et al.

Published 2023-06-01

Large language models (LLMs) have transformed numerous AI applications. On-device LLM deployment is becoming increasingly important: running LLMs locally on edge devices can reduce cloud computing costs and protect users' privacy. However, the astronomical model size and limited hardware resources pose significant deployment challenges. We propose Activation-aware Weight Quantization (AWQ), a hardware-friendly approach f…
