arXiv 2212.09720
The case for 4-bit precision: k-bit Inference Scaling Laws
By Tim Dettmers and Luke Zettlemoyer
Published 2022-12-19
Wiki summary
Quantization methods reduce the number of bits required to represent each parameter in a model, trading accuracy for smaller memory footprints and inference latencies. However, the final model size depends on both the number of parameters of the original model and the rate of compression. For example, a 30B 8-bit model and a 60B 4-bit model have the same number of bits but may have very different zero-shot accuracie…
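The arithmetic behind the 30B 8-bit versus 60B 4-bit comparison can be checked directly. The following is a minimal sketch, not taken from the paper: the helper name `total_model_bits` is hypothetical, and it counts only the bits used for parameters, assuming no additional storage overhead from the quantization scheme itself.

```python
def total_model_bits(num_params: int, bits_per_param: int) -> int:
    """Total bits needed to store the parameters (assumes no quantization overhead)."""
    return num_params * bits_per_param


# The two configurations named in the abstract.
bits_30b_8bit = total_model_bits(30 * 10**9, 8)
bits_60b_4bit = total_model_bits(60 * 10**9, 4)

# Both come to 240 billion bits, i.e. 30 GB at 8 bits per byte.
gigabytes = bits_30b_8bit / 8 / 1e9

print(bits_30b_8bit == bits_60b_4bit)  # True
print(gigabytes)                       # 30.0
```

Under this simplification the two models occupy the same memory, so any difference in zero-shot accuracy comes from how the fixed bit budget is split between parameter count and precision.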