arXiv 2511.07885

Intelligence per Watt: Measuring Intelligence Efficiency of Local AI

By Jon Saad-Falcon, Avanika Narayan, et al.

Published 2025-11-11

Large language model (LLM) queries are predominantly processed by frontier models in centralized cloud infrastructure. Rapidly growing demand strains this paradigm, and cloud providers struggle to scale infrastructure at pace. Two advances enable us to rethink this paradigm: small LMs (<=20B active parameters) now achieve performance competitive with frontier models on many tasks, and local accelerators (e.g., Apple M…