arXiv 2511.07885

Intelligence per Watt: Measuring Intelligence Efficiency of Local AI

By Jon Saad-Falcon, Avanika Narayan, et al.

Published 2025-11-11

Large language model (LLM) queries are predominantly processed by frontier models in centralized cloud infrastructure. Rapidly growing demand strains this paradigm, and cloud providers struggle to scale infrastructure at pace. Two advances enable us to rethink this paradigm: small LMs (<=20B active parameters) now achieve performance competitive with frontier models on many tasks, and local accelerators (e.g., Apple M…
