arXiv 2511.13254
Souper-Model: How Simple Arithmetic Unlocks State-of-the-Art LLM Performance
By Shalini Maiti, Amar Budhiraja, et al.
Published 2025-11-17
Large Language Models (LLMs) have demonstrated remarkable capabilities across diverse domains, but their training remains resource- and time-intensive, requiring massive compute power and careful orchestration of training procedures. Model souping (the practice of averaging weights from multiple models of the same architecture) has emerged as a promising pre- and post-training technique that can enhance performance wi…
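To make the idea of weight averaging concrete, the sketch below shows the simplest form of model souping: a per-parameter average over several checkpoints that share an architecture. It is an illustration only, not the procedure this paper proposes. The function name `soup`, the `StateDict` alias, and the representation of each parameter as a flat list of floats are assumptions made for the example; a real checkpoint would hold framework tensors.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical stand-in for a checkpoint: parameter name -> flat list of values.
StateDict = Dict[str, List[float]]


def soup(models: Sequence[StateDict],
         weights: Optional[Sequence[float]] = None) -> StateDict:
    """Average the parameters of several same-architecture models.

    With weights=None every model contributes equally (a uniform soup).
    Otherwise the weights are normalized by their sum, so they need not
    add up to 1.
    """
    if not models:
        raise ValueError("need at least one model to average")
    if weights is None:
        weights = [1.0] * len(models)
    if len(weights) != len(models):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")

    # Souping only makes sense when every model has the same parameters.
    names = models[0].keys()
    for model in models[1:]:
        if model.keys() != names:
            raise ValueError("models do not share the same parameter names")

    souped: StateDict = {}
    for name in names:
        size = len(models[0][name])
        if any(len(model[name]) != size for model in models):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Weighted mean of each scalar entry across all models.
        souped[name] = [
            sum(w * model[name][i] for w, model in zip(weights, models)) / total
            for i in range(size)
        ]
    return souped
```

Calling `soup([a, b])` returns the element-wise mean of two checkpoints, while `soup([a, b], weights=[1.0, 3.0])` leans the result toward `b`. The key and shape checks reflect the requirement stated in the abstract that the averaged models share one architecture.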