arXiv 2511.13254

Souper-Model: How Simple Arithmetic Unlocks State-of-the-Art LLM Performance

By Shalini Maiti, Amar Budhiraja, et al.

Published 2025-11-17

Large Language Models (LLMs) have demonstrated remarkable capabilities across diverse domains, but their training remains resource- and time-intensive, requiring massive compute power and careful orchestration of training procedures. Model souping (the practice of averaging weights from multiple models of the same architecture) has emerged as a promising pre- and post-training technique that can enhance performance wi…
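
To make the idea of weight averaging concrete, here is a minimal sketch of a uniform soup in plain Python. It assumes each model is represented as a dictionary mapping parameter names to flat lists of floats; the function name `uniform_soup` and this representation are illustrative assumptions, and the procedure the paper itself proposes may differ from simple uniform averaging.

```python
from typing import Dict, List, Sequence

# Hypothetical stand-in for a model checkpoint: parameter name -> flat weights.
StateDict = Dict[str, List[float]]


def uniform_soup(state_dicts: Sequence[StateDict]) -> StateDict:
    """Average parameters element-wise across models of the same architecture."""
    if not state_dicts:
        raise ValueError("need at least one model to average")

    # All models must expose exactly the same parameter names.
    reference_keys = set(state_dicts[0])
    for sd in state_dicts[1:]:
        if set(sd) != reference_keys:
            raise ValueError("models do not share the same parameter names")

    n = len(state_dicts)
    soup: StateDict = {}
    for name in state_dicts[0]:
        tensors = [sd[name] for sd in state_dicts]
        # Same architecture implies the same number of weights per parameter.
        if len({len(t) for t in tensors}) != 1:
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Element-wise mean across the n models.
        soup[name] = [sum(values) / n for values in zip(*tensors)]
    return soup


model_a = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
model_b = {"layer.weight": [3.0, 4.0], "layer.bias": [1.0]}
souped = uniform_soup([model_a, model_b])
```

With real checkpoints the same loop would typically run over framework tensors rather than Python lists; the explicit name and shape checks reflect the requirement that all ingredients share one architecture.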
