arXiv 2511.13254

Souper-Model: How Simple Arithmetic Unlocks State-of-the-Art LLM Performance

By Shalini Maiti, Amar Budhiraja, et al.

Published 2025-11-17

Wiki summary


Large Language Models (LLMs) have demonstrated remarkable capabilities across diverse domains, but their training remains resource- and time-intensive, requiring massive compute power and careful orchestration of training procedures. Model souping, the practice of averaging weights from multiple models of the same architecture, has emerged as a promising pre- and post-training technique that can enhance performance wi…
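As a rough illustration of what "averaging weights from multiple models of the same architecture" can mean in practice, here is a minimal sketch in plain Python. It is not the paper's method, which this excerpt does not describe. The function name `soup`, the `StateDict` alias, and the representation of a model as a dictionary of flat float lists are all assumptions made for the example. Uniform averaging is the default, and optional per-model coefficients are a hypothetical extension.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical stand-in for a model checkpoint: parameter name -> flat list of floats.
StateDict = Dict[str, List[float]]


def soup(models: Sequence[StateDict],
         weights: Optional[Sequence[float]] = None) -> StateDict:
    """Combine checkpoints of the same architecture by averaging their parameters.

    With weights=None every model contributes equally (a uniform soup).
    """
    if not models:
        raise ValueError("need at least one model")
    if weights is None:
        weights = [1.0 / len(models)] * len(models)
    if len(weights) != len(models):
        raise ValueError("need exactly one weight per model")

    # "Same architecture" here means identical parameter names and sizes.
    names = models[0].keys()
    for m in models:
        if m.keys() != names:
            raise ValueError("models do not share the same parameter names")

    souped: StateDict = {}
    for name in names:
        size = len(models[0][name])
        if any(len(m[name]) != size for m in models):
            raise ValueError(f"parameter {name!r} differs in size across models")
        # Element-wise weighted sum across all models.
        souped[name] = [
            sum(w * m[name][i] for w, m in zip(weights, models))
            for i in range(size)
        ]
    return souped


model_a = {"layer.weight": [1.0, 3.0]}
model_b = {"layer.weight": [3.0, 5.0]}
uniform = soup([model_a, model_b])
```

With real checkpoints the same loop would run over tensors rather than lists, but the arithmetic is unchanged: each parameter in the result is a weighted sum of the corresponding parameters in the inputs.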
