arXiv 2506.22638

Layer Importance for Mathematical Reasoning is Forged in Pre-Training and Invariant after Post-Training

By Aadim Nepal, Safal Shrestha, et al.

Published 2025-06-27

Wiki summary

Large language models improve at math after instruction tuning, reinforcement learning, or knowledge distillation. We ask whether these gains come from major changes in the transformer layers or from smaller adjustments that keep the original structure. Using layer-wise ablation on base and trained variants, we find that math reasoning depends on a few critical layers, which stay important across all post-training m…
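
To make the idea of layer-wise ablation concrete, here is a minimal sketch on a toy residual stack. It is not the authors' code or experimental setup. It assumes that ablating a layer means bypassing it so the hidden state passes through unchanged, and that importance is the drop in a task score relative to the unablated model. The names `forward`, `layer_importance`, `toy_layers`, and the score function are hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Sequence

# Hypothetical toy types: a "layer" maps the hidden state to a residual update.
Vector = List[float]
Layer = Callable[[Vector], Vector]


def forward(layers: Sequence[Layer], x: Vector, skip: int = -1) -> Vector:
    """Run a residual stack; the layer at index `skip` is ablated (bypassed)."""
    h = list(x)
    for i, layer in enumerate(layers):
        if i == skip:
            continue  # ablated layer: hidden state passes through unchanged
        update = layer(h)
        h = [a + b for a, b in zip(h, update)]
    return h


def layer_importance(
    layers: Sequence[Layer],
    inputs: Sequence[Vector],
    score: Callable[[Vector], float],
) -> List[float]:
    """Baseline mean score minus the mean score with each layer removed in turn."""

    def mean_score(skip: int) -> float:
        return sum(score(forward(layers, x, skip)) for x in inputs) / len(inputs)

    baseline = mean_score(-1)  # -1 matches no layer index, so nothing is skipped
    return [baseline - mean_score(i) for i in range(len(layers))]


# Toy stack of three layers acting on a 1-dimensional hidden state.
toy_layers: List[Layer] = [
    lambda h: [1.0],         # adds a constant to the state
    lambda h: [0.0],         # contributes nothing
    lambda h: [2.0 * h[0]],  # residual update of 2h, so the state triples
]
toy_inputs = [[1.0]]

# One score drop per layer; a larger drop means the layer matters more.
drops = layer_importance(toy_layers, toy_inputs, score=lambda h: h[0])
most_critical = max(range(len(drops)), key=drops.__getitem__)
```

In a real transformer the same loop would run over decoder blocks and score accuracy on a math benchmark; comparing the resulting importance profile of a base model with that of its post-trained variants is the kind of comparison the abstract describes.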