arXiv 2506.22638
Layer Importance for Mathematical Reasoning is Forged in Pre-Training and Invariant after Post-Training
By Aadim Nepal, Safal Shrestha, et al.
Published 2025-06-27
Large language models improve at math after instruction tuning, reinforcement learning, or knowledge distillation. We ask whether these gains come from major changes in the transformer layers or from smaller adjustments that keep the original structure. Using layer-wise ablation on base and trained variants, we find that math reasoning depends on a few critical layers, which stay important across all post-training m…
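
As a rough illustration of the layer-wise ablation idea mentioned in the abstract, the sketch below skips one layer at a time in a toy residual stack and records the resulting drop in accuracy. Everything in it is an assumption made for illustration: the scalar "residual stream", the constant-update layers, the tolerance-based accuracy, and the names forward, accuracy, layer_importance, and critical_layers are hypothetical and are not taken from the paper. The toy numbers are invented and are not results. Skipping a layer is only one possible form of ablation; the authors may use a different one.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical toy setup: a "layer" maps the residual stream to an update.
Layer = Callable[[float], float]
Example = Tuple[float, float]  # (input, target)


def forward(layers: Sequence[Layer], x: float, skip: Optional[int] = None) -> float:
    """Run a residual stack; the layer at index `skip` is ablated (skipped)."""
    for i, layer in enumerate(layers):
        if i == skip:
            continue  # ablated: the residual stream passes through unchanged
        x = x + layer(x)
    return x


def accuracy(layers: Sequence[Layer], data: Sequence[Example],
             skip: Optional[int] = None, tol: float = 0.1) -> float:
    """Fraction of examples whose prediction lands within `tol` of the target."""
    hits = sum(abs(forward(layers, x, skip) - y) < tol for x, y in data)
    return hits / len(data)


def layer_importance(layers: Sequence[Layer], data: Sequence[Example]) -> List[float]:
    """Accuracy drop caused by ablating each layer in turn."""
    base = accuracy(layers, data)
    return [base - accuracy(layers, data, skip=i) for i in range(len(layers))]


def critical_layers(importance: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Indices of layers whose ablation costs more than `threshold` accuracy."""
    return [i for i, drop in enumerate(importance) if drop > threshold]


# Invented toy data: the target is the input plus 2.03.
data = [(x, x + 2.03) for x in (0.0, 1.0, 2.0)]

# Two invented variants standing in for a base and a post-trained model.
# Only the small updates differ; the large update sits in layer 1 in both.
base_model = [lambda x: 0.01, lambda x: 2.0, lambda x: 0.02]
tuned_model = [lambda x: 0.02, lambda x: 2.0, lambda x: 0.01]

base_importance = layer_importance(base_model, data)
tuned_importance = layer_importance(tuned_model, data)

print("base: ", base_importance, critical_layers(base_importance))
print("tuned:", tuned_importance, critical_layers(tuned_importance))
```

In this toy case both variants flag the same layer as critical, which mirrors the kind of comparison the abstract describes without saying anything about the paper's actual measurements.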