arXiv 2507.11005
AdaMuon: Adaptive Muon Optimizer
By Chongjie Si, Debing Zhang, et al.
Published 2025-07-15
We propose AdaMuon, a novel optimizer that combines element-wise adaptivity with orthogonal updates for large-scale neural network training. AdaMuon incorporates two tightly coupled mechanisms: (1) an element-wise second-moment estimator applied to the orthogonalized update directions, and (2) a sign-stabilized orthogonal update, in which the momentum is sign-transformed before orthogonalization. These two componen…
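To make the two mechanisms in the abstract concrete, here is a minimal NumPy sketch of one update step on a single weight matrix. It is an illustration under stated assumptions, not the authors' implementation: the function names (`orthogonalize`, `adamuon_like_step`), the momentum rule, the hyperparameter defaults, and the `eps` term are all hypothetical choices, and the orthogonalization uses an exact SVD-based polar factor as a simple stand-in for whatever iterative scheme the paper uses. Any further scaling or normalization the full paper may apply is not shown.

```python
import numpy as np


def orthogonalize(matrix):
    """Return the semi-orthogonal polar factor U @ V^T of `matrix`.

    Assumption: an exact SVD is used here for clarity; it is a stand-in,
    not necessarily the orthogonalization routine used in the paper.
    """
    u, _, vt = np.linalg.svd(matrix, full_matrices=False)
    return u @ vt


def adamuon_like_step(weight, grad, momentum, second_moment,
                      lr=0.01, beta1=0.95, beta2=0.95, eps=1e-8):
    """One illustrative step combining the two mechanisms from the abstract.

    All hyperparameter defaults and the exact momentum rule are assumptions.
    """
    # Accumulate momentum (assumed heavy-ball style accumulation).
    momentum = beta1 * momentum + grad

    # Mechanism (2): sign-transform the momentum, then orthogonalize it.
    signed = np.sign(momentum)
    ortho = orthogonalize(signed)

    # Mechanism (1): element-wise second-moment estimate of the
    # orthogonalized direction, used to rescale each entry.
    second_moment = beta2 * second_moment + (1.0 - beta2) * ortho ** 2
    update = ortho / (np.sqrt(second_moment) + eps)

    new_weight = weight - lr * update
    return new_weight, momentum, second_moment


# Usage: a single step on a random 4x3 weight matrix.
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 3))
g = rng.standard_normal((4, 3))
m = np.zeros_like(w)
v = np.zeros_like(w)
w, m, v = adamuon_like_step(w, g, m, v)
```

The order of operations is the point of the sketch: the sign transform happens before orthogonalization, and the second-moment statistics are computed on the orthogonalized direction rather than on the raw gradient.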