arXiv:2507.11005
AdaMuon: Adaptive Muon Optimizer
By Chongjie Si, Debing Zhang, et al.
Published 2025-07-15
We propose AdaMuon, a novel optimizer that combines element-wise adaptivity with orthogonal updates for large-scale neural network training. AdaMuon incorporates two tightly coupled mechanisms: (1) an element-wise second momentum estimator applied to orthogonalized update directions, and (2) a sign-stabilized orthogonal update, where the momentum is first sign-transformed before orthogonalization. These two componen…
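The two mechanisms named in the abstract can be illustrated for a single 2-D weight matrix with the pure-Python sketch below. It is not the authors' implementation. The heavy-ball momentum form, the hyperparameter defaults, the Adam-style bias correction, and the plain cubic Newton-Schulz iteration (standing in for the tuned quintic iteration used by Muon) are all assumptions. The helper names (`sign_matrix`, `orthogonalize`, `adamuon_step`) are hypothetical. The ordering follows the abstract: the momentum is sign-transformed, then orthogonalized, and an element-wise second-moment estimate of that orthogonalized direction rescales the update.

```python
import math
from typing import Dict, List

Matrix = List[List[float]]


def zeros(rows: int, cols: int) -> Matrix:
    return [[0.0] * cols for _ in range(rows)]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def sign_matrix(a: Matrix) -> Matrix:
    # Element-wise sign; exact zeros stay zero.
    return [[float((x > 0) - (x < 0)) for x in row] for row in a]


def orthogonalize(a: Matrix, steps: int = 30) -> Matrix:
    """Approximate the polar factor U V^T of a = U S V^T.

    Assumption: plain cubic Newton-Schulz, X <- 1.5 X - 0.5 X X^T X, in place of
    Muon's tuned quintic iteration.
    """
    norm = math.sqrt(sum(x * x for row in a for x in row)) + 1e-12
    x = [[v / norm for v in row] for row in a]  # singular values now in [0, 1]
    for _ in range(steps):
        xxtx = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * p - 0.5 * q for p, q in zip(rx, rq)] for rx, rq in zip(x, xxtx)]
    return x


def adamuon_step(weight: Matrix, grad: Matrix, state: Dict, lr: float = 0.02,
                 beta1: float = 0.95, beta2: float = 0.999, eps: float = 1e-8,
                 ns_steps: int = 30) -> Matrix:
    """One hypothetical AdaMuon-style update; returns the new weight matrix."""
    rows, cols = len(grad), len(grad[0])
    if not state:
        state.update(m=zeros(rows, cols), v=zeros(rows, cols), t=0)
    state["t"] += 1
    m, v, t = state["m"], state["v"], state["t"]
    # Momentum on the raw gradient (heavy-ball form, assumed).
    for i in range(rows):
        for j in range(cols):
            m[i][j] = beta1 * m[i][j] + grad[i][j]
    # Mechanism (2): sign-transform the momentum first, then orthogonalize it.
    o = orthogonalize(sign_matrix(m), ns_steps)
    # Mechanism (1): element-wise second-moment estimate of the orthogonalized
    # direction (Adam-style bias correction assumed), used to rescale it.
    new_weight = zeros(rows, cols)
    for i in range(rows):
        for j in range(cols):
            v[i][j] = beta2 * v[i][j] + (1 - beta2) * o[i][j] ** 2
            v_hat = v[i][j] / (1 - beta2 ** t)
            new_weight[i][j] = weight[i][j] - lr * o[i][j] / (math.sqrt(v_hat) + eps)
    return new_weight


if __name__ == "__main__":
    grad = [[1.0, -2.0, 0.5], [0.3, 1.0, -1.0], [-1.0, 0.2, 2.0]]
    state: Dict = {}
    w = adamuon_step(zeros(3, 3), grad, state)
    print([[round(x, 4) for x in row] for row in w])
```

On the first step the bias-corrected estimate equals o², so each nonzero entry of the rescaled direction becomes ±1 and every weight entry moves by about `lr`. On later steps the running estimate lets entries whose orthogonalized values are consistently large take relatively smaller steps, which is the element-wise adaptivity the abstract adds on top of Muon's orthogonal update.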