arXiv 2507.11005

AdaMuon: Adaptive Muon Optimizer

By Chongjie Si, Debing Zhang, et al.

Published 2025-07-15

We propose AdaMuon, a novel optimizer that combines element-wise adaptivity with orthogonal updates for large-scale neural network training. AdaMuon incorporates two tightly coupled mechanisms: (1) an element-wise second momentum estimator applied to orthogonalized update directions, and (2) a sign-stabilized orthogonal update, where the momentum is first sign-transformed before orthogonalization. These two componen…
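The two mechanisms named in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract is truncated here and gives no hyperparameters, so the function names, the values of `lr`, `beta1`, `beta2`, and `eps`, the exact form of the momentum and second-moment accumulators, and the use of a Muon-style quintic Newton-Schulz iteration for orthogonalization are all assumptions made for illustration. Any further steps described in the full paper are omitted.

```python
import numpy as np

def newton_schulz(G, steps=5, eps=1e-7):
    """Approximately orthogonalize a 2-D matrix G.

    Assumption: quintic Newton-Schulz iteration with the coefficients
    commonly used in Muon implementations; the abstract does not specify
    the orthogonalization routine.
    """
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (np.linalg.norm(G) + eps)   # Frobenius norm bounds the spectral norm by 1
    transposed = X.shape[0] > X.shape[1]
    if transposed:                      # iterate on the wide orientation (smaller Gram matrix)
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        B = b * A + c * (A @ A)
        X = a * X + B @ X
    return X.T if transposed else X

def adamuon_like_step(W, grad, state, lr=1e-3, beta1=0.95, beta2=0.95, eps=1e-8):
    """One illustrative update for a 2-D weight matrix W.

    All hyperparameter defaults are hypothetical placeholders.
    """
    # Momentum accumulation (assumed heavy-ball form).
    M = beta1 * state["M"] + grad
    # Mechanism (2): sign-transform the momentum, then orthogonalize it.
    O = newton_schulz(np.sign(M))
    # Mechanism (1): element-wise second-moment estimate of the
    # orthogonalized direction (assumed EMA, no bias correction).
    V = beta2 * state["V"] + (1.0 - beta2) * O * O
    update = O / (np.sqrt(V) + eps)
    state["M"], state["V"] = M, V
    return W - lr * update

# Usage on a random 4x8 weight matrix.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8))
grad = rng.standard_normal((4, 8))
state = {"M": np.zeros_like(W), "V": np.zeros_like(W)}
W_new = adamuon_like_step(W, grad, state)
```

The ordering is the point of the sketch: the sign transform is applied before orthogonalization, and the element-wise scaling is computed from the orthogonalized direction rather than from the raw gradient.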