arXiv 2507.11005

AdaMuon: Adaptive Muon Optimizer

By Chongjie Si, Debing Zhang, et al.

Published 2025-07-15

Wiki summary

We propose AdaMuon, a novel optimizer that combines element-wise adaptivity with orthogonal updates for large-scale neural network training. AdaMuon incorporates two tightly coupled mechanisms: (1) an element-wise second momentum estimator applied to orthogonalized update directions, and (2) a sign-stabilized orthogonal update, where the momentum is first sign-transformed before orthogonalization. These two componen…
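The two mechanisms named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and everything beyond the abstract's wording is an assumption: the function names `orthogonalize` and `adamuon_like_step`, the hyperparameter defaults, the momentum form, the lack of bias correction, and the use of a plain cubic Newton-Schulz iteration as a stand-in for whatever orthogonalization routine the paper uses. Any further components described in the truncated part of the abstract are left out.

```python
import numpy as np


def orthogonalize(m, steps=30):
    """Approximate the orthogonal polar factor of m.

    Assumption: a plain cubic Newton-Schulz iteration is used here as a
    simple stand-in; the paper's actual routine may differ.
    """
    # Scale by the Frobenius norm so that all singular values are <= 1,
    # which keeps the iteration inside its convergence region.
    x = m / (np.linalg.norm(m) + 1e-7)
    for _ in range(steps):
        # Pushes every nonzero singular value towards 1.
        x = 1.5 * x - 0.5 * (x @ x.T) @ x
    return x


def adamuon_like_step(weight, grad, momentum, second_moment,
                      lr=0.02, beta=0.95, beta2=0.95, eps=1e-8):
    """One hypothetical update step; all defaults are illustrative."""
    # Momentum accumulation (the exact form is an assumption).
    momentum = beta * momentum + grad
    # Mechanism (2): sign-transform the momentum, then orthogonalize it.
    direction = orthogonalize(np.sign(momentum))
    # Mechanism (1): element-wise second-moment estimate computed on the
    # orthogonalized direction, not on the raw gradient.
    second_moment = beta2 * second_moment + (1.0 - beta2) * direction**2
    update = direction / (np.sqrt(second_moment) + eps)
    return weight - lr * update, momentum, second_moment


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(4, 6))
    g = rng.normal(size=(4, 6))
    w, mom, v = adamuon_like_step(w, g, np.zeros_like(w), np.zeros_like(w))
    print(w.shape, mom.shape, v.shape)
```

The state carried between steps is the pair `(momentum, second_moment)`, both shaped like the weight matrix, so the sketch applies to 2-D parameters only.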
