On the Performance Analysis of Momentum Method: A Frequency Domain Perspective
Xianliang Li, Jun Luo, Zhiwei Zheng, Hanxiao Wang, Li Luo, Lingkun Wen, Linlong Wu, Sheng Xu
Code
- github.com/yinleung/FSGDMpytorch
Abstract
Momentum-based optimizers are widely adopted for training neural networks. However, the optimal selection of momentum coefficients remains elusive. This uncertainty impedes a clear understanding of the role of momentum in stochastic gradient methods. In this paper, we present a frequency domain analysis framework that interprets the momentum method as a time-variant filter for gradients, where adjustments to momentum coefficients modify the filter characteristics. Our experiments support this perspective and provide a deeper understanding of the mechanism involved. Moreover, our analysis reveals three significant findings: high-frequency gradient components are undesirable in the late stages of training; preserving the original gradient in the early stages enhances performance; and gradually amplifying low-frequency gradient components during training also enhances performance. Based on these insights, we propose Frequency Stochastic Gradient Descent with Momentum (FSGDM), a heuristic optimizer that dynamically adjusts the momentum filtering characteristic using an empirically effective dynamic magnitude response. Experimental results demonstrate the superiority of FSGDM over conventional momentum optimizers.
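To make the filter interpretation concrete, the sketch below computes the magnitude response of a momentum update with fixed coefficients, assuming the common recursion m_t = u * m_{t-1} + v * g_t, whose transfer function is H(z) = v / (1 - u z^{-1}). The function name and the coefficient values are illustrative assumptions, not taken from the paper or its repository, and the time-varying schedule used by FSGDM is not reproduced here.

```python
import numpy as np

def momentum_magnitude_response(u, v, omega):
    """Magnitude response |H(e^{j*omega})| of m_t = u * m_{t-1} + v * g_t.

    Hypothetical helper for illustration. u is the momentum coefficient
    (0 <= u < 1), v scales the incoming gradient, and omega is the
    normalized angular frequency in radians per step, in [0, pi].
    """
    omega = np.asarray(omega, dtype=float)
    # |1 - u * e^{-j*omega}|^2 = 1 - 2*u*cos(omega) + u^2
    return abs(v) / np.sqrt(1.0 - 2.0 * u * np.cos(omega) + u * u)

# Sample frequencies from DC (omega = 0) up to the Nyquist frequency (omega = pi).
omega = np.linspace(0.0, np.pi, 5)

# EMA-style momentum (v = 1 - u): unit gain at DC, attenuation at high frequencies.
ema = momentum_magnitude_response(u=0.9, v=0.1, omega=omega)

# Standard momentum (v = 1): same low-pass shape, with low frequencies amplified.
standard = momentum_magnitude_response(u=0.9, v=1.0, omega=omega)

for w, a, b in zip(omega, ema, standard):
    print(f"omega={w:.3f}  EMA gain={a:.4f}  standard gain={b:.4f}")
```

With these fixed coefficients the gain is v / (1 - u) at omega = 0 and v / (1 + u) at omega = pi, so a larger u suppresses high-frequency gradient components more strongly. Changing u and v over the course of training changes this curve from step to step, which is the sense in which the abstract describes momentum as a time-variant filter.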
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| CIFAR-10 | ResNet18 (FSGDM) | Percentage correct | 95.66 | — | Unverified |
| CIFAR-100 | ResNet50 (FSGDM) | Percentage correct | 81.44 | — | Unverified |
| ImageNet | ResNet50 (FSGDM) | Top 1 Accuracy | 76.91 | — | Unverified |
| ImageNet | ResNet34 (FSGDM) | Top 1 Accuracy | 67.74 | — | Unverified |