On the Performance Analysis of Momentum Method: A Frequency Domain Perspective

2024-11-29

Xianliang Li, Jun Luo, Zhiwei Zheng, Hanxiao Wang, Li Luo, Lingkun Wen, Linlong Wu, Sheng Xu

Code

github.com/yinleung/FSGDM
Framework: PyTorch (10 stars)

Abstract

Momentum-based optimizers are widely adopted for training neural networks. However, the optimal selection of momentum coefficients remains elusive. This uncertainty impedes a clear understanding of the role of momentum in stochastic gradient methods. In this paper, we present a frequency domain analysis framework that interprets the momentum method as a time-variant filter for gradients, where adjustments to momentum coefficients modify the filter characteristics. Our experiments support this perspective and provide a deeper understanding of the mechanism involved. Moreover, our analysis reveals three significant findings: high-frequency gradient components are undesired in the late stages of training; preserving the original gradient in the early stages enhances performance; and gradually amplifying low-frequency gradient components during training also enhances performance. Based on these insights, we propose Frequency Stochastic Gradient Descent with Momentum (FSGDM), a heuristic optimizer that dynamically adjusts the momentum filtering characteristic with an empirically effective dynamic magnitude response. Experimental results demonstrate the superiority of FSGDM over conventional momentum optimizers.
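The filter interpretation in the abstract can be illustrated with a minimal sketch. It assumes the generic momentum recursion m_t = u * m_{t-1} + v * g_t with fixed scalar coefficients u and v, which gives the first-order transfer function H(z) = v / (1 - u * z^-1). The coefficient names, the function names, and the fixed-coefficient simplification are assumptions made for illustration; this is not the FSGDM update rule, whose coefficients vary over training and are not specified on this page.

```python
import cmath
import math


def momentum_step(m_prev, grad, u, v):
    """One generic momentum update: m_t = u * m_{t-1} + v * g_t."""
    return u * m_prev + v * grad


def magnitude_response(u, v, omega):
    """|H(e^{j*omega})| for H(z) = v / (1 - u * z^-1), omega in radians/step."""
    z_inv = cmath.exp(-1j * omega)
    return abs(v / (1 - u * z_inv))


def steady_state_gain(u, v, omega, steps=2000):
    """Measure the gain empirically by filtering a complex sinusoid.

    After the transient (which decays like u**t) dies out, the buffer
    magnitude equals the filter's magnitude response at omega.
    """
    m = 0j
    for t in range(steps):
        g = cmath.exp(1j * omega * t)  # unit-amplitude "gradient" at frequency omega
        m = momentum_step(m, g, u, v)
    return abs(m)


if __name__ == "__main__":
    u, v = 0.9, 0.1  # hypothetical EMA-style coefficients (v = 1 - u)
    for omega in (0.0, 0.5, 1.0, math.pi):
        print(omega, magnitude_response(u, v, omega), steady_state_gain(u, v, omega))
```

With v = 1 - u, the gain is 1 at zero frequency and (1 - u) / (1 + u) at omega = pi, so a larger u attenuates high-frequency gradient components more strongly. Changing u and v over the course of training changes this curve step by step, which is the sense in which the momentum method acts as a time-variant filter.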

Tasks

Image Classification

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
CIFAR-10	ResNet18 (FSGDM)	Percentage correct	95.66	—	Unverified
CIFAR-100	ResNet50 (FSGDM)	Percentage correct	81.44	—	Unverified
ImageNet	ResNet50 (FSGDM)	Top 1 Accuracy	76.91	—	Unverified
ImageNet	ResNet34 (FSGDM)	Top 1 Accuracy	67.74	—	Unverified
