MGDA Converges under Generalized Smoothness, Provably
Qi Zhang, Peiyao Xiao, Shaofeng Zou, Kaiyi Ji
Unverified — Be the first to reproduce this paper.
ReproduceAbstract
Multi-objective optimization (MOO) is receiving more attention in various fields such as multi-task learning. Recent works provide some effective algorithms with theoretical analysis but they are limited by the standard L-smooth or bounded-gradient assumptions, which typically do not hold for neural networks, such as Long short-term memory (LSTM) models and Transformers. In this paper, we study a more general and realistic class of generalized -smooth loss functions, where is a general non-decreasing function of gradient norm. We revisit and analyze the fundamental multiple gradient descent algorithm (MGDA) and its stochastic version with double sampling for solving the generalized -smooth MOO problems, which approximate the conflict-avoidant (CA) direction that maximizes the minimum improvement among objectives. We provide a comprehensive convergence analysis of these algorithms and show that they converge to an -accurate Pareto stationary point with a guaranteed -level average CA distance (i.e., the gap between the updating direction and the CA direction) over all iterations, where totally O(^-2) and O(^-4) samples are needed for deterministic and stochastic settings, respectively. We prove that they can also guarantee a tighter -level CA distance in each iteration using more samples. Moreover, we analyze an efficient variant of MGDA named MGDA-FA using only O(1) time and space, while achieving the same performance guarantee as MGDA.