MGDA Converges under Generalized Smoothness, Provably

2024-05-29Unverified0· sign in to hype

Qi Zhang, Peiyao Xiao, Shaofeng Zou, Kaiyi Ji

Unverified — Be the first to reproduce this paper.

Abstract

Multi-objective optimization (MOO) is receiving more attention in various fields such as multi-task learning. Recent works provide some effective algorithms with theoretical analysis but they are limited by the standard L-smooth or bounded-gradient assumptions, which typically do not hold for neural networks, such as Long short-term memory (LSTM) models and Transformers. In this paper, we study a more general and realistic class of generalized -smooth loss functions, where is a general non-decreasing function of gradient norm. We revisit and analyze the fundamental multiple gradient descent algorithm (MGDA) and its stochastic version with double sampling for solving the generalized -smooth MOO problems, which approximate the conflict-avoidant (CA) direction that maximizes the minimum improvement among objectives. We provide a comprehensive convergence analysis of these algorithms and show that they converge to an -accurate Pareto stationary point with a guaranteed -level average CA distance (i.e., the gap between the updating direction and the CA direction) over all iterations, where totally O(^-2) and O(^-4) samples are needed for deterministic and stochastic settings, respectively. We prove that they can also guarantee a tighter -level CA distance in each iteration using more samples. Moreover, we analyze an efficient variant of MGDA named MGDA-FA using only O(1) time and space, while achieving the same performance guarantee as MGDA.

Tasks

Multi-Task Learning

MGDA Converges under Generalized Smoothness, Provably

Abstract

Tasks

Reproductions