
Multi-Timescale Gradient Sliding for Distributed Optimization

2025-06-18

Junhui Zhang, Patrick Jaillet

Abstract

We propose two first-order methods for convex, non-smooth, distributed optimization problems, hereafter called Multi-Timescale Gradient Sliding (MT-GS) and its accelerated variant (AMT-GS). MT-GS and AMT-GS can exploit similarities between (local) objectives to reduce the number of communication rounds, are flexible in that different subsets of agents can communicate at different, user-picked rates, and are fully deterministic. These three desirable features are achieved through a block-decomposable primal-dual formulation and a multi-timescale variant of the sliding method introduced in Lan et al. (2020) and Lan (2016), in which different dual blocks are updated at potentially different rates. To find an ε-suboptimal solution, the complexities of our algorithms achieve optimal dependency on ε: MT-GS needs O(r̄A/ε) communication rounds and O(r̄/ε²) subgradient steps for Lipschitz objectives, and AMT-GS needs O(r̄A/√(εμ)) communication rounds and O(r̄/(εμ)) subgradient steps if the objectives are also μ-strongly convex. Here, r̄ measures the "average rate of updates" for dual blocks, and A measures the similarity between (subgradients of) local functions. In addition, the linear dependency of the number of communication rounds on A is optimal (Arjevani and Shamir 2015), thereby providing a positive answer to the open question of whether such dependency is achievable for non-smooth objectives (Arjevani and Shamir 2015).
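The multi-timescale idea can be made concrete with a small sketch. The snippet below is a toy illustration, not the paper's algorithm: the consensus-style coupling matrices, the `periods` schedule, the step sizes, and the update rules are all invented placeholders. Its only purpose is to show the structure the abstract describes: each dual block, tied to a subset of agents, is updated at its own user-picked rate, while primal variables take several cheap local subgradient ("sliding") steps between communication rounds.

```python
# Hypothetical sketch of multi-timescale dual updates with primal sliding.
# All names, rates, and update rules are illustrative assumptions, not the
# MT-GS method from the paper.
import numpy as np

def slide(g, x, lr, steps):
    """Run a few local subgradient steps on oracle g (no communication)."""
    for _ in range(steps):
        x = x - lr * g(x)
    return x

def mt_gs_sketch(f_subs, couplings, periods, rounds, d, lr=0.1, slide_steps=5):
    """
    f_subs:    list of local subgradient oracles, one per agent
    couplings: list of (W, agents) pairs -- one dual block per subset of
               agents that communicates together; W is an assumed coupling matrix
    periods:   periods[b] = update dual block b once every periods[b] rounds
    d:         problem dimension
    """
    n = len(f_subs)
    x = np.zeros((n, d))                               # primal iterate per agent
    y = [np.zeros(W.shape[0]) for W, _ in couplings]   # one dual block per subset

    for t in range(rounds):
        # Dual blocks live on their own timescales: block b triggers
        # communication only among its subset of agents, every periods[b] rounds.
        for b, (W, agents) in enumerate(couplings):
            if t % periods[b] == 0:
                y[b] = y[b] + lr * (W @ x[agents].mean(axis=0))

        # Sliding phase: each agent takes cheap local subgradient steps on its
        # objective plus the pull from the dual blocks it participates in.
        for i in range(n):
            dual_pull = sum(W.T @ y[b]
                            for b, (W, ag) in enumerate(couplings) if i in ag)
            x[i] = slide(lambda z: f_subs[i](z) + dual_pull, x[i], lr, slide_steps)

    return x.mean(axis=0)
```

The design point being illustrated: communication cost is paid only when some dual block fires, so blocks whose agents' objectives are similar (small A) can be given long periods, while the per-round subgradient work proceeds at the fast timescale regardless.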
