Byzantine-Robust Optimization under (L_0, L_1)-Smoothness
Arman Bolatov, Samuel Horváth, Martin Takáč, Eduard Gorbunov
Abstract
We consider distributed optimization under Byzantine attacks in the presence of (L_0, L_1)-smoothness, a generalization of standard L-smoothness that captures functions with state-dependent gradient Lipschitz constants. We propose Byz-NSGDM, a normalized stochastic gradient descent method with momentum that achieves robustness against Byzantine workers while maintaining convergence guarantees. Our algorithm combines momentum normalization with Byzantine-robust aggregation enhanced by Nearest Neighbor Mixing (NNM) to handle both the challenges posed by (L_0, L_1)-smoothness and the presence of Byzantine adversaries. We prove that Byz-NSGDM achieves a convergence rate of O(K^{-1/4}) up to a Byzantine bias floor proportional to the robustness coefficient and the gradient heterogeneity. Experimental validation on heterogeneous MNIST classification, synthetic (L_0, L_1)-smooth optimization, and character-level language modeling with a small GPT model demonstrates the effectiveness of our approach against various Byzantine attack strategies. An ablation study further shows that Byz-NSGDM is robust across a wide range of momentum and learning rate choices.
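To make the update structure concrete, below is a minimal NumPy sketch of one server round of a normalized momentum-SGD update with NNM pre-aggregation followed by a robust aggregator. It is an illustration under stated assumptions, not the paper's pseudocode: the coordinate-wise median is used as a stand-in aggregator, the momentum update is written on the server side for compactness (in the distributed setting each worker maintains its own momentum), and names such as `byz_nsgdm_step`, `beta`, and `num_byzantine` are illustrative.

```python
import numpy as np

def nearest_neighbor_mixing(vectors, num_byzantine):
    """NNM pre-aggregation: replace each worker vector with the mean of its
    (n - f) nearest neighbors (including itself) in Euclidean distance."""
    vectors = np.asarray(vectors, dtype=float)
    n = len(vectors)
    k = n - num_byzantine
    # Pairwise Euclidean distances between all worker vectors, shape (n, n).
    dists = np.linalg.norm(vectors[:, None, :] - vectors[None, :, :], axis=-1)
    mixed = np.empty_like(vectors)
    for i in range(n):
        nearest = np.argsort(dists[i])[:k]      # indices of the k closest vectors
        mixed[i] = vectors[nearest].mean(axis=0)
    return mixed

def byz_nsgdm_step(x, momenta, worker_grads, lr, beta, num_byzantine):
    """One round of a normalized SGD-with-momentum update with robust
    aggregation (sketch only; Byzantine workers would send arbitrary vectors)."""
    # 1. Momentum update for each worker's local estimate.
    momenta = beta * momenta + (1.0 - beta) * worker_grads
    # 2. Pre-aggregate with NNM, then apply a robust aggregator
    #    (coordinate-wise median chosen here purely for illustration).
    mixed = nearest_neighbor_mixing(momenta, num_byzantine)
    aggregated = np.median(mixed, axis=0)
    # 3. Normalized step: only the direction of the aggregate is used, so the
    #    step size stays controlled even when the local Lipschitz constant
    #    grows with the gradient norm, as under (L_0, L_1)-smoothness.
    x = x - lr * aggregated / (np.linalg.norm(aggregated) + 1e-12)
    return x, momenta
```

The normalization in step 3 is the key design choice suggested by the abstract: it decouples the step length from the gradient magnitude, which is what standard SGD lacks when the gradient Lipschitz constant is state-dependent.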