Sparse Mixture-of-Experts are Domain Generalizable Learners

2022-06-08

Bo Li, Yifei Shen, Jingkang Yang, Yezhen Wang, Jiawei Ren, Tong Che, Jun Zhang, Ziwei Liu

Code

github.com/luodian/sf-moe-dg (official, in paper; PyTorch; ★ 0)
github.com/KU-CVLAB/MoA (PyTorch; ★ 31)

Abstract

Human visual perception can easily generalize to out-of-distribution visual data, which is far beyond the capability of modern machine learning models. Domain generalization (DG) aims to close this gap, with existing DG methods mainly focusing on the loss function design. In this paper, we propose to explore an orthogonal direction, i.e., the design of the backbone architecture. It is motivated by an empirical finding that transformer-based models trained with empirical risk minimization (ERM) outperform CNN-based models employing state-of-the-art (SOTA) DG algorithms on multiple DG datasets. We develop a formal framework to characterize a network's robustness to distribution shifts by studying its architecture's alignment with the correlations in the dataset. This analysis guides us to propose a novel DG model built upon vision transformers, namely Generalizable Mixture-of-Experts (GMoE). Extensive experiments on DomainBed demonstrate that GMoE trained with ERM outperforms SOTA DG baselines by a large margin. Moreover, GMoE is complementary to existing DG methods and its performance is substantially improved when trained with DG algorithms.
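
For readers unfamiliar with the sparse mixture-of-experts idea the abstract refers to, the following is a minimal sketch of top-k expert routing for a single token, written in plain Python. It is not the authors' implementation (see the linked repositories for that). The softmax-over-selected-logits gate, the two-layer ReLU experts, the random weights, and all names (`sparse_moe`, `make_expert`, `router_W`) are assumptions chosen for illustration.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def make_expert(dim, hidden, rng):
    """Hypothetical expert: a two-layer ReLU MLP with random weights."""
    W1 = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(hidden)]
    W2 = [[rng.gauss(0, 0.1) for _ in range(hidden)] for _ in range(dim)]

    def expert(x):
        h = [max(0.0, v) for v in matvec(W1, x)]
        return matvec(W2, h)

    return expert


def sparse_moe(token, router_W, experts, k=2):
    """Route one token to its top-k experts and mix their outputs."""
    logits = matvec(router_W, token)  # one routing score per expert
    top = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:k]
    gates = softmax([logits[i] for i in top])  # renormalize over chosen experts
    out = [0.0] * len(token)
    for g, i in zip(gates, top):
        y = experts[i](token)  # only the selected experts are evaluated
        out = [o + g * v for o, v in zip(out, y)]
    return out, top, gates


rng = random.Random(0)
dim, hidden, num_experts = 8, 16, 4  # illustrative sizes only
experts = [make_expert(dim, hidden, rng) for _ in range(num_experts)]
router_W = [[rng.gauss(0, 1.0) for _ in range(dim)] for _ in range(num_experts)]
token = [rng.gauss(0, 1.0) for _ in range(dim)]

output, chosen, gates = sparse_moe(token, router_W, experts, k=2)
print("experts used:", chosen, "gate weights:", gates)
```

Because only k of the experts run for each token, the per-token compute stays close to that of a single feed-forward block even as the number of experts grows. In a vision transformer, a layer of this kind would typically stand in for the feed-forward block of some transformer layers; that placement is an assumption here, not something the abstract states.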

Tasks

Domain Generalization, Mixture-of-Experts, Object Recognition

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
DomainNet	Hybrid-SF-MoE	Average Accuracy	52	—	Unverified
DomainNet	GMoE-S/16	Average Accuracy	48.7	—	Unverified
Office-Home	GMoE-S/16	Average Accuracy	74.2	—	Unverified
PACS	GMoE-S/16	Average Accuracy	88.1	—	Unverified
TerraIncognita	GMoE-S/16	Average Accuracy	48.5	—	Unverified
VLCS	GMoE-S/16	Average Accuracy	80.2	—	Unverified
