Representation Convergence: Mutual Distillation is Secretly a Form of Regularization
Zhengpeng Xie, Jiahang Cao, Qiang Zhang, Jianxiong Zhang, Changwei Wang, Renjing Xu
Code: github.com/myrepositories-hub/mutual-distillation-policy-optimization (official PyTorch implementation)
Abstract
In this paper, we argue that mutual distillation between reinforcement learning policies acts as a form of implicit regularization, preventing them from overfitting to irrelevant features. We highlight two key contributions: (a) theoretically, we prove for the first time that enhancing a policy's robustness to irrelevant features improves its generalization performance; (b) empirically, we demonstrate that mutual distillation between policies promotes such robustness, enabling the spontaneous emergence of invariant representations over pixel inputs. Overall, our findings challenge the conventional view of distillation as merely a means of knowledge transfer, offering a novel perspective on generalization in deep reinforcement learning.
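To make the idea of mutual distillation as a regularizer concrete, below is a minimal, hypothetical PyTorch sketch: two policies evaluate the same observations, and a symmetric KL term pulls each policy's action distribution toward the other's (detached) predictions. The names (`PolicyNet`, `mutual_distillation_loss`, the `beta` coefficient) and the exact form of the objective are illustrative assumptions, not the paper's specific method; in practice this term would be added to each policy's usual RL loss.

```python
# Minimal sketch of a mutual-distillation regularizer between two policies.
# Hypothetical and simplified: the paper's actual objective may differ
# (e.g. in how it combines this term with the RL loss or weights the KLs).
import torch
import torch.nn as nn
import torch.nn.functional as F


class PolicyNet(nn.Module):
    """Tiny MLP policy producing action logits (stand-in for a pixel encoder + head)."""

    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)


def mutual_distillation_loss(logits_a: torch.Tensor, logits_b: torch.Tensor) -> torch.Tensor:
    """Symmetric KL between the two policies' action distributions.

    Each policy is pulled toward the other's detached predictions, which acts as a
    regularizer discouraging either policy from latching onto idiosyncratic features.
    """
    log_p_a = F.log_softmax(logits_a, dim=-1)
    log_p_b = F.log_softmax(logits_b, dim=-1)
    kl_a_to_b = F.kl_div(log_p_a, log_p_b.detach().exp(), reduction="batchmean")
    kl_b_to_a = F.kl_div(log_p_b, log_p_a.detach().exp(), reduction="batchmean")
    return 0.5 * (kl_a_to_b + kl_b_to_a)


if __name__ == "__main__":
    obs = torch.randn(32, 16)  # batch of (flattened) observations
    policy_a, policy_b = PolicyNet(16, 4), PolicyNet(16, 4)
    distill = mutual_distillation_loss(policy_a(obs), policy_b(obs))
    # In training, each policy i would minimize: rl_loss_i + beta * distill
    print(f"mutual distillation term: {distill.item():.4f}")
```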