Out-of-distribution generalization of internal models is correlated with reward

2021-03-09 · ICLR Workshop SSL-RL 2021

Khushdeep Singh Mann, Steffen Schneider, Alberto Chiappa, Jin Hwa Lee, Matthias Bethge, Alexander Mathis, Mackenzie W Mathis

Abstract

We investigate the behavior of reinforcement learning (RL) agents under morphological distribution shifts. Similar to recent robustness benchmarks in computer vision, we train algorithms on selected RL environments and test transfer performance on perturbed environments. We specifically perturb popular RL agents' morphologies by changing the length and mass of limbs, which in biological settings is a major challenge (e.g., after injury or during growth). In this setup, called PyBullet-M, we compare the performance of policies obtained by reward-driven learning with self-supervised models of the observed state-action transitions. We find that the out-of-distribution performance of self-supervised models is correlated with the degradation in reward.
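The evaluation protocol described above pairs one training morphology with a family of perturbed test morphologies. A minimal sketch of how such a perturbation grid might be enumerated is shown below; the specific scale values and the `perturbation_grid` helper are illustrative assumptions, not details taken from the paper.

```python
from itertools import product

def perturbation_grid(scales=(0.5, 0.75, 1.0, 1.25, 1.5)):
    """Enumerate (length_scale, mass_scale) morphology perturbations.

    Assumption for illustration: the unperturbed morphology (1.0, 1.0)
    is the training environment, and every other combination is held
    out as an out-of-distribution test environment.
    """
    grid = list(product(scales, scales))
    train = [(1.0, 1.0)]
    ood = [p for p in grid if p != (1.0, 1.0)]
    return train, ood

train_envs, ood_envs = perturbation_grid()
print(len(train_envs), len(ood_envs))  # 1 training morphology, 24 perturbed ones
```

In a physics backend such as PyBullet, each `(length_scale, mass_scale)` pair would then be applied to the agent's limbs before rolling out the trained policy and the self-supervised transition model, so that reward degradation and model error can be compared per perturbation.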
