SOTAVerified

Less Suboptimal Learning and Control in Variational POMDPs

2021-03-09 · ICLR Workshop SSL-RL 2021

Baris Kayalibay, Atanas Mirchev, Patrick van der Smagt, Justin Bayer



Abstract

A recently uncovered pitfall in learning generative models with amortised variational inference, the conditioning gap, calls into question common practices in model-based reinforcement learning. Withholding from the inference network some of the quantities that the true posterior depends on yields a biased generative model and an approximate posterior that underestimates uncertainty. We examine the effect of the conditioning gap on model-based reinforcement learning with variational world models. We study three settings with known dynamics, which lets us compare against a near-optimal policy. We find that the impact of the conditioning gap becomes severe in systems whose state is hard to estimate.
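The uncertainty underestimation described above can be made concrete in a toy linear-Gaussian model (a hypothetical illustration, not an example from the paper): if a Gaussian q(z | x1) is fit by minimising the reverse KL to p(z | x1, x2) averaged over the withheld x2, its optimal variance equals the full conditional variance, which is strictly smaller than the variance of the correct partial posterior p(z | x1).

```python
import numpy as np

# Toy linear-Gaussian model (hypothetical, for illustration only):
#   z ~ N(0, 1),   x1 = z + e1,   x2 = z + e2,   e_i ~ N(0, s^2)
s2 = 0.5 ** 2  # observation noise variance s^2

# Full posterior p(z | x1, x2): Gaussian with precision 1 + 2 / s^2.
var_full = 1.0 / (1.0 + 2.0 / s2)

# Correct partial posterior p(z | x1): Gaussian with precision 1 + 1 / s^2.
# This is what an inference network that only sees x1 *should* report.
var_partial = 1.0 / (1.0 + 1.0 / s2)

# An amortised Gaussian q(z | x1) trained by minimising
#   E_{x1, x2} KL( q(z | x1) || p(z | x1, x2) )
# (the usual amortised objective, with x2 withheld from the encoder) has
# optimal variance equal to var_full: in the averaged KL, the variance
# term is independent of how the posterior mean shifts with the withheld
# x2, so q matches the *conditional* variance instead of the larger
# marginal one.
var_amortised = var_full

print("amortised q variance:", var_amortised)
print("correct p(z | x1) variance:", var_partial)

# The conditioning gap: q(z | x1) underestimates uncertainty.
assert var_amortised < var_partial
```

With s = 0.5, the amortised variance is 1/9 while the correct partial posterior variance is 1/5, so uncertainty is underestimated by nearly a factor of two even in this trivially simple model.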
