Non-maximum Suppression Also Closes the Variational Approximation Gap of Multi-object Variational Autoencoders

2021-01-01

Li Nanbo, Robert Burns Fisher

Abstract

Learning object-centric scene representations is crucial for structural scene understanding. However, current unsupervised models for scene factorization and representation learning do not reason about relations between scene objects during inference. In this paper, we address this issue by introducing a differentiable correlation prior that forces the inference model to suppress duplicate object representations. The extension is evaluated by adding it to three different scene-understanding approaches. The results show that models trained with the proposed method not only outperform the original models in scene factorization and produce fewer duplicate representations, but also close the approximation gap between the data evidence and the evidence lower bound.
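The abstract does not spell out the form of the correlation prior, but the core idea (penalize inferred object slots for being too similar to one another, so the model cannot "spend" several slots on the same object) can be illustrated with a minimal NumPy sketch. This is an assumed, illustrative formulation using squared off-diagonal cosine similarity between slot latents, not the authors' exact prior:

```python
import numpy as np

def duplicate_suppression_penalty(z, eps=1e-8):
    """Penalty on pairwise similarity between K object-slot latents.

    z: array of shape (K, D), one D-dimensional latent per object slot.
    Returns the mean squared off-diagonal cosine similarity, which is
    large when slots duplicate each other and near zero when they are
    orthogonal. Illustrative stand-in for the paper's correlation prior.
    """
    z_hat = z / (np.linalg.norm(z, axis=1, keepdims=True) + eps)
    sim = z_hat @ z_hat.T                      # (K, K) cosine similarities
    K = z.shape[0]
    off_diag = sim[~np.eye(K, dtype=bool)]     # drop self-similarities
    return float(np.square(off_diag).sum() / (K * (K - 1)))

# Two identical slots (a duplicate) score high; two distinct slots score ~0.
z_duplicate = np.array([[1.0, 0.0], [1.0, 0.0]])
z_distinct  = np.array([[1.0, 0.0], [0.0, 1.0]])
print(duplicate_suppression_penalty(z_duplicate))  # close to 1.0
print(duplicate_suppression_penalty(z_distinct))   # close to 0.0
```

In a variational training loop, a term like this would be added (with some weight) to the negative ELBO, so gradient descent pushes the inference model away from assigning near-identical representations to multiple slots, the soft analogue of non-maximum suppression described in the title.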
