Decoding Language Spatial Relations to 2D Spatial Arrangements
Gorjan Radevski, Guillem Collell, Marie-Francine Moens, Tinne Tuytelaars
Code: github.com/gorjanradevski/sr-bert (official implementation, linked in the paper; PyTorch)
Abstract
We address the problem of multimodal spatial understanding by decoding a set of language-expressed spatial relations to a set of 2D spatial arrangements in a multi-object and multi-relationship setting. We frame the task as arranging a scene of clip-arts given a textual description. We propose a simple and effective model architecture, Spatial-Reasoning Bert (SR-Bert), trained to decode text to 2D spatial arrangements in a non-autoregressive manner. SR-Bert can decode both explicit and implicit language to 2D spatial arrangements, generalizes to out-of-sample data to a reasonable extent, and can generate complete abstract scenes when paired with a clip-arts predictor. Finally, we qualitatively evaluate our method with a user study, validating that our generated spatial arrangements align with human expectations.