Cost-Effective Language Driven Image Editing with LX-DRIM

2022-10-01 · MMMPIE (COLING) 2022 · Code Available

Rodrigo Santos, António Branco, João Ricardo Silva


Abstract

Cross-modal language and image processing is envisaged as a way to improve language understanding by resorting to visual grounding, but only recently, with the emergence of neural architectures specifically tailored to cope with both modalities, has it attracted increased attention and obtained promising results. In this paper we address a cross-modal task of language-driven image design, in particular the task of altering a given image on the basis of language instructions. We also avoid the need for a specifically tailored architecture and resort instead to a general purpose model in the Transformer family. Experiments with the resulting tool, LX-DRIM, show very encouraging results, confirming the viability of the approach for language-driven image design while keeping it affordable in terms of compute and data.
