SynthRef: Generation of Synthetic Referring Expressions for Object Segmentation
Ioannis Kazakos, Carles Ventura, Miriam Bellver, Carina Silberer, Xavier Giro-i-Nieto
Code
- github.com/imatge-upc/synthref (Official, PyTorch, ★ 2)
- github.com/miriambellver/refvos (PyTorch, ★ 28)
Abstract
Recent advances in deep learning have brought significant progress in visual grounding tasks such as language-guided video object segmentation. However, the annotation time needed to collect large datasets for these tasks is a significant bottleneck. To this end, we propose SynthRef, a novel method for generating synthetic referring expressions for target objects in an image (or video frame), and we present and disseminate the first large-scale dataset with synthetic referring expressions for video object segmentation. Our experiments demonstrate that training with our synthetic referring expressions improves a model's ability to generalize across different datasets, without any additional annotation cost. Moreover, our formulation can be applied to any object detection or segmentation dataset.
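The abstract describes composing referring expressions directly from annotations that detection and segmentation datasets already provide. Below is a minimal, hypothetical sketch of that idea, assuming expressions are built from the object's class label plus a coarse spatial attribute when the class alone is ambiguous; all function and field names are illustrative and not taken from the official repository.

```python
# Hypothetical sketch of SynthRef-style synthetic referring expression
# generation from existing annotations. Names and the exact attribute set
# are assumptions for illustration only.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Annotation:
    category: str                              # class label, e.g. "person"
    bbox: Tuple[float, float, float, float]    # (x, y, w, h) in pixels


def location_phrase(bbox: Tuple[float, float, float, float], img_w: int) -> str:
    """Coarse horizontal position of the box centre."""
    cx = bbox[0] + bbox[2] / 2.0
    if cx < img_w / 3:
        return "on the left"
    if cx > 2 * img_w / 3:
        return "on the right"
    return "in the middle"


def synth_referring_expression(target: Annotation,
                               others: List[Annotation],
                               img_w: int) -> str:
    """Build a short expression that singles out `target` among `others`."""
    same_class = [a for a in others if a.category == target.category]
    if not same_class:
        # The category alone is discriminative: "the dog".
        return f"the {target.category}"
    # Otherwise disambiguate with a spatial attribute: "the person on the left".
    return f"the {target.category} {location_phrase(target.bbox, img_w)}"


# Example: two people in a 640x480 frame.
anns = [Annotation("person", (50, 100, 120, 300)),
        Annotation("person", (450, 90, 130, 310))]
print(synth_referring_expression(anns[0], anns[1:], 640))
# -> "the person on the left"
```

Because the generator only consumes class labels and boxes, the same procedure extends to any annotated detection or segmentation dataset, which is the property the abstract highlights.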
Tasks
- Referring Expression Segmentation
- Video Object Segmentation
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| DAVIS 2017 (val) | RefVOS + SynthRef-YouTube-VIS | J&F (1st frame) | 45.3 | — | Unverified |
| Refer-YouTube-VOS | RefVOS (human referring expressions) | Mean IoU | 39.5 | — | Unverified |
| Refer-YouTube-VOS | RefVOS (synthetic referring expressions) | Mean IoU | 35.0 | — | Unverified |
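For reference, the Mean IoU reported above is the average intersection-over-union between predicted and ground-truth segmentation masks. A small sketch of that computation follows, assuming binary masks as NumPy arrays; the exact evaluation protocol of each benchmark (e.g. per-frame vs. per-object averaging) may differ.

```python
import numpy as np


def iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection-over-union of two boolean masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(inter) / union if union > 0 else 1.0


def mean_iou(preds, gts) -> float:
    """Average IoU over paired prediction / ground-truth masks."""
    return float(np.mean([iou(p, g) for p, g in zip(preds, gts)]))


# Toy example: a 2x2 prediction overlapping half of a 2x4 ground truth.
pred = np.zeros((4, 4), dtype=bool); pred[:2, :2] = True
gt = np.zeros((4, 4), dtype=bool); gt[:2, :] = True
print(mean_iou([pred], [gt]))  # intersection 4, union 8 -> 0.5
```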