A Survey on Bridging VLMs and Synthetic Data

2025-05-09 · OpenReview 2025 · Code Available

Mohammad Ghiasvand Mohammadkhani, Saeedeh Momtazi, Hamid Beigy

Abstract

Vision-language models (VLMs) have significantly advanced multimodal AI by learning joint representations of visual and textual data. However, their progress is hindered by the difficulty of acquiring high-quality, aligned datasets, owing to cost, privacy, and scarcity concerns. Synthetic data, created with generative AI (which can itself include VLMs), offers a scalable and cost-effective alternative. This paper presents the first comprehensive survey on bridging VLMs and synthetic data, examining both the role of synthetic data in training VLMs and the role of VLMs in generating synthetic data. We first provide a preliminary overview by briefly explaining the architectures of two basic VLMs; then, drawing on a large body of prior work, we offer an extensive survey of previously proposed methodologies and promising future directions in this area.
