
Image-text matching

Image-Text Matching is a subtask of Cross-Modal Retrieval (CMR) that establishes associations between images and their corresponding textual descriptions. The goal is to retrieve an image given a textual query or, conversely, to retrieve a textual description given an image query. The task is challenging because of the heterogeneity gap: images and text are represented in different feature spaces, so their similarity cannot be compared directly. Image-text matching is used in applications such as content-based image search, visual question answering, and multimodal summarization.
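As an illustration of the retrieval setup described above, the following is a minimal sketch that assumes images and texts have already been mapped into a shared embedding space by some encoder (not shown). The toy vectors and the helper names `cosine`, `rank`, and `recall_at_k` are hypothetical and chosen only for this example. It ranks candidate images for each text query by cosine similarity and reports Recall@K, a metric commonly used for this task.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, candidates):
    """Indices of candidates, sorted from most to least similar to the query."""
    scores = [cosine(query, c) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


def recall_at_k(queries, candidates, gold, k):
    """Fraction of queries whose gold candidate appears in the top-k results."""
    hits = sum(
        1 for q, g in zip(queries, gold) if g in rank(q, candidates)[:k]
    )
    return hits / len(queries)


# Toy embeddings (assumed values, standing in for real encoder outputs).
image_embs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
text_embs = [[0.9, 0.1], [0.1, 0.9], [0.2, 0.8]]
gold = [0, 1, 2]  # text i describes image gold[i]

if __name__ == "__main__":
    # Text-to-image retrieval; swap the arguments for image-to-text.
    print("Recall@1:", recall_at_k(text_embs, image_embs, gold, k=1))
    print("Recall@2:", recall_at_k(text_embs, image_embs, gold, k=2))
```

Because the same functions work in either direction, image-to-text retrieval only requires passing `image_embs` as the queries and `text_embs` as the candidates.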

Assessing Brittleness of Image-Text Retrieval Benchmarks from Vision-Language Models Perspective

Papers


Showing 71–80 of 188 papers

Title	Date	Tasks	Status	Hype
Towards Grounded Visual Spatial Reasoning in Multi-Modal Vision Language Models	Aug 18, 2023	Image-text matching, Object Localization	Unverified	0
Your Negative May not Be True Negative: Boosting Image-Text Matching with False Negative Elimination	Aug 8, 2023	Image-text matching, Representation Learning	Code Available	1
Grounded Image Text Matching with Mismatched Relation Reasoning	Aug 2, 2023	Image-text matching, Relation	Unverified	0
A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models	Jul 24, 2023	Image Generation, Image-text matching	Code Available	2
Advancing Visual Grounding with Scene Knowledge: Benchmark and Method	Jul 21, 2023	Image-text matching, Text Matching	Code Available	1
UniFine: A Unified and Fine-grained Approach for Zero-shot Vision-Language Understanding	Jul 3, 2023	Image-text matching, Sentence	Code Available	1
Towards Unified Text-based Person Retrieval: A Large-scale Multi-Attribute and Language Search Benchmark	Jun 5, 2023	Attribute, Image-text matching	Code Available	1
Revisiting the Role of Language Priors in Vision-Language Models	Jun 2, 2023	Image-text matching, Image-text Retrieval	Code Available	1
Improved Probabilistic Image-Text Representations	May 29, 2023	Data Augmentation, Image-text matching	Code Available	1
Are Diffusion Models Vision-And-Language Reasoners?	May 25, 2023	Denoising, Image Generation	Code Available	1


No leaderboard results yet.