Image-text matching

Image-Text Matching is a subtask within Cross-Modal Retrieval (CMR) that involves establishing associations between images and corresponding textual descriptions. The goal is to retrieve an image given a textual query or, conversely, retrieve a textual description given an image query. This task is challenging due to the heterogeneity gap between image and text data representations. Image-text matching is used in applications such as content-based image search, visual question answering, and multimodal summarization.
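The two retrieval directions described above can be illustrated with a minimal sketch. It assumes images and texts have already been encoded into a shared embedding space and are compared by cosine similarity, which is one common approach and not something this page specifies. The function names (`l2_normalize`, `rank_by_similarity`, `recall_at_k`) and the toy embedding values are hypothetical and do not come from any paper listed here.

```python
import numpy as np

def l2_normalize(x):
    # Scale each row to unit length so dot products equal cosine similarity.
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def rank_by_similarity(queries, candidates):
    """For each query row, return candidate indices sorted from most to least similar."""
    sims = l2_normalize(queries) @ l2_normalize(candidates).T
    return np.argsort(-sims, axis=1)

def recall_at_k(rankings, k):
    """Fraction of queries whose matching item (same row index) appears in the top k."""
    targets = np.arange(rankings.shape[0])[:, None]
    return float(np.mean(np.any(rankings[:, :k] == targets, axis=1)))

# Toy embeddings (made up): row i of each matrix belongs to the same image-caption pair.
image_emb = np.array([[1.0, 0.1, 0.0], [0.0, 1.0, 0.2], [0.1, 0.0, 1.0]])
text_emb = np.array([[0.9, 0.2, 0.1], [0.1, 0.8, 0.3], [0.0, 0.1, 0.9]])

# Text query -> ranked images, and image query -> ranked texts.
text_to_image = rank_by_similarity(text_emb, image_emb)
image_to_text = rank_by_similarity(image_emb, text_emb)

print("text-to-image Recall@1:", recall_at_k(text_to_image, 1))
print("image-to-text Recall@1:", recall_at_k(image_to_text, 1))
```

Recall@K is used here only as an illustrative retrieval metric. In practice the embeddings would come from trained image and text encoders, and the heterogeneity gap mentioned above is the reason those encoders have to be trained to align the two modalities.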

Assessing Brittleness of Image-Text Retrieval Benchmarks from Vision-Language Models Perspective

Papers

Showing 171–180 of 188 papers

Title	Date	Tasks	Status
Macroscopic Control of Text Generation for Image Captioning	Jan 20, 2021	Diversity, Image Captioning	Unverified
Contrastive Cross-Modal Pre-Training: A General Strategy for Small Sample Medical Imaging	Oct 6, 2020	Image Classification, Image-text matching	Unverified
A Novel Attention-based Aggregation Function to Combine Vision and Language	Apr 27, 2020	General Classification, Image Captioning	Unverified
InterBERT: Vision-and-Language Interaction for Multi-modal Pretraining	Mar 30, 2020	Image Retrieval, Image-text matching	Unverified
Expressing Objects just like Words: Recurrent Visual Embedding for Image-Text Matching	Feb 20, 2020	Image-text matching, Object	Unverified
ImageBERT: Cross-modal Pre-training with Large-scale Weak-supervised Image-Text Data	Jan 22, 2020	Image Retrieval, Image-text matching	Unverified
Learning fragment self-attention embeddings for image-text matching	Oct 1, 2019	Image-text matching, Sentence	Code Available
UNITER: Learning UNiversal Image-TExt Representations	Sep 25, 2019	Image-text matching, Image-text Retrieval	Unverified
Learning Visual Relation Priors for Image-Text Matching and Image Captioning with Neural Scene Graph Generators	Sep 22, 2019	Image Captioning, Image-text matching	Unverified
Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training	Aug 16, 2019	Image-text matching, Image-text Retrieval	Unverified

No leaderboard results yet.