Image-text matching

Image-text matching is a subtask of Cross-Modal Retrieval (CMR) that associates images with their corresponding textual descriptions. The goal is to retrieve an image given a textual query or, conversely, to retrieve a textual description given an image query. The task is challenging because of the heterogeneity gap: images and text are represented in different feature spaces, so their similarity cannot be compared directly. Image-text matching is used in applications such as content-based image search, visual question answering, and multimodal summarization.
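The two retrieval directions described above can be illustrated with a minimal sketch. It assumes that images and captions have already been mapped into one shared embedding space, which is one common way to bridge the heterogeneity gap but not the only one. The function names (`cosine`, `retrieve`) and the toy vectors are hypothetical and chosen only for illustration; a real system would obtain the embeddings from trained image and text encoders.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def retrieve(query, candidates, k=1):
    """Return the indices of the k candidates most similar to the query."""
    scores = [cosine(query, c) for c in candidates]
    order = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    return order[:k]


# Toy embeddings, invented for illustration (assumption: a shared 3-d space).
image_embeddings = [
    [0.9, 0.1, 0.0],  # image 0
    [0.1, 0.8, 0.1],  # image 1
    [0.0, 0.2, 0.9],  # image 2
]
text_embeddings = [
    [1.0, 0.0, 0.1],  # caption 0
    [0.2, 0.9, 0.0],  # caption 1
    [0.1, 0.1, 1.0],  # caption 2
]

# Text-to-image retrieval: caption 1 is the query, images are the candidates.
text_to_image = retrieve(text_embeddings[1], image_embeddings, k=1)

# Image-to-text retrieval: image 2 is the query, captions are the candidates.
image_to_text = retrieve(image_embeddings[2], text_embeddings, k=1)

print(text_to_image, image_to_text)
```

The same `retrieve` function serves both directions because, under the shared-space assumption, queries and candidates are interchangeable vectors; only the roles of the two embedding lists are swapped.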

Assessing Brittleness of Image-Text Retrieval Benchmarks from Vision-Language Models Perspective

Papers



Title	Date	Tasks	Status	Hype
UC2: Universal Cross-lingual Cross-modal Vision-and-Language Pre-training	Apr 1, 2021	Image-text matching, Image-text Retrieval	Unverified	0
Macroscopic Control of Text Generation for Image Captioning	Jan 20, 2021	Diversity, Image Captioning	Unverified	0
Similarity Reasoning and Filtration for Image-Text Matching	Jan 5, 2021	Cross-Modal Retrieval, Image Retrieval	Code Available	1
VinVL: Revisiting Visual Representations in Vision-Language Models	Jan 2, 2021	Image Captioning, Image-text matching	Code Available	2
Learning Dual Semantic Relations with Graph Attention for Image-Text Matching	Oct 22, 2020	Cross-Modal Retrieval, Graph Attention	Code Available	1
MedICaT: A Dataset of Medical Images, Captions, and Textual References	Oct 12, 2020	document understanding, Image-text matching	Code Available	1
Universal Weighting Metric Learning for Cross-Modal Matching	Oct 7, 2020	Image-text matching, Metric Learning	Code Available	1
Contrastive Cross-Modal Pre-Training: A General Strategy for Small Sample Medical Imaging	Oct 6, 2020	Image Classification, Image-text matching	Unverified	0
Consensus-Aware Visual-Semantic Embedding for Image-Text Matching	Jul 17, 2020	Image Captioning, Image-text matching	Code Available	1
A Novel Attention-based Aggregation Function to Combine Vision and Language	Apr 27, 2020	General Classification, Image Captioning	Unverified	0
Deep Multimodal Neural Architecture Search	Apr 25, 2020	Decoder, Image-text matching	Code Available	1
Transformer Reasoning Network for Image-Text Matching and Retrieval	Apr 20, 2020	Image Retrieval, Image-text matching	Code Available	1
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks	Apr 13, 2020	Cross-Modal Retrieval, Image Captioning	Code Available	2
Text-Guided Neural Image Inpainting	Apr 7, 2020	Descriptive, Image Generation	Code Available	1
Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Transformers	Apr 2, 2020	Image-text matching, Image-text Retrieval	Code Available	1
More Grounded Image Captioning by Distilling Image-Text Matching Model	Apr 1, 2020	Image Captioning, Image-text matching	Code Available	1
Graph Structured Network for Image-Text Matching	Apr 1, 2020	Attribute, Cross-Modal Retrieval	Code Available	1
InterBERT: Vision-and-Language Interaction for Multi-modal Pretraining	Mar 30, 2020	Image Retrieval, Image-text matching	Unverified	0
Adaptive Offline Quintuplet Loss for Image-Text Matching	Mar 7, 2020	Image-text matching, Text Matching	Code Available	1
Expressing Objects just like Words: Recurrent Visual Embedding for Image-Text Matching	Feb 20, 2020	Image-text matching, Object	Unverified	0
ImageBERT: Cross-modal Pre-training with Large-scale Weak-supervised Image-Text Data	Jan 22, 2020	Image Retrieval, Image-text matching	Unverified	0
Learning fragment self-attention embeddings for image-text matching	Oct 1, 2019	Image-text matching, Sentence	Code Available	0
UNITER: Learning UNiversal Image-TExt Representations	Sep 25, 2019	Image-text matching, Image-text Retrieval	Unverified	0
UNITER: UNiversal Image-TExt Representation Learning	Sep 25, 2019	Image-text matching, Image-text Retrieval	Code Available	1
Learning Visual Relation Priors for Image-Text Matching and Image Captioning with Neural Scene Graph Generators	Sep 22, 2019	Image Captioning, Image-text matching	Unverified	0
Visual Semantic Reasoning for Image-Text Matching	Sep 6, 2019	Cross-Modal Retrieval, Image Retrieval	Code Available	1
VL-BERT: Pre-training of Generic Visual-Linguistic Representations	Aug 22, 2019	Image-text matching, Language Modelling	Code Available	1
Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training	Aug 16, 2019	Image-text matching, Image-text Retrieval	Unverified	0
Matching Images and Text with Multi-modal Tensor Fusion and Re-ranking	Aug 12, 2019	Binary Classification, General Classification	Code Available	0
Knowledge Aware Semantic Concept Expansion for Image-Text Matching	Aug 10, 2019	Common Sense Reasoning, Content-Based Image Retrieval	Unverified	0
Position Focused Attention Network for Image-Text Matching	Jul 23, 2019	Image-text matching, Position	Code Available	0
ParNet: Position-aware Aggregated Relation Network for Image-Text matching	Jun 17, 2019	Image-text matching, Position	Unverified	0
Deep Cross-Modal Projection Learning for Image-Text Matching	Sep 1, 2018	Cross-Modal Retrieval, Image-text matching	Code Available	0
Stacked Cross Attention for Image-Text Matching	Mar 21, 2018	Cross-Modal Retrieval, Image Retrieval	Code Available	1
AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks	Nov 28, 2017	Generative Adversarial Network, Image Generation	Code Available	1
Cross-modal Subspace Learning for Fine-grained Sketch-based Image Retrieval	May 28, 2017	Cross-Modal Retrieval, Image Retrieval	Unverified	0
Learning Two-Branch Neural Networks for Image-Text Matching Tasks	Apr 11, 2017	Image-text matching, Retrieval	Code Available	0
Dual Attention Networks for Multimodal Reasoning and Matching	Nov 2, 2016	Collaborative Inference, Image-text matching	Code Available	0


No leaderboard results yet.