SOTAVerified

Cross-Modal Information Retrieval

Cross-Modal Information Retrieval (CMIR) is the task of finding relevant items across different modalities: for example, given an image, retrieve a relevant text, or vice versa. The main challenge in CMIR is known as the heterogeneity gap: because items from different modalities have different data types, the similarity between them cannot be measured directly. The majority of CMIR methods published to date therefore attempt to bridge this gap by learning a latent representation space in which the similarity between items from different modalities can be measured.
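A minimal sketch of the latent-space idea described above: each modality gets its own projection into a shared space, and cosine similarity becomes well-defined there. The dimensions and random projection matrices below are illustrative placeholders; in a real system the projections are learned, e.g. with a contrastive or triplet loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature dimensions: image and text encoders produce
# vectors of different sizes, so they are not directly comparable.
IMG_DIM, TXT_DIM, SHARED_DIM = 512, 300, 128

# Stand-in projections into the shared space (learned in practice).
W_img = rng.standard_normal((IMG_DIM, SHARED_DIM)) / np.sqrt(IMG_DIM)
W_txt = rng.standard_normal((TXT_DIM, SHARED_DIM)) / np.sqrt(TXT_DIM)

def embed(features, W):
    """Project raw modality features into the shared space, L2-normalized."""
    z = features @ W
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

# One query image against a small gallery of candidate texts.
image = rng.standard_normal(IMG_DIM)
texts = rng.standard_normal((5, TXT_DIM))

# Both sides now live in the same 128-dimensional space, so cosine
# similarity is meaningful; retrieval ranks candidates by this score.
scores = embed(texts, W_txt) @ embed(image, W_img)
ranking = np.argsort(-scores)
```

With trained projections, ground-truth image–text pairs would score higher than mismatched ones; with the random stand-ins here, only the mechanics of the comparison are shown.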

Source: Scene-centric vs. Object-centric Image-Text Cross-modal Retrieval: A Reproducibility Study

Papers

Showing 1–16 of 16 papers

Title | Status | Hype
Fine-grained Visual Textual Alignment for Cross-Modal Retrieval using Transformer Encoders | Code | 1
Learning the Best Pooling Strategy for Visual Semantic Embedding | Code | 1
VisualSparta: An Embarrassingly Simple Approach to Large-scale Text-to-Image Search with Weighted Bag-of-words | Code | 1
ZSCRGAN: A GAN-based Expectation Maximization Model for Zero-Shot Retrieval of Images from Textual Descriptions | Code | 0
CMIR-NET : A Deep Learning Based Model For Cross-Modal Retrieval In Remote Sensing | Code | 0
Improving Visual-Semantic Embeddings by Learning Semantically-Enhanced Hard Negatives for Cross-modal Information Retrieval | Code | 0
Picture It In Your Mind: Generating High Level Visual Representations From Textual Descriptions | Code | 0
Cross-modal representation alignment of molecular structure and perturbation-induced transcriptional profiles | Code | 0
VITR: Augmenting Vision Transformers with Relation-Focused Learning for Cross-Modal Information Retrieval | — | 0
Modeling Text with Graph Convolutional Network for Cross-Modal Information Retrieval | — | 0
Scene Graph Reasoning with Prior Visual Relationship for Visual Question Answering | — | 0
Multi-modal Machine Learning in Engineering Design: A Review and Future Directions | — | 0
Multimodal Representation Alignment for Cross-modal Information Retrieval | — | 0
jina-clip-v2: Multilingual Multimodal Embeddings for Text and Images | — | 0
Is Cross-modal Information Retrieval Possible without Training? | — | 0
LILE: Look In-Depth before Looking Elsewhere -- A Dual Attention Network using Transformers for Cross-Modal Information Retrieval in Histopathology Archives | — | 0

No leaderboard results yet.