SOTAVerified

Text Retrieval

Text Retrieval is the task of finding the most relevant text result (such as an answer, paragraph, or passage) given a query (which could be a question, keywords, or any relevant text).
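
To make the task definition concrete, here is a minimal sketch of retrieval as ranking: each passage is scored against the query and the highest-scoring passages are returned. It assumes a simple bag-of-words cosine similarity as the scoring function, which is an illustrative choice and not the method of any paper listed here. The function names (`tokenize`, `cosine`, `retrieve`) and the sample passages are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, passages, top_k=1):
    """Return the top_k passages ranked by similarity to the query."""
    query_vec = Counter(tokenize(query))
    scored = [(cosine(query_vec, Counter(tokenize(p))), p) for p in passages]
    # Sort by score, highest first; ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored[:top_k]]


# Hypothetical toy corpus, for illustration only.
passages = [
    "The Eiffel Tower is located in Paris.",
    "Dense retrievers encode queries and passages as vectors.",
    "Bananas are rich in potassium.",
]
best = retrieve("Where is the Eiffel Tower?", passages)
```

Dense retrieval methods, which several of the papers below study, keep the same rank-by-similarity structure but replace the term-count vectors with learned embeddings.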

Papers

Showing 501–550 of 671 papers

Title | Status | Hype
Where Does the Performance Improvement Come From? -- A Reproducibility Concern about Image-Text Retrieval | Code | 1
An Unsupervised Cross-Modal Hashing Method Robust to Noisy Training Image-Text Correspondences in Remote Sensing | Code | 0
Vision-Language Pre-Training with Triple Contrastive Learning | Code | 2
CommerceMM: Large-Scale Commerce MultiModal Representation Learning with Omni Retrieval | - | 0
Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark | Code | 0
DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models | Code | 3
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation | Code | 5
Synopses of Movie Narratives: a Video-Language Dataset for Story Understanding | - | 0
Bridging Video-text Retrieval with Multiple Choice Questions | Code | 1
Negative Sample is Negative in Its Own Way: Tailoring Negative Sentences for Image-Text Retrieval | - | 0
Cross-modal Contrastive Learning for Speech Translation | - | 0
Audio Retrieval with Natural Language Queries: A Benchmark Study | Code | 1
CLIP-Lite: Information Efficient Visual Representation Learning with Language Supervision | Code | 1
Unified Multimodal Pre-training and Prompt-based Tuning for Vision-Language Understanding and Generation | - | 0
Densifying Sparse Representations for Passage Retrieval by Representational Slicing | Code | 1
Video-Text Pre-training with Learned Regions | Code | 1
UFO: A UniFied TransfOrmer for Vision-Language Representation Learning | - | 0
Constructing Phrase-level Semantic Labels to Form Multi-Grained Supervision for Image-Text Retrieval | - | 0
ViQuAE, a Dataset for Knowledge-based Visual Question Answering about Named Entities | Code | 0
SwAMP: Swapped Assignment of Multi-Modal Pairs for Cross-Modal Retrieval | - | 0
CLIP2TV: Align, Match and Distill for Video-Text Retrieval | - | 0
FILIP: Fine-grained Interactive Language-Image Pre-Training | Code | 1
Negative Sample is Negative in Its Own Way: Tailoring Negative Sentences for Image-Text Retrieval | Code | 0
VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts | Code | 1
Less is More: Pretrain a Strong Siamese Encoder for Dense Text Retrieval Using a Weak Decoder | Code | 1
Deep Keyphrase Completion | - | 0
Dense Hierarchical Retrieval for Open-Domain Question Answering | Code | 1
Zero-Shot Dense Retrieval with Momentum Adversarial Domain Invariant Representations | - | 0
ViSeRet: A simple yet effective approach to moment retrieval via fine-grained video segmentation | - | 0
Adversarial Retriever-Ranker for dense text retrieval | - | 0
A Proposed Conceptual Framework for a Representational Approach to Information Retrieval | - | 0
CrossCLR: Cross-modal Contrastive Learning For Multi-modal Video Representations | - | 0
Zero-Shot Dense Retrieval with Momentum Adversarial Domain Invariant Representation | - | 0
Learning Context-Adapted Video-Text Retrieval by Attending to User Comments | - | 0
Constructing Phrase-level Semantic Labels to Form Multi-Grained Supervision for Image-Text Retrieval | - | 0
EfficientCLIP: Efficient Cross-Modal Pre-training by Ensemble Confident Learning and Language Modeling | - | 0
Improving Video-Text Retrieval by Multi-Stream Corpus Alignment and Dual Softmax Loss | Code | 1
Text Retrieval for Language Learners: Graded Vocabulary vs. Open Learner Model | - | 0
In-Batch Negatives for Knowledge Distillation with Tightly-Coupled Teachers for Dense Retrieval | - | 0
Multimodal or Text? Retrieval or BERT? Benchmarking Classifiers for the Shared Task on Hateful Memes | - | 0
HANet: Hierarchical Alignment Networks for Video-Text Retrieval | Code | 1
Multi-stage Pre-training over Simplified Multimodal Pre-training Models | Code | 0
WikiGraphs: A Wikipedia Text - Knowledge Graph Paired Dataset | Code | 0
More Robust Dense Retrieval with Contrastive Dual Learning | Code | 1
Align before Fuse: Vision and Language Representation Learning with Momentum Distillation | Code | 1
Dynamic Modality Interaction Modeling for Image-Text Retrieval | Code | 1
Probing Inter-modality: Visual Parsing with Self-Attention for Vision-Language Pre-training | - | 0
CLIP2Video: Mastering Video-Text Retrieval via Image CLIP | Code | 1
CoSMo: Content-Style Modulation for Image Retrieval With Text Feedback | Code | 1
Adaptive Cross-Modal Prototypes for Cross-Domain Visual-Language Retrieval | - | 0
Page 11 of 14

No leaderboard results yet.