SOTAVerified

Text Retrieval

Text Retrieval is the task of finding the most relevant text result (such as an answer, paragraph, or passage) given a query (which could be a question, keywords, or any other relevant text).
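The task can be illustrated with a minimal sketch: score every passage in a corpus against the query and return them best-first. The bag-of-words vectors and cosine scoring below are a deliberately simple stand-in for the learned encoders used by the papers listed on this page; all function names are illustrative, not from any listed work.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words term-frequency vector for a lowercase, whitespace-tokenized text."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, corpus):
    """Rank corpus passages by similarity to the query, most relevant first."""
    q = vectorize(query)
    return sorted(corpus, key=lambda p: cosine(q, vectorize(p)), reverse=True)

corpus = [
    "the cat sat on the mat",
    "neural networks learn dense representations",
    "dense retrieval encodes queries and passages",
]
print(retrieve("dense retrieval of passages", corpus)[0])
# → dense retrieval encodes queries and passages
```

Modern systems replace the count vectors with dense embeddings from a trained encoder, but the retrieve-by-similarity loop is the same.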

Papers

Showing 501–525 of 671 papers

| Title | Status | Hype |
| --- | --- | --- |
| Where Does the Performance Improvement Come From? -- A Reproducibility Concern about Image-Text Retrieval | Code | 1 |
| An Unsupervised Cross-Modal Hashing Method Robust to Noisy Training Image-Text Correspondences in Remote Sensing | Code | 0 |
| Vision-Language Pre-Training with Triple Contrastive Learning | Code | 2 |
| CommerceMM: Large-Scale Commerce MultiModal Representation Learning with Omni Retrieval | — | 0 |
| Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark | Code | 0 |
| DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models | Code | 3 |
| BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation | Code | 5 |
| Synopses of Movie Narratives: a Video-Language Dataset for Story Understanding | — | 0 |
| Bridging Video-text Retrieval with Multiple Choice Questions | Code | 1 |
| Cross-modal Contrastive Learning for Speech Translation | — | 0 |
| Audio Retrieval with Natural Language Queries: A Benchmark Study | Code | 1 |
| CLIP-Lite: Information Efficient Visual Representation Learning with Language Supervision | Code | 1 |
| Unified Multimodal Pre-training and Prompt-based Tuning for Vision-Language Understanding and Generation | — | 0 |
| Densifying Sparse Representations for Passage Retrieval by Representational Slicing | Code | 1 |
| Video-Text Pre-training with Learned Regions | Code | 1 |
| UFO: A UniFied TransfOrmer for Vision-Language Representation Learning | — | 0 |
| Constructing Phrase-level Semantic Labels to Form Multi-Grained Supervision for Image-Text Retrieval | — | 0 |
| ViQuAE, a Dataset for Knowledge-based Visual Question Answering about Named Entities | Code | 0 |
| SwAMP: Swapped Assignment of Multi-Modal Pairs for Cross-Modal Retrieval | — | 0 |
| CLIP2TV: Align, Match and Distill for Video-Text Retrieval | — | 0 |
| FILIP: Fine-grained Interactive Language-Image Pre-Training | Code | 1 |
| Negative Sample is Negative in Its Own Way: Tailoring Negative Sentences for Image-Text Retrieval | Code | 0 |
| VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts | Code | 1 |
| Less is More: Pretrain a Strong Siamese Encoder for Dense Text Retrieval Using a Weak Decoder | Code | 1 |
Page 21 of 27

No leaderboard results yet.