SOTAVerified

Text Retrieval

Text Retrieval is the task of finding the most relevant text result (such as an answer, paragraph, or passage) given a query (which could be a question, keywords, or any relevant text).
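To make the task definition concrete, here is a minimal sketch of retrieval as ranking: each candidate passage is scored against the query and the highest-scoring one is returned. It assumes a plain bag-of-words cosine similarity, chosen only for illustration; it is not the method of any paper listed on this page, and the function names (`tokenize`, `cosine`, `retrieve`) and the sample passages are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep runs of letters and digits as tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two term-count vectors (Counters).
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, passages, top_k=1):
    # Score every passage against the query and return the best top_k
    # as (score, passage) pairs, highest score first.
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(p))), p) for p in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Illustrative data only (an assumption, not taken from any benchmark).
passages = [
    "The Eiffel Tower is located in Paris.",
    "Python is a popular programming language.",
    "Mount Everest is the highest mountain on Earth.",
]
best = retrieve("Where is the Eiffel Tower?", passages)
print(best[0][1])
```

Word-overlap scoring of this kind is only a baseline; a learned scoring function could be substituted for `cosine` while the ranking step stays the same.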

Papers

Showing 101–125 of 671 papers

Title | Status | Hype
CLIP2Video: Mastering Video-Text Retrieval via Image CLIP | Code | 1
CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval | Code | 1
DiscoVLA: Discrepancy Reduction in Vision, Language, and Alignment for Parameter-Efficient Video-Text Retrieval | Code | 1
CLIP-Lite: Information Efficient Visual Representation Learning with Language Supervision | Code | 1
Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner | Code | 1
Global and Local Semantic Completion Learning for Vision-Language Pre-training | Code | 1
Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone | Code | 1
Cocktail: A Comprehensive Information Retrieval Benchmark with LLM-Generated Documents Integration | Code | 1
GLoRIA: A Multimodal Global-Local Representation Learning Framework for Label-Efficient Medical Image Recognition | Code | 1
COCO-DR: Combating Distribution Shifts in Zero-Shot Dense Retrieval with Contrastive and Distributionally Robust Learning | Code | 1
A Survey of Medical Vision-and-Language Applications and Their Techniques | Code | 1
Exploring Classic and Neural Lexical Translation Models for Information Retrieval: Interpretability, Effectiveness, and Efficiency Benefits | Code | 1
ComCLIP: Training-Free Compositional Image and Text Matching | Code | 1
COM Kitchens: An Unedited Overhead-view Video Dataset as a Vision-Language Benchmark | Code | 1
Bridging Video-text Retrieval with Multiple Choice Questions | Code | 1
Composing Object Relations and Attributes for Image-Text Matching | Code | 1
Eye-gaze Guided Multi-modal Alignment for Medical Representation Learning | Code | 1
Consensus-Aware Visual-Semantic Embedding for Image-Text Matching | Code | 1
A Comparison of Pre-trained Vision-and-Language Models for Multimodal Representation Learning across Medical Images and Reports | Code | 1
DialogCC: An Automated Pipeline for Creating High-Quality Multi-Modal Dialogue Dataset | Code | 1
Generative Multi-hop Retrieval | Code | 1
FILIP: Fine-grained Interactive Language-Image Pre-Training | Code | 1
Nonparametric Decoding for Generative Retrieval | Code | 1
Audio Retrieval with Natural Language Queries: A Benchmark Study | Code | 1
A Deep Local and Global Scene-Graph Matching for Image-Text Retrieval | Code | 1

No leaderboard results yet.