SOTAVerified

Text Retrieval

Text Retrieval is the task of finding the most relevant text result (such as an answer, paragraph, or passage) given a query (which could be a question, keywords, or any relevant text).
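To make the task definition concrete, here is a minimal sketch of lexical text retrieval: passages are ranked against a query by cosine similarity over smoothed TF-IDF weights. It is not taken from any of the papers listed on this page; the function names (`tokenize`, `retrieve`), the tokenization rule, and the sample passages are all illustrative assumptions.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase and split on any non-alphanumeric character.
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return cleaned.split()


def retrieve(query, passages, top_k=1):
    """Return the top_k (passage, score) pairs, best match first."""
    docs = [Counter(tokenize(p)) for p in passages]
    n_docs = len(docs)

    # Document frequency: in how many passages each term occurs.
    df = Counter()
    for doc in docs:
        df.update(doc.keys())

    def idf(term):
        # Smoothed inverse document frequency; unseen terms get df = 0.
        return math.log((n_docs + 1) / (df[term] + 1)) + 1.0

    def weigh(counts):
        return {term: count * idf(term) for term, count in counts.items()}

    def cosine(a, b):
        dot = sum(w * b.get(term, 0.0) for term, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    query_vec = weigh(Counter(tokenize(query)))
    scored = [(cosine(query_vec, weigh(doc)), i) for i, doc in enumerate(docs)]
    # Sort by descending score; ties keep the original passage order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(passages[i], score) for score, i in scored[:top_k]]


# Hypothetical sample data for illustration only.
passages = [
    "The Eiffel Tower is in Paris.",
    "Python is a programming language.",
    "Bananas are rich in potassium.",
]
best_passage, best_score = retrieve("Where is the Eiffel Tower?", passages)[0]
```

Many of the papers listed below replace this kind of term matching with learned dense embeddings, often across modalities (image, video, audio), but the query-in, ranked-results-out interface is the same.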

Papers

Showing 101–150 of 671 papers

Title | Status | Hype
CLIP2Video: Mastering Video-Text Retrieval via Image CLIP | Code | 1
CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval | Code | 1
FETA: Towards Specializing Foundation Models for Expert Task Applications | Code | 1
A Comprehensive Review of the Video-to-Text Problem | Code | 1
Extending Multi-modal Contrastive Representations | Code | 1
Eye-gaze Guided Multi-modal Alignment for Medical Representation Learning | Code | 1
Bridging Language Gaps in Audio-Text Retrieval | Code | 1
Cocktail: A Comprehensive Information Retrieval Benchmark with LLM-Generated Documents Integration | Code | 1
Exploring Classic and Neural Lexical Translation Models for Information Retrieval: Interpretability, Effectiveness, and Efficiency Benefits | Code | 1
COCO-DR: Combating Distribution Shifts in Zero-Shot Dense Retrieval with Contrastive and Distributionally Robust Learning | Code | 1
LexLIP: Lexicon-Bottlenecked Language-Image Pre-Training for Large-Scale Image-Text Retrieval | Code | 1
GASLITEing the Retrieval: Exploring Vulnerabilities in Dense Embedding-based Search | Code | 1
ComCLIP: Training-Free Compositional Image and Text Matching | Code | 1
COM Kitchens: An Unedited Overhead-view Video Dataset as a Vision-Language Benchmark | Code | 1
Bridging Video-text Retrieval with Multiple Choice Questions | Code | 1
Composing Object Relations and Attributes for Image-Text Matching | Code | 1
Learning Semantic Relationship Among Instances for Image-Text Matching | Code | 1
Consensus-Aware Visual-Semantic Embedding for Image-Text Matching | Code | 1
Learning the Best Pooling Strategy for Visual Semantic Embedding | Code | 1
ESA: External Space Attention Aggregation for Image-Text Retrieval | Code | 1
A Deep Local and Global Scene-Graph Matching for Image-Text Retrieval | Code | 1
Graph Optimal Transport for Cross-Domain Alignment | Code | 1
Learning Relation Alignment for Calibrated Cross-modal Retrieval | Code | 1
Audio Retrieval with Natural Language Queries: A Benchmark Study | Code | 1
Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to Enhance Visio-Linguistic Compositional Understanding | Code | 1
Contrastive Audio-Language Learning for Music | Code | 1
Learning to Rank in Generative Retrieval | Code | 1
Boosting Transferability in Vision-Language Attacks via Diversification along the Intersection Region of Adversarial Trajectory | Code | 1
Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner | Code | 1
Learnable Pillar-based Re-ranking for Image-Text Retrieval | Code | 1
Equivariant Similarity for Vision-Language Foundation Models | Code | 1
Learning a Text-Video Embedding from Incomplete and Heterogeneous Data | Code | 1
Less is More: Pretrain a Strong Siamese Encoder for Dense Text Retrieval Using a Weak Decoder | Code | 1
LexLIP: Lexicon-Bottlenecked Language-Image Pre-Training for Large-Scale Image-Text Sparse Retrieval | Code | 1
Language-agnostic BERT Sentence Embedding | Code | 1
Knowledge Guided Text Retrieval and Reading for Open Domain Question Answering | Code | 1
A Data-Centric Framework for Composable NLP Workflows | Code | 1
LaPraDoR: Unsupervised Pretrained Dense Retriever for Zero-Shot Text Retrieval | Code | 1
Efficient Token-Guided Image-Text Retrieval with Consistent Multimodal Contrastive Training | Code | 1
Efficient Medical Vision-Language Alignment Through Adapting Masked Vision Models | Code | 1
Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical Alignment | Code | 1
Cross-View Language Modeling: Towards Unified Cross-Lingual Cross-Modal Pre-training | Code | 1
Efficiently Teaching an Effective Dense Retriever with Balanced Topic Aware Sampling | Code | 1
Cross-modal Scene Graph Matching for Relationship-aware Image-Text Retrieval | Code | 1
Kaleido-BERT: Vision-Language Pre-training on Fashion Domain | Code | 1
Large-Scale Adversarial Training for Vision-and-Language Representation Learning | Code | 1
CVLUE: A New Benchmark Dataset for Chinese Vision-Language Understanding Evaluation | Code | 1
Cross-Modal Retrieval with Partially Mismatched Pairs | Code | 1
Cross-Modal Retrieval for Motion and Text via DopTriple Loss | Code | 1
DiscoVLA: Discrepancy Reduction in Vision, Language, and Alignment for Parameter-Efficient Video-Text Retrieval | Code | 1
Page 3 of 14

No leaderboard results yet.