SOTAVerified

Text Retrieval

Text Retrieval is the task of finding the most relevant text result (such as an answer, paragraph, or passage) given a query (which could be a question, keywords, or any relevant text).
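The following sketch makes the task definition concrete with a classic lexical retriever. It scores each candidate passage against a query using Okapi BM25 and returns the passages ranked from most to least relevant. The tokenizer, the `bm25_rank` helper, the toy passages, and the parameter values k1 = 1.5 and b = 0.75 (commonly used defaults) are illustrative assumptions. They are not taken from any paper listed on this page. Several of the listed papers study dense retrievers instead, which score passages by embedding similarity rather than by term overlap.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumed, deliberately simple analyzer: lowercase and keep runs of word characters.
    return re.findall(r"\w+", text.lower())


def bm25_rank(query: str, passages: list[str], k1: float = 1.5, b: float = 0.75) -> list[tuple[int, float]]:
    """Hypothetical helper: score every passage against the query with Okapi BM25
    and return (passage_index, score) pairs sorted from most to least relevant."""
    docs = [tokenize(p) for p in passages]
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of passages that contain each term at least once.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for idx, doc in enumerate(docs):
        tf = Counter(doc)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue  # terms absent from this passage contribute nothing
            # Smoothed IDF; the "+ 1.0" inside the log keeps it non-negative.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Saturating term frequency, normalized by passage length relative to the average.
            norm = tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(doc) / avg_len))
            score += idf * norm
        scores.append((idx, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy corpus, invented for illustration only.
    passages = [
        "Dense retrieval encodes queries and passages into vectors.",
        "BM25 is a classic lexical ranking function based on term frequency.",
        "Image-text retrieval matches pictures with captions.",
    ]
    for idx, score in bm25_rank("what is lexical ranking with term frequency", passages):
        print(f"{score:.3f}  {passages[idx]}")
```

Running the script prints the toy passages in descending score order, with the BM25 passage first because it shares the most query terms. A dense retriever could keep the same rank-by-score loop but replace the BM25 score with a similarity between learned query and passage embeddings.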

Papers

Showing 201-250 of 671 papers

Title | Status | Hype
More Robust Dense Retrieval with Contrastive Dual Learning | Code | 1
Dynamic Modality Interaction Modeling for Image-Text Retrieval | Code | 1
CLIP2Video: Mastering Video-Text Retrieval via Image CLIP | Code | 1
CoSMo: Content-Style Modulation for Image Retrieval With Text Feedback | Code | 1
A Deep Local and Global Scene-Graph Matching for Image-Text Retrieval | Code | 1
Learning Relation Alignment for Calibrated Cross-modal Retrieval | Code | 1
CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval | Code | 1
Condenser: a Pre-training Architecture for Dense Retrieval | Code | 1
Efficiently Teaching an Effective Dense Retriever with Balanced Topic Aware Sampling | Code | 1
Understanding Hard Negatives in Noise Contrastive Estimation | Code | 1
Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning | Code | 1
Scene Text Retrieval via Joint Text Detection and Similarity Learning | Code | 1
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval | Code | 1
Kaleido-BERT: Vision-Language Pre-training on Fashion Domain | Code | 1
A Comprehensive Review of the Video-to-Text Problem | Code | 1
VLGrammar: Grounded Grammar Induction of Vision and Language | Code | 1
LightningDOT: Pre-training Visual-Semantic Embeddings for Real-Time Image-Text Retrieval | Code | 1
A Data-Centric Framework for Composable NLP Workflows | Code | 1
Exploring Classic and Neural Lexical Translation Models for Information Retrieval: Interpretability, Effectiveness, and Efficiency Benefits | Code | 1
Rethink Training of BERT Rerankers in Multi-Stage Retrieval Pipeline | Code | 1
GLoRIA: A Multimodal Global-Local Representation Learning Framework for Label-Efficient Medical Image Recognition | Code | 1
Learning the Best Pooling Strategy for Visual Semantic Embedding | Code | 1
A Comparison of Pre-trained Vision-and-Language Models for Multimodal Representation Learning across Medical Images and Reports | Code | 1
Consensus-Aware Visual-Semantic Embedding for Image-Text Matching | Code | 1
Language-agnostic BERT Sentence Embedding | Code | 1
Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval | Code | 1
Graph Optimal Transport for Cross-Domain Alignment | Code | 1
Large-Scale Adversarial Training for Vision-and-Language Representation Learning | Code | 1
Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Transformers | Code | 1
IMRAM: Iterative Matching with Recurrent Attention Memory for Cross-Modal Image-Text Retrieval | Code | 1
Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning | Code | 1
Knowledge Guided Text Retrieval and Reading for Open Domain Question Answering | Code | 1
Cross-modal Scene Graph Matching for Relationship-aware Image-Text Retrieval | Code | 1
UNITER: UNiversal Image-TExt Representation Learning | Code | 1
XQA: A Cross-lingual Open-domain Question Answering Dataset | Code | 1
Polysemous Visual-Semantic Embedding for Cross-Modal Retrieval | Code | 1
Learning a Text-Video Embedding from Incomplete and Heterogeneous Data | Code | 1
Stacked Cross Attention for Image-Text Matching | Code | 1
Maximal Matching Matters: Preventing Representation Collapse for Robust Cross-Modal Retrieval | | 0
Tree-Based Text Retrieval via Hierarchical Clustering in RAG Frameworks: Application on Taiwanese Regulations | Code | 0
MSTAR: Box-free Multi-query Scene Text Retrieval with Attention Recycling | Code | 0
Improving Medical Visual Representation Learning with Pathological-level Cross-Modal Alignment and Correlation Exploration | | 0
Adding simple structure at inference improves Vision-Language Compositionality | Code | 0
Beyond Cropped Regions: New Benchmark and Corresponding Baseline for Chinese Scene Text Retrieval in Diverse Layouts | | 0
Attacking Attention of Foundation Models Disrupts Downstream Tasks | Code | 0
ERU-KG: Efficient Reference-aligned Unsupervised Keyphrase Generation | Code | 0
MLLM-Guided VLM Fine-Tuning with Joint Inference for Zero-Shot Composed Image Retrieval | | 0
Distill CLIP (DCLIP): Enhancing Image-Text Retrieval via Cross-Modal Transformer Distillation | | 0
EvdCLIP: Improving Vision-Language Retrieval with Entity Visual Descriptions from Large Language Models | | 0
Representation Discrepancy Bridging Method for Remote Sensing Image-Text Retrieval | | 0

No leaderboard results yet.