
Text Retrieval

Text Retrieval is the task of finding the most relevant text result (such as an answer, paragraph, or passage) for a given query (which could be a question, a set of keywords, or any other relevant text).
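As a concrete illustration of the task definition, the sketch below ranks a small in-memory list of passages by TF-IDF cosine similarity to a query. TF-IDF is a classic sparse lexical baseline chosen here for illustration only. Many of the papers listed below use learned dense, sparse, or multimodal encoders instead. The function names (`tokenize`, `tfidf_vectors`, `cosine`, `retrieve`) and the sample passages are hypothetical and do not come from any specific paper or library.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs: list[list[str]]) -> tuple[list[dict[str, float]], dict[str, float]]:
    """Compute one TF-IDF vector per tokenized document, plus the IDF table."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF: terms that occur in every document still get a small positive weight.
    idf = {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}
    vectors = [{term: freq * idf[term] for term, freq in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, passages: list[str], k: int = 3) -> list[tuple[int, float]]:
    """Return (index, score) pairs for the k passages most similar to the query."""
    vectors, idf = tfidf_vectors([tokenize(p) for p in passages])
    # Query terms never seen in the corpus have no IDF entry and are ignored.
    q_vec = {t: f * idf[t] for t, f in Counter(tokenize(query)).items() if t in idf}
    scores = [(i, cosine(q_vec, v)) for i, v in enumerate(vectors)]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


passages = [
    "Dense retrieval encodes queries and passages into vectors.",
    "The cat sat on the mat.",
    "Sparse retrieval ranks passages by weighted term overlap with the query.",
]
print(retrieve("how does sparse retrieval rank passages", passages, k=2))
```

With this toy corpus, the third passage (index 2) ranks first because it shares the most query terms, including the rarer term "sparse". Note that exact-match tokenization treats "rank" and "ranks" as different terms, which is one of the vocabulary-mismatch problems that learned retrievers aim to address.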

Papers

Showing 151-200 of 671 papers

Title | Status | Hype
MV-Adapter: Multimodal Video Transfer Learning for Video Text Retrieval | Code | 1
Fine-Grained Image-Text Matching by Cross-Modal Hard Aligning Network | Code | 1
Learning Semantic Relationship Among Instances for Image-Text Matching | Code | 1
LexLIP: Lexicon-Bottlenecked Language-Image Pre-Training for Large-Scale Image-Text Sparse Retrieval | Code | 1
Benchmarking Robustness of Multimodal Image-Text Models under Distribution Shift | Code | 1
FlexiViT: One Model for All Patch Sizes | Code | 1
DialogCC: An Automated Pipeline for Creating High-Quality Multi-Modal Dialogue Dataset | Code | 1
ComCLIP: Training-Free Compositional Image and Text Matching | Code | 1
Seeing What You Miss: Vision-Language Pre-training with Semantic Completion Learning | Code | 1
COCO-DR: Combating Distribution Shifts in Zero-Shot Dense Retrieval with Contrastive and Distributionally Robust Learning | Code | 1
VTC: Improving Video-Text Retrieval with User Comments | Code | 1
Mixed-modality Representation Learning and Pre-training for Joint Table-and-Text Retrieval in OpenQA | Code | 1
MAP: Multimodal Uncertainty-Aware Vision-Language Pre-training Model | Code | 1
Nonparametric Decoding for Generative Retrieval | Code | 1
SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model | Code | 1
DecAF: Joint Decoding of Answers and Logical Forms for Question Answering over Knowledge Bases | Code | 1
Audio Retrieval with WavText5K and CLAP Training | Code | 1
Mr. Right: Multimodal Retrieval on Representation of ImaGe witH Text | Code | 1
FETA: Towards Specializing Foundation Models for Expert Task Applications | Code | 1
Universal Vision-Language Dense Retrieval: Learning A Unified Representation Space for Multi-Modal Retrieval | Code | 1
Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical Alignment | Code | 1
Contrastive Audio-Language Learning for Music | Code | 1
X-CLIP: End-to-End Multi-grained Contrastive Learning for Video-Text Retrieval | Code | 1
A Dense Representation Framework for Lexical and Semantic Matching | Code | 1
MixGen: A New Multi-Modal Data Augmentation | Code | 1
Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone | Code | 1
Cross-View Language Modeling: Towards Unified Cross-Lingual Cross-Modal Pre-training | Code | 1
Fast and Light-Weight Answer Text Retrieval in Dialogue Systems | Code | 1
mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections | Code | 1
HLATR: Enhance Multi-stage Text Retrieval with Hybrid List Aware Transformer Reranking | Code | 1
CCMB: A Large-scale Chinese Cross-modal Benchmark | Code | 1
Cross-modal Contrastive Learning for Speech Translation | Code | 1
Generative Multi-hop Retrieval | Code | 1
MILES: Visual BERT Pre-training with Injected Language Semantics for Video-text Retrieval | Code | 1
MUGEN: A Playground for Video-Audio-Text Multimodal Understanding and GENeration | Code | 1
On Metric Learning for Audio-Text Cross-Modal Retrieval | Code | 1
LaPraDoR: Unsupervised Pretrained Dense Retriever for Zero-Shot Text Retrieval | Code | 1
Where Does the Performance Improvement Come From? -- A Reproducibility Concern about Image-Text Retrieval | Code | 1
Bridging Video-text Retrieval with Multiple Choice Questions | Code | 1
Audio Retrieval with Natural Language Queries: A Benchmark Study | Code | 1
CLIP-Lite: Information Efficient Visual Representation Learning with Language Supervision | Code | 1
Densifying Sparse Representations for Passage Retrieval by Representational Slicing | Code | 1
Video-Text Pre-training with Learned Regions | Code | 1
FILIP: Fine-grained Interactive Language-Image Pre-Training | Code | 1
VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts | Code | 1
Less is More: Pretrain a Strong Siamese Encoder for Dense Text Retrieval Using a Weak Decoder | Code | 1
Dense Hierarchical Retrieval for Open-Domain Question Answering | Code | 1
Improving Video-Text Retrieval by Multi-Stream Corpus Alignment and Dual Softmax Loss | Code | 1
HANet: Hierarchical Alignment Networks for Video-Text Retrieval | Code | 1
Align before Fuse: Vision and Language Representation Learning with Momentum Distillation | Code | 1
Page 4 of 14

No leaderboard results yet.