SOTAVerified

Text Retrieval

Text Retrieval is the task of finding the most relevant text result (such as an answer, paragraph, or passage) for a given query, which may be a question, a set of keywords, or any other relevant text.
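As a rough illustration of the task described above, the following minimal sketch ranks a handful of passages against a query using cosine similarity over raw term counts. This is only one simple lexical baseline chosen for illustration; it is not the method of any paper listed on this page. The function names (`tokenize`, `cosine`, `retrieve`) and the sample passages are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, passages, top_k=1):
    """Return the top_k (score, passage) pairs, best match first."""
    query_vec = Counter(tokenize(query))
    scored = [(cosine(query_vec, Counter(tokenize(p))), p) for p in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Hypothetical toy corpus, used only to exercise the sketch.
passages = [
    "The Eiffel Tower is located in Paris.",
    "Dense retrievers encode queries and passages into vectors.",
    "Bananas are rich in potassium.",
]
best_score, best_passage = retrieve("Where is the Eiffel Tower?", passages)[0]
print(best_passage)
```

Dense and multimodal retrievers, such as those in the papers listed here, typically replace the term-count vectors with learned embeddings while keeping the same rank-by-similarity structure.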

Papers

Showing 176-200 of 671 papers

Title | Status | Hype
Cross-modal Contrastive Learning for Speech Translation | Code | 1
Bridging Language Gaps in Audio-Text Retrieval | Code | 1
HLATR: Enhance Multi-stage Text Retrieval with Hybrid List Aware Transformer Reranking | Code | 1
Image-text Retrieval via Preserving Main Semantics of Vision | Code | 1
ESA: External Space Attention Aggregation for Image-Text Retrieval | Code | 1
A Comprehensive Review of the Video-to-Text Problem | Code | 1
Improving Video-Text Retrieval by Multi-Stream Corpus Alignment and Dual Softmax Loss | Code | 1
Fast and Light-Weight Answer Text Retrieval in Dialogue Systems | Code | 1
Babel-ImageNet: Massively Multilingual Evaluation of Vision-and-Language Representations | Code | 1
Building an Open-Vocabulary Video CLIP Model with Better Architectures, Optimization and Data | Code | 1
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks | Code | 1
A Prior Instruction Representation Framework for Remote Sensing Image-text Retrieval | Code | 1
Kaleido-BERT: Vision-Language Pre-training on Fashion Domain | Code | 1
Multimodal Learned Sparse Retrieval with Probabilistic Expansion Control | Code | 1
Efficiently Teaching an Effective Dense Retriever with Balanced Topic Aware Sampling | Code | 1
CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers | Code | 1
AdvCLIP: Downstream-agnostic Adversarial Examples in Multimodal Contrastive Learning | Code | 1
Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner | Code | 1
Efficient Token-Guided Image-Text Retrieval with Consistent Multimodal Contrastive Training | Code | 1
Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical Alignment | Code | 1
ALIP: Adaptive Language-Image Pre-training with Synthetic Caption | Code | 1
Learnable Pillar-based Re-ranking for Image-Text Retrieval | Code | 1
Learning a Text-Video Embedding from Incomplete and Heterogeneous Data | Code | 1
CLASP: Contrastive Language-Speech Pretraining for Multilingual Multimodal Information Retrieval | Code | 1
CoSMo: Content-Style Modulation for Image Retrieval With Text Feedback | Code | 1
Page 8 of 27

No leaderboard results yet.