SOTAVerified

Text Retrieval

Text Retrieval is the task of finding the most relevant text result (such as an answer, paragraph, or passage) given a query (which could be a question, keywords, or any relevant text).
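As a rough illustration of the task definition above, the sketch below ranks a few passages against a query using TF-IDF weighting and cosine similarity, a classic lexical baseline. It is not the method of any paper listed on this page. The function names (`tokenize`, `build_idf`, `tfidf_vector`, `cosine`, `retrieve`) and the toy passages are hypothetical, chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_idf(docs_tokens: list[list[str]]) -> dict[str, float]:
    """Smoothed inverse document frequency for every term in the corpus."""
    n = len(docs_tokens)
    df = Counter(term for toks in docs_tokens for term in set(toks))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf_vector(tokens: list[str], idf: dict[str, float]) -> dict[str, float]:
    """Sparse TF-IDF vector; terms that never occur in the corpus are dropped."""
    tf = Counter(tokens)
    return {t: c * idf[t] for t, c in tf.items() if t in idf}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query: str, passages: list[str], k: int = 3) -> list[tuple[int, float]]:
    """Return (index, score) pairs for the k passages most similar to the query."""
    docs_tokens = [tokenize(p) for p in passages]
    idf = build_idf(docs_tokens)
    doc_vecs = [tfidf_vector(toks, idf) for toks in docs_tokens]
    q_vec = tfidf_vector(tokenize(query), idf)
    scored = [(i, cosine(q_vec, d)) for i, d in enumerate(doc_vecs)]
    # Highest score first; Python's sort is stable, so ties keep corpus order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    passages = [
        "Dense retrievers encode queries and passages into vectors.",
        "The Eiffel Tower is located in Paris, France.",
        "BM25 is a classic lexical ranking function for search.",
    ]
    for idx, score in retrieve("where is the eiffel tower", passages, k=2):
        print(f"{score:.3f}  {passages[idx]}")
```

Many of the papers below replace the sparse TF-IDF vectors with learned dense or sparse encoders (and often add a re-ranking stage), but the overall shape (score every candidate against the query, then return the top k) stays the same.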

Papers

Showing 151-200 of 671 papers

| Title | Status | Hype |
|---|---|---|
| Data-Efficient Multimodal Fusion on a Single GPU | Code | 1 |
| Fine-Grained Image-Text Matching by Cross-Modal Hard Aligning Network | Code | 1 |
| Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning | Code | 1 |
| Text Proxy: Decomposing Retrieval from a 1-to-N Relationship into N 1-to-1 Relationships for Text-Video Retrieval | Code | 1 |
| CVLUE: A New Benchmark Dataset for Chinese Vision-Language Understanding Evaluation | Code | 1 |
| FILIP: Fine-grained Interactive Language-Image Pre-Training | Code | 1 |
| Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval | Code | 1 |
| Fine-Tuning LLaMA for Multi-Stage Text Retrieval | Code | 1 |
| Cross-View Language Modeling: Towards Unified Cross-Lingual Cross-Modal Pre-training | Code | 1 |
| Cross-modal Scene Graph Matching for Relationship-aware Image-Text Retrieval | Code | 1 |
| FuseCap: Leveraging Large Language Models for Enriched Fused Image Captions | Code | 1 |
| Multi-modal Pre-training for Medical Vision-language Understanding and Generation: An Empirical Study with A New Benchmark | Code | 1 |
| Cross-Modal Retrieval with Partially Mismatched Pairs | Code | 1 |
| Cross-Modal Retrieval for Motion and Text via DopTriple Loss | Code | 1 |
| DialogCC: An Automated Pipeline for Creating High-Quality Multi-Modal Dialogue Dataset | Code | 1 |
| Boosting Transferability in Vision-Language Attacks via Diversification along the Intersection Region of Adversarial Trajectory | Code | 1 |
| Extending Multi-modal Contrastive Representations | Code | 1 |
| Exploring Classic and Neural Lexical Translation Models for Information Retrieval: Interpretability, Effectiveness, and Efficiency Benefits | Code | 1 |
| GLoRIA: A Multimodal Global-Local Representation Learning Framework for Label-Efficient Medical Image Recognition | Code | 1 |
| DiscoVLA: Discrepancy Reduction in Vision, Language, and Alignment for Parameter-Efficient Video-Text Retrieval | Code | 1 |
| Graph Optimal Transport for Cross-Domain Alignment | Code | 1 |
| Eye-gaze Guided Multi-modal Alignment for Medical Representation Learning | Code | 1 |
| Bridging Video-text Retrieval with Multiple Choice Questions | Code | 1 |
| Vision-Language Dataset Distillation | Code | 1 |
| Multimodal Federated Learning via Contrastive Representation Ensemble | Code | 1 |
| Cross-modal Contrastive Learning for Speech Translation | Code | 1 |
| Bridging Language Gaps in Audio-Text Retrieval | Code | 1 |
| HLATR: Enhance Multi-stage Text Retrieval with Hybrid List Aware Transformer Reranking | Code | 1 |
| Image-text Retrieval via Preserving Main Semantics of Vision | Code | 1 |
| ESA: External Space Attention Aggregation for Image-Text Retrieval | Code | 1 |
| A Comprehensive Review of the Video-to-Text Problem | Code | 1 |
| Improving Video-Text Retrieval by Multi-Stream Corpus Alignment and Dual Softmax Loss | Code | 1 |
| Fast and Light-Weight Answer Text Retrieval in Dialogue Systems | Code | 1 |
| Babel-ImageNet: Massively Multilingual Evaluation of Vision-and-Language Representations | Code | 1 |
| Building an Open-Vocabulary Video CLIP Model with Better Architectures, Optimization and Data | Code | 1 |
| InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks | Code | 1 |
| A Prior Instruction Representation Framework for Remote Sensing Image-text Retrieval | Code | 1 |
| Kaleido-BERT: Vision-Language Pre-training on Fashion Domain | Code | 1 |
| Multimodal Learned Sparse Retrieval with Probabilistic Expansion Control | Code | 1 |
| Efficiently Teaching an Effective Dense Retriever with Balanced Topic Aware Sampling | Code | 1 |
| CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers | Code | 1 |
| AdvCLIP: Downstream-agnostic Adversarial Examples in Multimodal Contrastive Learning | Code | 1 |
| Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner | Code | 1 |
| Efficient Token-Guided Image-Text Retrieval with Consistent Multimodal Contrastive Training | Code | 1 |
| Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical Alignment | Code | 1 |
| ALIP: Adaptive Language-Image Pre-training with Synthetic Caption | Code | 1 |
| Learnable Pillar-based Re-ranking for Image-Text Retrieval | Code | 1 |
| Learning a Text-Video Embedding from Incomplete and Heterogeneous Data | Code | 1 |
| CLASP: Contrastive Language-Speech Pretraining for Multilingual Multimodal Information Retrieval | Code | 1 |
| CoSMo: Content-Style Modulation for Image Retrieval With Text Feedback | Code | 1 |

No leaderboard results yet.