SOTAVerified

Text Retrieval

Text Retrieval is the task of finding the most relevant text result (such as an answer, paragraph, or passage) given a query (which could be a question, keywords, or any relevant text).
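As a rough illustration of the task definition above, the sketch below ranks a few passages against a query using bag-of-words cosine similarity. This is only a toy baseline chosen for brevity, not the method of any paper listed on this page; the function names (`tokenize`, `cosine`, `retrieve`) and the sample passages are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer with lowercasing (an assumption for this sketch).
    return text.lower().split()


def cosine(a, b):
    # Cosine similarity between two term-count vectors stored as Counters.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, passages, top_k=1):
    # Score every passage against the query and return the top_k (passage, score) pairs.
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(p))), i) for i, p in enumerate(passages)]
    # Sort by descending score; ties are broken by original passage order.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [(passages[i], score) for score, i in scored[:top_k]]


# Hypothetical mini-corpus for illustration only.
passages = [
    "The Eiffel Tower is located in Paris",
    "Dense retrievers encode queries and passages as vectors",
    "Contrastive learning aligns audio and text embeddings",
]
best = retrieve("where is the eiffel tower", passages)
```

Here `best` holds the single highest-scoring passage together with its similarity score. Practical systems typically replace the term-count vectors with learned sparse or dense representations, but the query-in, ranked-text-out interface stays the same.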

Papers

Showing 51-100 of 671 papers

Title | Status | Hype
MedCLIP: Contrastive Learning from Unpaired Medical Images and Text | Code | 2
Med3DVLM: An Efficient Vision-Language Model for 3D Medical Image Analysis | Code | 2
Multi-modal Molecule Structure-text Model for Text-based Retrieval and Editing | Code | 2
Gramian Multimodal Representation Learning and Alignment | Code | 2
Cross-Modal and Uni-Modal Soft-Label Alignment for Image-Text Retrieval | Code | 2
AudioSetCaps: An Enriched Audio-Caption Dataset using Automated Generation Pipeline with Large Audio and Language Models | Code | 2
GLAP: General contrastive audio-text pretraining across domains and languages | Code | 2
M2-RAAP: A Multi-Modal Recipe for Advancing Adaptation-based Pre-training towards Effective and Efficient Zero-shot Video-text Retrieval | Code | 2
TableRAG: A Retrieval Augmented Generation Framework for Heterogeneous Document Reasoning | Code | 2
FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions | Code | 2
Audio Retrieval with WavText5K and CLAP Training | Code | 1
Audio Retrieval with Natural Language Queries: A Benchmark Study | Code | 1
Align before Fuse: Vision and Language Representation Learning with Momentum Distillation | Code | 1
A Comparison of Pre-trained Vision-and-Language Models for Multimodal Representation Learning across Medical Images and Reports | Code | 1
From Association to Generation: Text-only Captioning by Unsupervised Cross-modal Mapping | Code | 1
Cocktail: A Comprehensive Information Retrieval Benchmark with LLM-Generated Documents Integration | Code | 1
FlexiViT: One Model for All Patch Sizes | Code | 1
COM Kitchens: An Unedited Overhead-view Video Dataset as a Vision-Language Benchmark | Code | 1
A Survey of Medical Vision-and-Language Applications and Their Techniques | Code | 1
COCO-DR: Combating Distribution Shifts in Zero-Shot Dense Retrieval with Contrastive and Distributionally Robust Learning | Code | 1
Focus, Distinguish, and Prompt: Unleashing CLIP for Efficient and Flexible Scene Text Retrieval | Code | 1
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval | Code | 1
Fine-Grained Image-Text Matching by Cross-Modal Hard Aligning Network | Code | 1
Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning | Code | 1
FILIP: Fine-grained Interactive Language-Image Pre-Training | Code | 1
Fine-Tuning LLaMA for Multi-Stage Text Retrieval | Code | 1
Benchmarking Robustness of Multimodal Image-Text Models under Distribution Shift | Code | 1
mmRAG: A Modular Benchmark for Retrieval-Augmented Generation over Text, Tables, and Knowledge Graphs | Code | 1
Fast and Light-Weight Answer Text Retrieval in Dialogue Systems | Code | 1
Extending Multi-modal Contrastive Representations | Code | 1
AdvCLIP: Downstream-agnostic Adversarial Examples in Multimodal Contrastive Learning | Code | 1
Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone | Code | 1
Eye-gaze Guided Multi-modal Alignment for Medical Representation Learning | Code | 1
FETA: Towards Specializing Foundation Models for Expert Task Applications | Code | 1
Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical Alignment | Code | 1
Building an Open-Vocabulary Video CLIP Model with Better Architectures, Optimization and Data | Code | 1
Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval | Code | 1
A Comprehensive Review of the Video-to-Text Problem | Code | 1
Efficient Medical Vision-Language Alignment Through Adapting Masked Vision Models | Code | 1
Bridging Language Gaps in Audio-Text Retrieval | Code | 1
Efficiently Teaching an Effective Dense Retriever with Balanced Topic Aware Sampling | Code | 1
Efficient Token-Guided Image-Text Retrieval with Consistent Multimodal Contrastive Training | Code | 1
A Prior Instruction Representation Framework for Remote Sensing Image-text Retrieval | Code | 1
A Dense Representation Framework for Lexical and Semantic Matching | Code | 1
Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner | Code | 1
Bridging Video-text Retrieval with Multiple Choice Questions | Code | 1
ArabicaQA: A Comprehensive Dataset for Arabic Question Answering | Code | 1
Exploring Classic and Neural Lexical Translation Models for Information Retrieval: Interpretability, Effectiveness, and Efficiency Benefits | Code | 1
CLASP: Contrastive Language-Speech Pretraining for Multilingual Multimodal Information Retrieval | Code | 1
DialogCC: An Automated Pipeline for Creating High-Quality Multi-Modal Dialogue Dataset | Code | 1

No leaderboard results yet.