SOTAVerified

Text Retrieval

Text Retrieval is the task of finding the most relevant text result (such as an answer, paragraph, or passage) for a given query (which could be a question, keywords, or any other relevant text).
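To make the task concrete, here is a minimal sketch of lexical text retrieval using BM25-style scoring (the classic ranking function that several of the papers below build on or benchmark against). The corpus, query, and parameter values are invented for illustration; real systems use tuned parameters, tokenizers, and inverted indexes.

```python
import math
from collections import Counter

# Toy corpus: each document is a short passage (illustrative data only).
corpus = [
    "neural networks learn dense vector representations of text",
    "bm25 is a classic lexical ranking function for text retrieval",
    "contrastive pretraining aligns images and captions in one space",
]

docs = [d.split() for d in corpus]          # naive whitespace tokenization
N = len(docs)
avgdl = sum(len(d) for d in docs) / N       # average document length

# Document frequency: how many documents contain each term.
df = Counter(t for d in docs for t in set(d))

def bm25(query, k1=1.5, b=0.75):
    """Score every document against the query; return (score, doc_id) best-first."""
    scores = []
    for i, d in enumerate(docs):
        tf = Counter(d)
        s = 0.0
        for t in query.split():
            if t not in tf:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            )
        scores.append((s, i))
    return sorted(scores, reverse=True)

best_score, best_idx = bm25("lexical text retrieval")[0]
print(corpus[best_idx])  # the passage about bm25 ranks first
```

Dense retrievers (e.g. the DPR and MTEB-style models listed below) replace this term-matching score with the similarity of learned query and passage embeddings, but the retrieval interface — query in, ranked passages out — is the same.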

Papers

Showing 1–50 of 671 papers

Title | Status | Hype
A Survey of Graph Retrieval-Augmented Generation for Customized Large Language Models | Code | 7
h2oGPT: Democratizing Large Language Models | Code | 6
BM25S: Orders of magnitude faster lexical search via eager sparse scoring | Code | 5
BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval | Code | 5
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation | Code | 5
FG-CLIP: Fine-Grained Visual and Textual Alignment | Code | 4
Multi-label Cluster Discrimination for Visual Representation Learning | Code | 4
MTEB: Massive Text Embedding Benchmark | Code | 4
Parameter-Efficient Prompt Tuning Makes Generalized and Calibrated Neural Text Retrievers | Code | 4
RETSim: Resilient and Efficient Text Similarity | Code | 4
LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment | Code | 4
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers | Code | 4
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities | Code | 3
Temporal Working Memory: Query-Guided Segment Refinement for Enhanced Multimodal Understanding | Code | 3
AToMiC: An Image/Text Retrieval Test Collection to Support Multimedia Content Creation | Code | 3
DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models | Code | 3
Vision-Language Pre-training: Basics, Recent Advances, and Future Trends | Code | 3
M3D: Advancing 3D Medical Image Analysis with Multi-Modal Large Language Models | Code | 3
RS5M and GeoRSCLIP: A Large Scale Vision-Language Dataset and A Large Vision-Language Model for Remote Sensing | Code | 2
ProtT3: Protein-to-Text Generation for Text-based Protein Understanding | Code | 2
RWKV-CLIP: A Robust Vision-Language Representation Learner | Code | 2
One Trajectory, One Token: Grounded Video Tokenization via Panoptic Sub-object Trajectory | Code | 2
Med3DVLM: An Efficient Vision-Language Model for 3D Medical Image Analysis | Code | 2
MedCLIP: Contrastive Learning from Unpaired Medical Images and Text | Code | 2
Multi-modal Molecule Structure-text Model for Text-based Retrieval and Editing | Code | 2
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision | Code | 2
Gramian Multimodal Representation Learning and Alignment | Code | 2
PMC-CLIP: Contrastive Language-Image Pre-training using Biomedical Documents | Code | 2
RemoteCLIP: A Vision Language Foundation Model for Remote Sensing | Code | 2
Accelerating Transformers with Spectrum-Preserving Token Merging | Code | 2
BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature | Code | 2
FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions | Code | 2
BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models | Code | 2
CLIP-ViP: Adapting Pre-trained Image-Text Model to Video-Language Representation Alignment | Code | 2
VeCLIP: Improving CLIP Training via Visual-enriched Captions | Code | 2
DreamLIP: Language-Image Pre-training with Long Captions | Code | 2
Distillation Enhanced Generative Retrieval | Code | 2
Efficient Inverted Indexes for Approximate Retrieval over Learned Sparse Representations | Code | 2
Dense Text Retrieval based on Pretrained Language Models: A Survey | Code | 2
Efficient Remote Sensing with Harmonized Transfer Learning and Modality Alignment | Code | 2
A Replication Study of Dense Passage Retriever | Code | 2
FlagEvalMM: A Flexible Framework for Comprehensive Multimodal Model Evaluation | Code | 2
Cross the Gap: Exposing the Intra-modal Misalignment in CLIP via Modality Inversion | Code | 2
Beyond Text: Optimizing RAG with Multimodal Inputs for Industrial Applications | Code | 2
Frozen Transformers in Language Models Are Effective Visual Encoder Layers | Code | 2
GLAP: General contrastive audio-text pretraining across domains and languages | Code | 2
Cross-lingual and Multilingual CLIP | Code | 2
AudioSetCaps: An Enriched Audio-Caption Dataset using Automated Generation Pipeline with Large Audio and Language Models | Code | 2
Cross-Modal and Uni-Modal Soft-Label Alignment for Image-Text Retrieval | Code | 2
Egocentric Video-Language Pretraining | Code | 2
Page 1 of 14

No leaderboard results yet.