SOTAVerified

Text Retrieval

Text Retrieval is the task of finding the most relevant text result (such as an answer, paragraph, or passage) for a given query (which could be a question, keywords, or any other relevant text).
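The definition above can be made concrete with a classic lexical baseline, Okapi BM25, which scores each passage by how often it contains the query terms, weighted by term rarity and normalized by passage length. This is a minimal sketch for illustration only, not the method of any paper listed below; the class name `BM25`, the regex tokenizer, the toy passages, and the defaults `k1=1.5` and `b=0.75` are assumptions chosen for the example. Dense retrievers such as Dense Passage Retriever replace the term-matching score with a similarity between learned query and passage embeddings, but the score-and-rank loop is the same.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of letters/digits; a deliberately simple tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25:
    """Minimal Okapi BM25 ranker over an in-memory list of passages (illustrative)."""

    def __init__(self, docs: list[str], k1: float = 1.5, b: float = 0.75):
        self.docs = docs
        self.k1 = k1  # term-frequency saturation
        self.b = b    # strength of document-length normalization
        self.tfs = [Counter(tokenize(d)) for d in docs]
        self.lengths = [sum(tf.values()) for tf in self.tfs]
        self.avgdl = sum(self.lengths) / len(docs) if docs else 0.0
        # Document frequency: number of passages that contain each term.
        df = Counter()
        for tf in self.tfs:
            df.update(tf.keys())
        n = len(docs)
        # Non-negative IDF variant (the "+1" inside the log keeps scores >= 0).
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query: str, i: int) -> float:
        tf, dl = self.tfs[i], self.lengths[i]
        s = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue  # terms absent from the passage contribute nothing
            f = tf[term]
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            s += self.idf[term] * f * (self.k1 + 1) / norm
        return s

    def search(self, query: str, top_k: int = 3) -> list[tuple[int, float]]:
        # Score every passage, then return (index, score) pairs, best first.
        scored = [(i, self.score(query, i)) for i in range(len(self.docs))]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


if __name__ == "__main__":
    passages = [
        "Dense passage retrieval encodes questions and passages as vectors.",
        "BM25 is a lexical ranking function based on term frequency.",
        "CLIP aligns images and text with a contrastive objective.",
    ]
    ranker = BM25(passages)
    # The top hit is passage index 1, the only one sharing terms with the query.
    print(ranker.search("bm25 lexical ranking", top_k=1))
```

A brute-force scan like this is fine for a toy corpus; real systems precompute an inverted index so that only passages containing at least one query term are scored.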

Papers

Showing 26–50 of 671 papers

Title | Status | Hype
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision | Code | 2
Gramian Multimodal Representation Learning and Alignment | Code | 2
PMC-CLIP: Contrastive Language-Image Pre-training using Biomedical Documents | Code | 2
RemoteCLIP: A Vision Language Foundation Model for Remote Sensing | Code | 2
Accelerating Transformers with Spectrum-Preserving Token Merging | Code | 2
BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature | Code | 2
FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions | Code | 2
BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models | Code | 2
CLIP-ViP: Adapting Pre-trained Image-Text Model to Video-Language Representation Alignment | Code | 2
VeCLIP: Improving CLIP Training via Visual-enriched Captions | Code | 2
DreamLIP: Language-Image Pre-training with Long Captions | Code | 2
Distillation Enhanced Generative Retrieval | Code | 2
Efficient Inverted Indexes for Approximate Retrieval over Learned Sparse Representations | Code | 2
Dense Text Retrieval based on Pretrained Language Models: A Survey | Code | 2
Efficient Remote Sensing with Harmonized Transfer Learning and Modality Alignment | Code | 2
A Replication Study of Dense Passage Retriever | Code | 2
FlagEvalMM: A Flexible Framework for Comprehensive Multimodal Model Evaluation | Code | 2
Cross the Gap: Exposing the Intra-modal Misalignment in CLIP via Modality Inversion | Code | 2
Beyond Text: Optimizing RAG with Multimodal Inputs for Industrial Applications | Code | 2
Frozen Transformers in Language Models Are Effective Visual Encoder Layers | Code | 2
GLAP: General contrastive audio-text pretraining across domains and languages | Code | 2
Cross-lingual and Multilingual CLIP | Code | 2
AudioSetCaps: An Enriched Audio-Caption Dataset using Automated Generation Pipeline with Large Audio and Language Models | Code | 2
Cross-Modal and Uni-Modal Soft-Label Alignment for Image-Text Retrieval | Code | 2
Egocentric Video-Language Pretraining | Code | 2
Page 2 of 27

No leaderboard results yet.