SOTAVerified

Text Retrieval

Text Retrieval is the task of finding the most relevant text result (such as an answer, paragraph, or passage) for a given query, which could be a question, keywords, or any other relevant text.
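To make the definition concrete, here is a minimal sketch of query-to-passage retrieval. It assumes a simple bag-of-words cosine similarity as the relevance score, which is only a stand-in: many of the papers listed on this page use learned embeddings instead. The function names (`tokenize`, `cosine`, `retrieve`) and the sample passages are illustrative and do not come from any listed paper.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, passages, top_k=1):
    """Return the top_k passages ranked by similarity to the query."""
    query_vec = Counter(tokenize(query))
    scored = [(cosine(query_vec, Counter(tokenize(p))), p) for p in passages]
    # Highest score first; ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored[:top_k]]


# Hypothetical passages, used only to exercise the sketch.
passages = [
    "CLIP aligns images and captions with a contrastive objective.",
    "Dense retrievers encode queries and passages into a shared vector space.",
    "Audio captions describe the sounds in a recording.",
]
best = retrieve("how do dense retrievers encode a query", passages)
```

Here `best` holds the single highest-scoring passage. Swapping `cosine` over term counts for a similarity over learned embeddings would turn the same ranking loop into a dense retriever.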

Papers

Showing 51-75 of 671 papers

Title | Status | Hype
VeCLIP: Improving CLIP Training via Visual-enriched Captions | Code | 2
Cross-lingual and Multilingual CLIP | Code | 2
FlagEvalMM: A Flexible Framework for Comprehensive Multimodal Model Evaluation | Code | 2
Frozen Transformers in Language Models Are Effective Visual Encoder Layers | Code | 2
Efficient Remote Sensing with Harmonized Transfer Learning and Modality Alignment | Code | 2
AudioSetCaps: An Enriched Audio-Caption Dataset using Automated Generation Pipeline with Large Audio and Language Models | Code | 2
Dense Text Retrieval based on Pretrained Language Models: A Survey | Code | 2
Egocentric Video-Language Pretraining | Code | 2
GLAP: General contrastive audio-text pretraining across domains and languages | Code | 2
The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World | Code | 2
Audio Retrieval with WavText5K and CLAP Training | Code | 1
Audio Retrieval with Natural Language Queries: A Benchmark Study | Code | 1
Align before Fuse: Vision and Language Representation Learning with Momentum Distillation | Code | 1
Cross-View Language Modeling: Towards Unified Cross-Lingual Cross-Modal Pre-training | Code | 1
A Comparison of Pre-trained Vision-and-Language Models for Multimodal Representation Learning across Medical Images and Reports | Code | 1
CVLUE: A New Benchmark Dataset for Chinese Vision-Language Understanding Evaluation | Code | 1
COM Kitchens: An Unedited Overhead-view Video Dataset as a Vision-Language Benchmark | Code | 1
A Survey of Medical Vision-and-Language Applications and Their Techniques | Code | 1
Dynamic Modality Interaction Modeling for Image-Text Retrieval | Code | 1
Efficiently Teaching an Effective Dense Retriever with Balanced Topic Aware Sampling | Code | 1
Cross-modal Contrastive Learning for Speech Translation | Code | 1
Cross-Modal Retrieval for Motion and Text via DropTriple Loss | Code | 1
Cross-Modal Retrieval with Partially Mismatched Pairs | Code | 1
Benchmarking Robustness of Multimodal Image-Text Models under Distribution Shift | Code | 1
mmRAG: A Modular Benchmark for Retrieval-Augmented Generation over Text, Tables, and Knowledge Graphs | Code | 1
Page 3 of 27

No leaderboard results yet.