SOTAVerified

Text Retrieval

Text Retrieval is the task of finding the most relevant text result (such as an answer, paragraph, or passage) given a query (which could be a question, keywords, or any relevant text).
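
To make the definition concrete, here is a minimal sketch of ranking passages against a query. It assumes a plain bag-of-words representation scored with cosine similarity, which is only a toy baseline and not the method of any paper listed on this page. The function names (`tokenize`, `cosine`, `retrieve`) and the three-passage corpus are hypothetical and chosen purely for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector matches nothing
    return dot / (norm_a * norm_b)


def retrieve(query, passages, top_k=1):
    """Return the top_k (score, passage) pairs, best match first."""
    query_vec = Counter(tokenize(query))
    scored = [(cosine(query_vec, Counter(tokenize(p))), p) for p in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Hypothetical toy corpus, for illustration only.
passages = [
    "The Eiffel Tower is located in Paris.",
    "Python is a popular programming language.",
    "Dense retrievers embed queries and passages into a shared vector space.",
]

best_score, best_passage = retrieve("Where is the Eiffel Tower?", passages)[0]
print(best_passage)
```

Swapping the term-count vectors for learned embeddings keeps the same overall shape: encode the query, encode the candidates, score each pair, and return the highest-scoring text.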

Papers

Showing 51-75 of 671 papers

Title | Status | Hype
Frozen Transformers in Language Models Are Effective Visual Encoder Layers | Code | 2
Where am I? Cross-View Geo-localization with Natural Language Descriptions | Code | 2
Cross-lingual and Multilingual CLIP | Code | 2
Efficient Remote Sensing with Harmonized Transfer Learning and Modality Alignment | Code | 2
CLIP-ViP: Adapting Pre-trained Image-Text Model to Video-Language Representation Alignment | Code | 2
Egocentric Video-Language Pretraining | Code | 2
GLAP: General contrastive audio-text pretraining across domains and languages | Code | 2
Multi-modal Molecule Structure-text Model for Text-based Retrieval and Editing | Code | 2
The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World | Code | 2
Efficient Inverted Indexes for Approximate Retrieval over Learned Sparse Representations | Code | 2
Audio Retrieval with WavText5K and CLAP Training | Code | 1
Audio Retrieval with Natural Language Queries: A Benchmark Study | Code | 1
Align before Fuse: Vision and Language Representation Learning with Momentum Distillation | Code | 1
Cross-modal Scene Graph Matching for Relationship-aware Image-Text Retrieval | Code | 1
A Comparison of Pre-trained Vision-and-Language Models for Multimodal Representation Learning across Medical Images and Reports | Code | 1
Cross-View Language Modeling: Towards Unified Cross-Lingual Cross-Modal Pre-training | Code | 1
COM Kitchens: An Unedited Overhead-view Video Dataset as a Vision-Language Benchmark | Code | 1
A Survey of Medical Vision-and-Language Applications and Their Techniques | Code | 1
Cross-Modal Retrieval with Partially Mismatched Pairs | Code | 1
Dynamic Modality Interaction Modeling for Image-Text Retrieval | Code | 1
Efficiently Teaching an Effective Dense Retriever with Balanced Topic Aware Sampling | Code | 1
CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers | Code | 1
Cross-modal Contrastive Learning for Speech Translation | Code | 1
CVLUE: A New Benchmark Dataset for Chinese Vision-Language Understanding Evaluation | Code | 1
Benchmarking Robustness of Multimodal Image-Text Models under Distribution Shift | Code | 1

No leaderboard results yet.