SOTAVerified

Text Retrieval

Text Retrieval is the task of finding the most relevant text result (such as an answer, paragraph, or passage) for a given query (which could be a question, keywords, or any related text).
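The task definition above can be sketched with a minimal example: rank a small corpus of passages by cosine similarity of bag-of-words count vectors against a query. This is an illustrative toy (function names and the tiny corpus are invented here); the papers listed below use far stronger learned dense or lexical representations.

```python
# Toy text retrieval: rank passages by bag-of-words cosine similarity.
# Illustrative only; modern systems learn dense or lexical indexes instead.
import math
from collections import Counter

def bow(text):
    """Lowercased bag-of-words term counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, passages, k=1):
    """Return the top-k passages most similar to the query."""
    q = bow(query)
    ranked = sorted(passages, key=lambda p: cosine(q, bow(p)), reverse=True)
    return ranked[:k]

passages = [
    "Dense retrieval encodes queries and passages with neural networks.",
    "Image captioning generates text descriptions of pictures.",
]
print(retrieve("neural networks for passage retrieval", passages))
```

The same scoring loop generalizes directly: swap `bow` for a TF-IDF weighting or a neural encoder and the ranking step is unchanged.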

Papers

Showing 101–150 of 671 papers

Title | Status | Hype
GLEN: Generative Retrieval via Lexical Index Learning | Code | 1
A Prior Instruction Representation Framework for Remote Sensing Image-text Retrieval | Code | 1
MARVEL: Unlocking the Multi-Modal Capability of Dense Retrieval via Visual Module Plugin | Code | 1
MolCA: Molecular Graph-Language Modeling with Cross-Modal Projector and Uni-Modal Adapter | Code | 1
Extending Multi-modal Contrastive Representations | Code | 1
Fine-Tuning LLaMA for Multi-Stage Text Retrieval | Code | 1
ESA: External Space Attention Aggregation for Image-Text Retrieval | Code | 1
Building an Open-Vocabulary Video CLIP Model with Better Architectures, Optimization and Data | Code | 1
Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval | Code | 1
Unified Coarse-to-Fine Alignment for Video-Text Retrieval | Code | 1
LinkTransformer: A Unified Package for Record Linkage with Transformer Language Models | Code | 1
UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory | Code | 1
Towards Fast and Accurate Image-Text Retrieval with Self-Supervised Fine-Grained Alignment | Code | 1
Parameter-Efficient Transfer Learning for Remote Sensing Image-Text Retrieval | Code | 1
Multi-event Video-Text Retrieval | Code | 1
ALIP: Adaptive Language-Image Pre-training with Synthetic Caption | Code | 1
Helping Hands: An Object-Aware Ego-Centric Video Recognition Model | Code | 1
Vision-Language Dataset Distillation | Code | 1
AdvCLIP: Downstream-agnostic Adversarial Examples in Multimodal Contrastive Learning | Code | 1
Set-level Guidance Attack: Boosting Adversarial Transferability of Vision-Language Pre-training Models | Code | 1
PRIOR: Prototype Representation Joint Learning from Medical Images and Reports | Code | 1
mCLIP: Multilingual CLIP via Cross-lingual Transfer | Code | 1
Learning to Rank in Generative Retrieval | Code | 1
Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to Enhance Visio-Linguistic Compositional Understanding | Code | 1
Efficient Token-Guided Image-Text Retrieval with Consistent Multimodal Contrastive Training | Code | 1
Babel-ImageNet: Massively Multilingual Evaluation of Vision-and-Language Representations | Code | 1
Global and Local Semantic Completion Learning for Vision-Language Pre-training | Code | 1
Multi-modal Pre-training for Medical Vision-language Understanding and Generation: An Empirical Study with A New Benchmark | Code | 1
Revisiting the Role of Language Priors in Vision-Language Models | Code | 1
Test-Time Adaptation with CLIP Reward for Zero-Shot Generalization in Vision-Language Models | Code | 1
FuseCap: Leveraging Large Language Models for Enriched Fused Image Captions | Code | 1
CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers | Code | 1
S-CLIP: Semi-supervised Vision-Language Learning using Few Specialist Captions | Code | 1
Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner | Code | 1
Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers | Code | 1
Cross-Modal Retrieval for Motion and Text via DopTriple Loss | Code | 1
Understanding Differential Search Index for Text Retrieval | Code | 1
From Association to Generation: Text-only Captioning by Unsupervised Cross-modal Mapping | Code | 1
Learnable Pillar-based Re-ranking for Image-Text Retrieval | Code | 1
Rethinking Benchmarks for Cross-modal Image-text Retrieval | Code | 1
Image-text Retrieval via Preserving Main Semantics of Vision | Code | 1
SViTT: Temporal Learning of Sparse Video-Text Transformers | Code | 1
Hyperbolic Image-Text Representations | Code | 1
Equivariant Similarity for Vision-Language Foundation Models | Code | 1
Cross-Modal Retrieval with Partially Mismatched Pairs | Code | 1
Multimodal Federated Learning via Contrastive Representation Ensemble | Code | 1
UniAdapter: Unified Parameter-Efficient Transfer Learning for Cross-modal Modeling | Code | 1
LexLIP: Lexicon-Bottlenecked Language-Image Pre-Training for Large-Scale Image-Text Retrieval | Code | 1
UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers | Code | 1
Revisiting Temporal Modeling for CLIP-based Image-to-Video Knowledge Transferring | Code | 1
Page 3 of 14

No leaderboard results yet.