SOTAVerified

Text Retrieval

Text Retrieval is the task of finding the most relevant text result (such as an answer, paragraph, or passage) given a query (which could be a question, keywords, or any relevant text).
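As a rough illustration of the task described above, the sketch below ranks a few candidate passages against a query using cosine similarity over simple word counts. It is a minimal example under stated assumptions, not the method of any paper listed here: the helper names (`tokenize`, `cosine`, `retrieve`) and the sample passages are hypothetical, and real systems typically use learned dense or sparse representations rather than raw counts.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical helper: lowercase and split on whitespace.
    # A real system would use a proper tokenizer.
    return text.lower().split()


def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors (Counters).
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, passages, top_k=1):
    # Score every passage against the query and return the top_k
    # (score, passage) pairs, highest score first.
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(p))), p) for p in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Illustrative data only (assumed, not from the listing).
passages = [
    "the capital of france is paris",
    "bananas are rich in potassium",
    "python is a popular programming language",
]
best = retrieve("what is the capital of france", passages)
```

Here `best` holds the single highest-scoring passage with its similarity score. Raising `top_k` returns a ranked list instead, which is the form most retrieval benchmarks evaluate.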

Papers

Showing 301–350 of 671 papers

Title | Status | Hype
Multi-event Video-Text Retrieval | Code | 1
ALIP: Adaptive Language-Image Pre-training with Synthetic Caption | Code | 1
Helping Hands: An Object-Aware Ego-Centric Video Recognition Model | Code | 1
Vision-Language Dataset Distillation | Code | 1
AdvCLIP: Downstream-agnostic Adversarial Examples in Multimodal Contrastive Learning | Code | 1
Free-ATM: Exploring Unsupervised Learning on Diffusion-Generated Images with Free Attention Masks | - | 0
Embedding-based Retrieval with LLM for Effective Agriculture Information Extracting from Unstructured Data | - | 0
The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World | Code | 2
Defense of Adversarial Ranking Attack in Text Retrieval: Benchmark and Baseline via Detection | - | 0
Set-level Guidance Attack: Boosting Adversarial Transferability of Vision-Language Pre-training Models | Code | 1
PRIOR: Prototype Representation Joint Learning from Medical Images and Reports | Code | 1
Towards a Visual-Language Foundation Model for Computational Pathology | - | 0
Extracting Molecular Properties from Natural Language with Multimodal Contrastive Learning | - | 0
Distilling Knowledge from Text-to-Image Generative Models Improves Visio-Linguistic Reasoning in CLIP | - | 0
mCLIP: Multilingual CLIP via Cross-lingual Transfer | Code | 1
Stop Pre-Training: Adapt Visual-Language Models to Unseen Languages | Code | 0
Learning to Rank in Generative Retrieval | Code | 1
Switch-BERT: Learning to Model Multimodal Interactions by Switching Attention and Input | - | 0
TaCA: Upgrading Your Visual Foundation Model with Task-agnostic Compatible Adapter | Code | 0
RS5M and GeoRSCLIP: A Large Scale Vision-Language Dataset and A Large Vision-Language Model for Remote Sensing | Code | 2
MSVD-Indonesian: A Benchmark for Multimodal Video-Text Tasks in Indonesian | Code | 0
Align, Adapt and Inject: Sound-guided Unified Image Generation | - | 0
RemoteCLIP: A Vision Language Foundation Model for Remote Sensing | Code | 2
Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to Enhance Visio-Linguistic Compositional Understanding | Code | 1
Efficient Token-Guided Image-Text Retrieval with Consistent Multimodal Contrastive Training | Code | 1
Babel-ImageNet: Massively Multilingual Evaluation of Vision-and-Language Representations | Code | 1
h2oGPT: Democratizing Large Language Models | Code | 6
Global and Local Semantic Completion Learning for Vision-Language Pre-training | Code | 1
Multi-modal Pre-training for Medical Vision-language Understanding and Generation: An Empirical Study with A New Benchmark | Code | 1
Revisiting the Role of Language Priors in Vision-Language Models | Code | 1
Test-Time Adaptation with CLIP Reward for Zero-Shot Generalization in Vision-Language Models | Code | 1
FuseCap: Leveraging Large Language Models for Enriched Fused Image Captions | Code | 1
CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers | Code | 1
Integrating Listwise Ranking into Pairwise-based Image-Text Retrieval | Code | 0
Enhancing the Ranking Context of Dense Retrieval Methods through Reciprocal Nearest Neighbors | Code | 0
PaCE: Unified Multi-modal Dialogue Pre-training with Progressive and Compositional Experts | Code | 0
S-CLIP: Semi-supervised Vision-Language Learning using Few Specialist Captions | Code | 1
When the Music Stops: Tip-of-the-Tongue Retrieval for Music | Code | 0
i-Code Studio: A Configurable and Composable Framework for Integrative AI | - | 0
VLAB: Enhancing Video Language Pre-training by Feature Adapting and Blending | - | 0
Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner | Code | 1
TOME: A Two-stage Approach for Model-based Retrieval | - | 0
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities | Code | 3
Mask to reconstruct: Cooperative Semantics Completion for Video-text Retrieval | - | 0
Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers | Code | 1
Alternating Gradient Descent and Mixture-of-Experts for Integrated Multimodal Perception | - | 0
Cross-Modal Retrieval for Motion and Text via DopTriple Loss | Code | 1
Understanding Differential Search Index for Text Retrieval | Code | 1
From Association to Generation: Text-only Captioning by Unsupervised Cross-modal Mapping | Code | 1
Hypernymization of named entity-rich captions for grounding-based multi-modal pretraining | - | 0

No leaderboard results yet.