SOTAVerified

Text Retrieval

Text Retrieval is the task of finding the most relevant text result (such as an answer, paragraph, or passage) given a query (which could be a question, keywords, or any relevant text).

Papers

Showing 51-100 of 671 papers

Title | Status | Hype
MedCLIP: Contrastive Learning from Unpaired Medical Images and Text | Code | 2
CLIP-ViP: Adapting Pre-trained Image-Text Model to Video-Language Representation Alignment | Code | 2
Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs | Code | 2
Egocentric Video-Language Pretraining | Code | 2
Cross-lingual and Multilingual CLIP | Code | 2
Vision-Language Pre-Training with Triple Contrastive Learning | Code | 2
BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models | Code | 2
A Replication Study of Dense Passage Retriever | Code | 2
WIT: Wikipedia-based Image Text Dataset for Multimodal Multilingual Machine Learning | Code | 2
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision | Code | 2
DiscoVLA: Discrepancy Reduction in Vision, Language, and Alignment for Parameter-Efficient Video-Text Retrieval | Code | 1
Efficient Medical Vision-Language Alignment Through Adapting Masked Vision Models | Code | 1
LoVR: A Benchmark for Long Video Retrieval in Multimodal Contexts | Code | 1
mmRAG: A Modular Benchmark for Retrieval-Augmented Generation over Text, Tables, and Knowledge Graphs | Code | 1
Mind the Gap: Benchmarking Spatial Reasoning in Vision-Language Models | Code | 1
GOAL: Global-local Object Alignment Learning | Code | 1
PeerQA: A Scientific Question Answering Dataset from Peer Reviews | Code | 1
GASLITEing the Retrieval: Exploring Vulnerabilities in Dense Embedding-based Search | Code | 1
I0T: Embedding Standardization Method Towards Zero Modality Gap | Code | 1
CLASP: Contrastive Language-Speech Pretraining for Multilingual Multimodal Information Retrieval | Code | 1
A Survey of Medical Vision-and-Language Applications and Their Techniques | Code | 1
Nearest Neighbor Normalization Improves Multimodal Retrieval | Code | 1
Text Proxy: Decomposing Retrieval from a 1-to-N Relationship into N 1-to-1 Relationships for Text-Video Retrieval | Code | 1
ReCLAP: Improving Zero Shot Audio Classification by Describing Sounds | Code | 1
COM Kitchens: An Unedited Overhead-view Video Dataset as a Vision-Language Benchmark | Code | 1
Focus, Distinguish, and Prompt: Unleashing CLIP for Efficient and Flexible Scene Text Retrieval | Code | 1
Learning Video Context as Interleaved Multimodal Sequences | Code | 1
Video-Language Alignment via Spatio-Temporal Graph Transformer | Code | 1
CVLUE: A New Benchmark Dataset for Chinese Vision-Language Understanding Evaluation | Code | 1
SignCLIP: Connecting Text and Sign Language by Contrastive Learning | Code | 1
Composing Object Relations and Attributes for Image-Text Matching | Code | 1
Bridging Language Gaps in Audio-Text Retrieval | Code | 1
Transcending Fusion: A Multi-Scale Alignment Method for Remote Sensing Image-Text Retrieval | Code | 1
LDMol: Text-to-Molecule Diffusion Model with Structurally Informative Latent Space | Code | 1
Cocktail: A Comprehensive Information Retrieval Benchmark with LLM-Generated Documents Integration | Code | 1
PIR: Remote Sensing Image-Text Retrieval with Prior Instruction Representation Learning | Code | 1
Revisiting Deep Audio-Text Retrieval Through the Lens of Transportation | Code | 1
ArabicaQA: A Comprehensive Dataset for Arabic Question Answering | Code | 1
Eye-gaze Guided Multi-modal Alignment for Medical Representation Learning | Code | 1
Boosting Transferability in Vision-Language Attacks via Diversification along the Intersection Region of Adversarial Trajectory | Code | 1
Multimodal Learned Sparse Retrieval with Probabilistic Expansion Control | Code | 1
LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration | Code | 1
Mitigating the Impact of False Negatives in Dense Retrieval with Contrastive Confidence Regularization | Code | 1
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks | Code | 1
ProS: Prompting-to-simulate Generalized knowledge for Universal Cross-Domain Retrieval | Code | 1
Data-Efficient Multimodal Fusion on a Single GPU | Code | 1
RGNet: A Unified Clip Retrieval and Grounding Network for Long Videos | Code | 1
Predictive Chemistry Augmented with Text Retrieval | Code | 1
Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding | Code | 1
MLLMs-Augmented Visual-Language Representation Learning | Code | 1

No leaderboard results yet.