SOTAVerified

Text Retrieval

Text Retrieval is the task of finding the most relevant text result (such as an answer, paragraph, or passage) for a given query, which can be a question, a set of keywords, or any other relevant text.
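As a rough illustration of the task, the sketch below ranks a few passages against a query using TF-IDF weighting and cosine similarity. This is a simple lexical baseline chosen for illustration, not the method of any paper listed here; the function names (`tokenize`, `rank`) and the sample passages are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and split on whitespace; a deliberately simple tokenizer.
    return text.lower().split()


def rank(query, passages):
    """Return passage indices sorted from most to least similar to the query."""
    docs = [Counter(tokenize(p)) for p in passages]
    n = len(docs)
    # Document frequency: the number of passages each term occurs in.
    df = Counter(term for doc in docs for term in doc)
    # Smoothed inverse document frequency (always positive).
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

    def vectorize(counts):
        # Terms unseen in the passages carry no weight and are dropped.
        return {t: c * idf[t] for t, c in counts.items() if t in idf}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    q = vectorize(Counter(tokenize(query)))
    scores = [cosine(q, vectorize(d)) for d in docs]
    return sorted(range(n), key=lambda i: scores[i], reverse=True)


# Hypothetical sample data for illustration only.
passages = [
    "the eiffel tower is in paris",
    "python is a programming language",
    "paris is the capital of france",
]
best = rank("capital of france", passages)[0]
print(passages[best])
```

Dense and learned sparse retrievers replace the TF-IDF vectors with learned representations, but the ranking step (score every candidate against the query, then sort) keeps the same shape.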

Papers

Showing 201-250 of 671 papers

Title | Status | Hype
UrbanCross: Enhancing Satellite Image-Text Retrieval with Cross-Domain Adaptation | - | 0
MindTuner: Cross-Subject Visual Decoding with Visual Fingerprint and Semantic Correction | - | 0
FecTek: Enhancing Term Weight in Lexicon-Based Retrieval with Feature Context and Term-level Knowledge | - | 0
TEXT2TASTE: A Versatile Egocentric Vision System for Intelligent Reading Assistance Using Large Language Model | - | 0
Learning with Noisy Correspondence | - | 0
HaVTR: Improving Video-Text Retrieval Through Augmentation Using Large Foundation Models | - | 0
Self-Training Large Language Models for Improved Visual Program Synthesis With Visual Reinforcement | - | 0
M3D: Advancing 3D Medical Image Analysis with Multi-Modal Large Language Models | Code | 3
Shallow Cross-Encoders for Low-Latency Retrieval | Code | 0
ArabicaQA: A Comprehensive Dataset for Arabic Question Answering | Code | 1
Denoising Table-Text Retrieval for Open-Domain Question Answering | Code | 0
DreamLIP: Language-Image Pre-training with Long Captions | Code | 2
Improving Retrieval for RAG based Question Answering Models on Financial Documents | - | 0
FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions | Code | 2
vid-TLDR: Training Free Token merging for Light-weight Video Transformer | Code | 2
Eye-gaze Guided Multi-modal Alignment for Medical Representation Learning | Code | 1
Boosting Transferability in Vision-Language Attacks via Diversification along the Intersection Region of Adversarial Trajectory | Code | 1
LuoJiaHOG: A Hierarchy Oriented Geo-aware Image Caption Dataset for Remote Sensing Image-Text Retrival | - | 0
Improving Adversarial Transferability of Vision-Language Pre-training Models through Collaborative Multimodal Interaction | - | 0
Refining Knowledge Transfer on Audio-Image Temporal Agreement for Audio-Text Cross Retrieval | - | 0
Multiscale Matching Driven by Cross-Modal Similarity Consistency for Audio-Text Retrieval | - | 0
Cross-Modal and Uni-Modal Soft-Label Alignment for Image-Text Retrieval | Code | 2
CLIP the Bias: How Useful is Balancing Data in Multimodal Learning? | - | 0
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers | Code | 4
Multimodal Learned Sparse Retrieval with Probabilistic Expansion Control | Code | 1
Unifying Latent and Lexicon Representations for Effective Video-Text Retrieval | - | 0
Multi-Task Contrastive Learning for 8192-Token Bilingual Text Embeddings | - | 0
MORE: Multi-mOdal REtrieval Augmented Generative Commonsense Reasoning | - | 0
PIRB: A Comprehensive Benchmark of Polish Dense and Hybrid Text Retrieval Methods | - | 0
LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration | Code | 1
Distillation Enhanced Generative Retrieval | Code | 2
Multimodal Learned Sparse Retrieval for Image Suggestion | - | 0
Video Editing for Video Retrieval | - | 0
M2-RAAP: A Multi-Modal Recipe for Advancing Adaptation-based Pre-training towards Effective and Efficient Zero-shot Video-text Retrieval | Code | 2
Embracing Language Inclusivity and Diversity in CLIP through Continual Language Learning | Code | 0
Towards 3D Molecule-Text Interpretation in Language Models | Code | 2
Enhancing Image-Text Matching with Adaptive Feature Aggregation | Code | 0
SyCoCa: Symmetrizing Contrastive Captioners with Attentive Masking for Multimodal Alignment | - | 0
BEV-TSR: Text-Scene Retrieval in BEV Space for Autonomous Driving | - | 0
Accept the Modality Gap: An Exploration in the Hyperbolic Space | - | 0
OTE: Exploring Accurate Scene Text Recognition Using One Token | Code | 0
Building Vision-Language Models on Solid Foundations with Masked Distillation | - | 0
Mitigating the Impact of False Negatives in Dense Retrieval with Contrastive Confidence Regularization | Code | 1
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks | Code | 1
ProS: Prompting-to-simulate Generalized knowledge for Universal Cross-Domain Retrieval | Code | 1
Data-Efficient Multimodal Fusion on a Single GPU | Code | 1
Filter & Align: Leveraging Human Knowledge to Curate Image-Text Data | - | 0
RGNet: A Unified Clip Retrieval and Grounding Network for Long Videos | Code | 1
Leveraging Generative Language Models for Weakly Supervised Sentence Component Analysis in Video-Language Joint Learning | - | 0
Predictive Chemistry Augmented with Text Retrieval | Code | 1
Page 5 of 14

No leaderboard results yet.