SOTAVerified

Text Retrieval

Text Retrieval is the task of finding the most relevant text result (such as an answer, paragraph, or passage) for a given query, which may be a question, a set of keywords, or any other relevant text.
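As a rough illustration of the task, the sketch below ranks a handful of passages against a query using bag-of-words cosine similarity. This is a deliberately simple baseline chosen for illustration, not the method of any paper listed here; the function names (`tokenize`, `cosine`, `retrieve`) and the sample passages are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, passages, top_k=1):
    """Return the top_k (score, passage) pairs, highest score first."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(p))), p) for p in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Hypothetical toy corpus, for illustration only.
passages = [
    "The Eiffel Tower is located in Paris.",
    "Dense retrievers encode queries and passages as vectors.",
    "Mount Everest is the highest mountain on Earth.",
]
best = retrieve("Where is the Eiffel Tower?", passages)
print(best[0][1])
```

Dense and learned sparse retrievers replace the raw token counts with learned representations, but the overall shape (score every candidate against the query, then return the best ones) stays the same.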

Papers

Showing 101-150 of 671 papers

Title | Status | Hype
CodeXEmbed: A Generalist Embedding Model Family for Multiligual and Multi-task Code Retrieval | - | 0
BoolQuestions: Does Dense Retrieval Understand Boolean Logic in Language? | - | 0
Partial Scene Text Retrieval | Code | 0
MM-Embed: Universal Multimodal Retrieval with Multimodal LLMs | - | 0
SPECTRUM: Semantic Processing and Emotion-informed video-Captioning Through Retrieval and Understanding Modalities | - | 0
Nearest Neighbor Normalization Improves Multimodal Retrieval | Code | 1
Multilingual Vision-Language Pre-training for the Remote Sensing Domain | Code | 0
Robotic State Recognition with Image-to-Text Retrieval Task of Pre-Trained Vision-Language Model and Black-Box Optimization | - | 0
Beyond Text: Optimizing RAG with Multimodal Inputs for Industrial Applications | Code | 2
Do Audio-Language Models Understand Linguistic Variations? | - | 0
GSSF: Generalized Structural Sparse Function for Deep Cross-modal Metric Learning | Code | 0
Improving General Text Embedding Model: Tackling Task Conflict and Data Imbalance through Model Merging | - | 0
Beyond Coarse-Grained Matching in Video-Text Retrieval | - | 0
CtrlSynth: Controllable Image Text Synthesis for Data-Efficient Multimodal Learning | - | 0
Text Proxy: Decomposing Retrieval from a 1-to-N Relationship into N 1-to-1 Relationships for Text-Video Retrieval | Code | 1
LaMP: Language-Motion Pretraining for Motion Generation, Retrieval, and Captioning | - | 0
AnyAttack: Towards Large-scale Self-supervised Adversarial Attacks on Vision-language Models | - | 0
CoLLAP: Contrastive Long-form Language-Audio Pretraining with Musical Temporal Structure Augmentation | - | 0
From Unimodal to Multimodal: Scaling up Projectors to Align Modalities | Code | 0
Robotic Environmental State Recognition with Pre-Trained Vision-Language Models and Black-Box Optimization | - | 0
DiffATR: Diffusion-based Generative Modeling for Audio-Text Retrieval | - | 0
NEVLP: Noise-Robust Framework for Efficient Vision-Language Pre-training | - | 0
ReCLAP: Improving Zero Shot Audio Classification by Describing Sounds | Code | 1
Enhancing Q&A Text Retrieval with Ranking Models: Benchmarking, fine-tuning and deploying Rerankers for RAG | - | 0
Pushing the Limits of Vision-Language Models in Remote Sensing without Human Annotations | - | 0
Benchmarking and Building Zero-Shot Hindi Retrieval Model with Hindi-BEIR and NLLB-E5 | - | 0
MODOC: A Modular Interface for Flexible Interlinking of Text Retrieval and Text Generation Functions | Code | 0
Mistral-SPLADE: LLMs for better Learned Sparse Retrieval | Code | 0
Improving embedding with contrastive fine-tuning on small datasets with expert-augmented scores | - | 0
NAVERO: Unlocking Fine-Grained Semantics for Video-Language Compositionality | - | 0
Mamba Retriever: Utilizing Mamba for Effective and Efficient Dense Retrieval | - | 0
Pairing Clustered Inverted Indexes with kNN Graphs for Fast Approximate Retrieval over Learned Sparse Representations | - | 0
COM Kitchens: An Unedited Overhead-view Video Dataset as a Vision-Language Benchmark | Code | 1
Toward Automatic Relevance Judgment using Vision--Language Models for Image--Text Retrieval Evaluation | - | 0
Focus, Distinguish, and Prompt: Unleashing CLIP for Efficient and Flexible Scene Text Retrieval | Code | 1
Learning Video Context as Interleaved Multimodal Sequences | Code | 1
GABInsight: Exploring Gender-Activity Binding Bias in Vision-Language Models | Code | 0
mGTE: Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval | - | 0
FiCo-ITR: bridging fine-grained and coarse-grained image-text retrieval for comparative performance analysis | Code | 0
Multi-label Cluster Discrimination for Visual Representation Learning | Code | 4
Assessing Brittleness of Image-Text Retrieval Benchmarks from Vision-Language Models Perspective | - | 0
Multimodal Misinformation Detection using Large Vision-Language Models | - | 0
Object-Aware Query Perturbation for Cross-Modal Image-Text Retrieval | Code | 0
BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval | Code | 5
Video-Language Alignment via Spatio-Temporal Graph Transformer | Code | 1
EA-VTR: Event-Aware Video-Text Retrieval | - | 0
How to Make Cross Encoder a Good Teacher for Efficient Image-Text Retrieval? | - | 0
Towards a text-based quantitative and explainable histopathology image analysis | Code | 0
CosmoCLIP: Generalizing Large Vision-Language Models for Astronomical Imaging | - | 0
CEIA: CLIP-Based Event-Image Alignment for Open-World Event-Based Understanding | - | 0

No leaderboard results yet.