SOTAVerified

Text Retrieval

Text Retrieval is the task of finding the most relevant text result (such as an answer, paragraph, or passage) given a query (which could be a question, keywords, or any relevant text).
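
As a rough illustration of the task definition above, the following sketch ranks a few passages against a query using bag-of-words cosine similarity. It is a toy baseline chosen for brevity, not the method of any paper listed on this page; the function names (`tokenize`, `cosine`, `retrieve`) and the sample passages are hypothetical.

```python
import math
from collections import Counter

PUNCT = ".,;:!?()\"'"


def tokenize(text):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    tokens = (t.strip(PUNCT) for t in text.lower().split())
    return [t for t in tokens if t]


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, passages, top_k=1):
    """Return the top_k (score, passage) pairs, highest score first."""
    q_vec = Counter(tokenize(query))
    scored = [(cosine(q_vec, Counter(tokenize(p))), p) for p in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Hypothetical sample passages, used only for illustration.
    passages = [
        "The Eiffel Tower is in Paris.",
        "Python is a programming language.",
        "Bananas are rich in potassium.",
    ]
    for score, passage in retrieve("Where is the Eiffel Tower?", passages, top_k=2):
        print(f"{score:.3f}  {passage}")
```

Dense and multimodal retrievers, such as several of those listed below, replace the term-count vectors with learned embeddings, but the ranking step (score every candidate against the query, then sort) has the same shape.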

Papers

Showing 101-125 of 671 papers

Title | Status | Hype
CodeXEmbed: A Generalist Embedding Model Family for Multiligual and Multi-task Code Retrieval | - | 0
BoolQuestions: Does Dense Retrieval Understand Boolean Logic in Language? | - | 0
Partial Scene Text Retrieval | Code | 0
MM-Embed: Universal Multimodal Retrieval with Multimodal LLMs | - | 0
SPECTRUM: Semantic Processing and Emotion-informed video-Captioning Through Retrieval and Understanding Modalities | - | 0
Nearest Neighbor Normalization Improves Multimodal Retrieval | Code | 1
Multilingual Vision-Language Pre-training for the Remote Sensing Domain | Code | 0
Robotic State Recognition with Image-to-Text Retrieval Task of Pre-Trained Vision-Language Model and Black-Box Optimization | - | 0
Beyond Text: Optimizing RAG with Multimodal Inputs for Industrial Applications | Code | 2
Do Audio-Language Models Understand Linguistic Variations? | - | 0
GSSF: Generalized Structural Sparse Function for Deep Cross-modal Metric Learning | Code | 0
Improving General Text Embedding Model: Tackling Task Conflict and Data Imbalance through Model Merging | - | 0
Beyond Coarse-Grained Matching in Video-Text Retrieval | - | 0
CtrlSynth: Controllable Image Text Synthesis for Data-Efficient Multimodal Learning | - | 0
LaMP: Language-Motion Pretraining for Motion Generation, Retrieval, and Captioning | - | 0
Text Proxy: Decomposing Retrieval from a 1-to-N Relationship into N 1-to-1 Relationships for Text-Video Retrieval | Code | 1
AnyAttack: Towards Large-scale Self-supervised Adversarial Attacks on Vision-language Models | - | 0
CoLLAP: Contrastive Long-form Language-Audio Pretraining with Musical Temporal Structure Augmentation | - | 0
From Unimodal to Multimodal: Scaling up Projectors to Align Modalities | Code | 0
Robotic Environmental State Recognition with Pre-Trained Vision-Language Models and Black-Box Optimization | - | 0
DiffATR: Diffusion-based Generative Modeling for Audio-Text Retrieval | - | 0
NEVLP: Noise-Robust Framework for Efficient Vision-Language Pre-training | - | 0
ReCLAP: Improving Zero Shot Audio Classification by Describing Sounds | Code | 1
Enhancing Q&A Text Retrieval with Ranking Models: Benchmarking, fine-tuning and deploying Rerankers for RAG | - | 0
Pushing the Limits of Vision-Language Models in Remote Sensing without Human Annotations | - | 0

No leaderboard results yet.