SOTAVerified

Text Retrieval

Text Retrieval is the task of finding the most relevant text result (such as an answer, paragraph, or passage) for a given query (which could be a question, keywords, or any other relevant text).
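As a concrete illustration of the lexical side of this task (the approach papers such as BM25S accelerate), here is a minimal sketch of Okapi BM25 scoring in pure Python. The documents, query, and parameter defaults (k1=1.5, b=0.75) are illustrative assumptions, not taken from any listed paper.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n
    # document frequency: how many documents contain each term
    df = Counter()
    for d in tokenized:
        for term in set(d):
            df[term] += 1
    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Lucene-style IDF, always non-negative
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [
    "the cat sat on the mat",
    "dogs chase cats in the park",
    "information retrieval ranks documents by relevance to a query",
]
scores = bm25_scores("retrieval of relevant documents", docs)
best = max(range(len(docs)), key=lambda i: scores[i])  # index of top-ranked doc
```

Only the third document shares query terms ("retrieval", "documents"), so it ranks first; real systems add tokenization, stemming, and inverted indexes on top of this scoring core.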

Papers

Showing 1–50 of 671 papers

Title | Status | Hype
A Survey of Graph Retrieval-Augmented Generation for Customized Large Language Models | Code | 7
h2oGPT: Democratizing Large Language Models | Code | 6
BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval | Code | 5
BM25S: Orders of magnitude faster lexical search via eager sparse scoring | Code | 5
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation | Code | 5
FG-CLIP: Fine-Grained Visual and Textual Alignment | Code | 4
Multi-label Cluster Discrimination for Visual Representation Learning | Code | 4
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers | Code | 4
RETSim: Resilient and Efficient Text Similarity | Code | 4
LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment | Code | 4
MTEB: Massive Text Embedding Benchmark | Code | 4
Parameter-Efficient Prompt Tuning Makes Generalized and Calibrated Neural Text Retrievers | Code | 4
Temporal Working Memory: Query-Guided Segment Refinement for Enhanced Multimodal Understanding | Code | 3
M3D: Advancing 3D Medical Image Analysis with Multi-Modal Large Language Models | Code | 3
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities | Code | 3
AToMiC: An Image/Text Retrieval Test Collection to Support Multimedia Content Creation | Code | 3
Vision-Language Pre-training: Basics, Recent Advances, and Future Trends | Code | 3
DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models | Code | 3
TableRAG: A Retrieval Augmented Generation Framework for Heterogeneous Document Reasoning | Code | 2
GLAP: General contrastive audio-text pretraining across domains and languages | Code | 2
FlagEvalMM: A Flexible Framework for Comprehensive Multimodal Model Evaluation | Code | 2
One Trajectory, One Token: Grounded Video Tokenization via Panoptic Sub-object Trajectory | Code | 2
Med3DVLM: An Efficient Vision-Language Model for 3D Medical Image Analysis | Code | 2
Cross the Gap: Exposing the Intra-modal Misalignment in CLIP via Modality Inversion | Code | 2
BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature | Code | 2
Where am I? Cross-View Geo-localization with Natural Language Descriptions | Code | 2
Gramian Multimodal Representation Learning and Alignment | Code | 2
AudioSetCaps: An Enriched Audio-Caption Dataset using Automated Generation Pipeline with Large Audio and Language Models | Code | 2
Beyond Text: Optimizing RAG with Multimodal Inputs for Industrial Applications | Code | 2
Towards Vision-Language Geo-Foundation Model: A Survey | Code | 2
RWKV-CLIP: A Robust Vision-Language Representation Learner | Code | 2
Accelerating Transformers with Spectrum-Preserving Token Merging | Code | 2
ProtT3: Protein-to-Text Generation for Text-based Protein Understanding | Code | 2
Efficient Inverted Indexes for Approximate Retrieval over Learned Sparse Representations | Code | 2
Efficient Remote Sensing with Harmonized Transfer Learning and Modality Alignment | Code | 2
DreamLIP: Language-Image Pre-training with Long Captions | Code | 2
FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions | Code | 2
vid-TLDR: Training Free Token merging for Light-weight Video Transformer | Code | 2
Cross-Modal and Uni-Modal Soft-Label Alignment for Image-Text Retrieval | Code | 2
Distillation Enhanced Generative Retrieval | Code | 2
M2-RAAP: A Multi-Modal Recipe for Advancing Adaptation-based Pre-training towards Effective and Efficient Zero-shot Video-text Retrieval | Code | 2
Towards 3D Molecule-Text Interpretation in Language Models | Code | 2
Frozen Transformers in Language Models Are Effective Visual Encoder Layers | Code | 2
VeCLIP: Improving CLIP Training via Visual-enriched Captions | Code | 2
The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World | Code | 2
RS5M and GeoRSCLIP: A Large Scale Vision-Language Dataset and A Large Vision-Language Model for Remote Sensing | Code | 2
RemoteCLIP: A Vision Language Foundation Model for Remote Sensing | Code | 2
PMC-CLIP: Contrastive Language-Image Pre-training using Biomedical Documents | Code | 2
Multi-modal Molecule Structure-text Model for Text-based Retrieval and Editing | Code | 2
Dense Text Retrieval based on Pretrained Language Models: A Survey | Code | 2
Page 1 of 14

No leaderboard results yet.