SOTAVerified

Retrieval

Retrieval is the task of selecting relevant data or examples from a large dataset to support prediction, learning, or inference. It improves models by supplying context or additional information, and it is commonly used in retrieval-augmented generation and in-context learning.
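As a rough illustration of the idea above, the following minimal sketch ranks a small corpus against a query and keeps the top-k matches as context. The bag-of-words `embed` function, the `retrieve` helper, and the sample corpus are all assumptions made for this example; real systems typically use learned encoders or BM25-style scoring instead.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (hypothetical stand-in for a learned encoder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, corpus, k=2):
    """Return the k corpus entries most similar to the query."""
    q = embed(query)
    ranked = sorted(corpus, key=lambda doc: cosine(q, embed(doc)), reverse=True)
    return ranked[:k]


# Illustrative corpus (made up for this sketch).
corpus = [
    "dense passage retrieval for open-domain question answering",
    "image captioning with noisy web data",
    "video moment retrieval with active learning",
]

query = "passage retrieval for question answering"
context = retrieve(query, corpus, k=1)

# The retrieved text can then be prepended to a model prompt as extra context.
prompt = "Context: " + context[0] + "\nQuery: " + query
```

The same pattern carries over to in-context learning, where the retrieved items would be labeled examples rather than passages.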

Papers

Showing 1551-1600 of 14297 papers

Title | Status | Hype
Doc2Query--: When Less is More | Code | 1
Why do Nearest Neighbor Language Models Work? | Code | 1
You Truly Understand What I Need: Intellectual and Friendly Dialogue Agents grounding Knowledge and Persona | Code | 1
Learning Semantic Relationship Among Instances for Image-Text Matching | Code | 1
M6Doc: A Large-Scale Multi-Format, Multi-Type, Multi-Layout, Multi-Language, Multi-Annotation Category Dataset for Modern Document Layout Analysis | Code | 1
Progressive Spatio-Temporal Prototype Matching for Text-Video Retrieval | Code | 1
Dual Learning with Dynamic Knowledge Distillation for Partially Relevant Video Retrieval | Code | 1
LexLIP: Lexicon-Bottlenecked Language-Image Pre-Training for Large-Scale Image-Text Sparse Retrieval | Code | 1
Divide&Classify: Fine-Grained Classification for City-Wide Visual Geo-Localization | Code | 1
Unsupervised Feature Representation Learning for Domain-generalized Cross-domain Image Retrieval | Code | 1
Fine-Grained Image-Text Matching by Cross-Modal Hard Aligning Network | Code | 1
R2Former: Unified Retrieval and Reranking Transformer for Place Recognition | Code | 1
Towards Modality-Agnostic Person Re-Identification With Descriptive Query | Code | 1
RONO: Robust Discriminative Learning With Noisy Labels for 2D-3D Cross-Modal Retrieval | Code | 1
Are Binary Annotations Sufficient? Video Moment Retrieval via Hierarchical Uncertainty-Based Active Learning | Code | 1
Modeling Video As Stochastic Processes for Fine-Grained Video Representation Learning | Code | 1
Revisiting Self-Similarity: Structural Embedding for Image Retrieval | Code | 1
Rethinking with Retrieval: Faithful Large Language Model Inference | Code | 1
HPointLoc: Point-based Indoor Place Recognition using Synthetic RGB-D Images | Code | 1
TempCLR: Temporal Alignment Representation with Contrastive Learning | Code | 1
MVTN: Learning Multi-View Transformations for 3D Understanding | Code | 1
Noise-aware Learning from Web-crawled Image-Text Data for Image Captioning | Code | 1
Multi-queue Momentum Contrast for Microvideo-Product Retrieval | Code | 1
Multi-hop Evidence Retrieval for Cross-document Relation Extraction | Code | 1
Parallel Context Windows for Large Language Models | Code | 1
Data Curation Alone Can Stabilize In-context Learning | Code | 1
When Not to Trust Language Models: Investigating Effectiveness of Parametric and Non-Parametric Memories | Code | 1
SESCORE2: Learning Text Generation Evaluation via Synthesizing Realistic Mistakes | Code | 1
Query-as-context Pre-training for Dense Passage Retrieval | Code | 1
Position-guided Text Prompt for Vision-Language Pre-training | Code | 1
Can Retriever-Augmented Language Models Reason? The Blame Game Between the Retriever and the Language Model | Code | 1
Attentive Mask CLIP | Code | 1
Self-Prompting Large Language Models for Zero-Shot Open-Domain QA | Code | 1
Enhancing Multi-modal and Multi-hop Question Answering via Structured Knowledge and Unified Retrieval-Generation | Code | 1
MAViL: Masked Audio-Video Learners | Code | 1
Unsupervised Object Localization: Observing the Background to Discover Objects | Code | 1
FlexiViT: One Model for All Patch Sizes | Code | 1
Benchmarking Robustness of Multimodal Image-Text Models under Distribution Shift | Code | 1
EgoLoc: Revisiting 3D Object Localization from Egocentric Videos with Visual Queries | Code | 1
Reproducible scaling laws for contrastive language-image learning | Code | 1
LidarCLIP or: How I Learned to Talk to Point Clouds | Code | 1
CREPE: Can Vision-Language Foundation Models Reason Compositionally? | Code | 1
In Defense of Cross-Encoders for Zero-Shot Retrieval | Code | 1
VindLU: A Recipe for Effective Video-and-Language Pretraining | Code | 1
Vision and Structured-Language Pretraining for Cross-Modal Food Retrieval | Code | 1
DialogCC: An Automated Pipeline for Creating High-Quality Multi-Modal Dialogue Dataset | Code | 1
FineDance: A Fine-grained Choreography Dataset for 3D Full Body Dance Generation | Code | 1
A Differentiable Semantic Metric Approximation in Probabilistic Embedding for Cross-Modal Retrieval | Code | 1
Neural Machine Translation with Contrastive Translation Memories | Code | 1
Hierarchical Contrast for Unsupervised Skeleton-based Action Representation Learning | Code | 1
Page 32 of 286

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | BM25S | Queries per second | 183.53 | | Unverified
2 | Elasticsearch | Queries per second | 21.8 | | Unverified
3 | BM25-PT | Queries per second | 6.49 | | Unverified
4 | Rank-BM25 | Queries per second | 1.18 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | BM25S | Queries per second | 20.88 | | Unverified
2 | Elasticsearch | Queries per second | 7.11 | | Unverified
3 | Rank-BM25 | Queries per second | 0.04 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | BM25S | Queries per second | 41.85 | | Unverified
2 | Elasticsearch | Queries per second | 12.16 | | Unverified
3 | Rank-BM25 | Queries per second | 0.1 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | FLMR | Recall@5 | 89.32 | | Unverified
2 | RA-VQA | Recall@5 | 82.84 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PreFLMR | Recall@5 | 62.1 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CLIP-KIS | text-to-video Mean Rank | 30 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CLIP4Outfit | Recall@5 | 7.59 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | MetaGen Blended RAG | Accuracy (Top-1) | 82.1 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | MetaGen Blended RAG | Accuracy (Top-1) | 82.1 | | Unverified

#ModelMetricClaimedVerifiedStatus
1COLTCOMP@84.55Unverified
#ModelMetricClaimedVerifiedStatus
1hello0L1,121,222Unverified