SOTAVerified

Image Retrieval

Image retrieval is a fundamental, long-standing computer vision task: given a query, find the most similar images in a large database. It is often treated as a form of fine-grained, instance-level classification and, alongside classification and cross-modal retrieval, is an integral part of image recognition. By ranking images by visual similarity and other criteria, it lets users find relevant images efficiently, which makes it useful in applications such as search and recommendation.
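
As a rough illustration of "finding images similar to a given query", the sketch below ranks a tiny database by cosine similarity between embedding vectors. It assumes each image has already been mapped to a fixed-length embedding by some model; the file names, the vectors, and the `retrieve` helper are hypothetical and not taken from any paper listed here.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)

def retrieve(query, database, k=3):
    """Return the ids of the k database entries most similar to the query."""
    scored = [(cosine_similarity(query, vec), image_id)
              for image_id, vec in database.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [image_id for _, image_id in scored[:k]]

# Hypothetical toy embeddings; real systems use hundreds of dimensions.
database = {
    "cat_1.jpg": [0.9, 0.1, 0.0],
    "cat_2.jpg": [0.8, 0.2, 0.1],
    "car_1.jpg": [0.0, 0.1, 0.9],
}
query = [1.0, 0.0, 0.0]
top = retrieve(query, database, k=2)
print(top)
```

A linear scan like this is only practical for small collections; large databases typically rely on an approximate nearest-neighbour index instead.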

Extending CLIP for Category-to-image Retrieval in E-commerce


Papers

Showing 201-250 of 2239 papers

| Title | Status | Hype |
|---|---|---|
| Garment Attribute Manipulation with Multi-level Attention | | 0 |
| Evaluating Pre-trained Convolutional Neural Networks and Foundation Models as Feature Extractors for Content-based Medical Image Retrieval | Code | 4 |
| A Cross-Font Image Retrieval Network for Recognizing Undeciphered Oracle Bone Inscriptions | | 0 |
| Referring Expression Generation in Visually Grounded Dialogue with Discourse-aware Comprehension Guiding | Code | 0 |
| Open-World Dynamic Prompt and Continual Visual Representation Learning | | 0 |
| Training-free Zero-shot Composed Image Retrieval via Weighted Modality Fusion and Similarity | Code | 0 |
| Zero-Shot Whole Slide Image Retrieval in Histopathology Using Embeddings of Foundation Models | | 0 |
| Design and Evaluation of Camera-Centric Mobile Crowdsourcing Applications | | 0 |
| NUDGE: Lightweight Non-Parametric Fine-Tuning of Embeddings for Retrieval | Code | 1 |
| Optimizing CLIP Models for Image Retrieval with Maintained Joint-Embedding Alignment | Code | 0 |
| Evidential Transformers for Improved Image Retrieval | | 0 |
| A Review of Image Retrieval Techniques: Data Augmentation and Adversarial Learning Approaches | | 0 |
| Rethinking Sparse Lexical Representations for Image Retrieval in the Age of Rising Multi-Modal Large Language Models | | 0 |
| Temporal Attention for Cross-View Sequential Image Localization | Code | 0 |
| Snap and Diagnose: An Advanced Multimodal Retrieval System for Identifying Plant Diseases in the Wild | | 0 |
| LowCLIP: Adapting the CLIP Model Architecture for Low-Resource Languages in Multimodal Image Retrieval Task | Code | 0 |
| Visual Localization in 3D Maps: Comparing Point Cloud, Mesh, and NeRF Representations | | 0 |
| UniFashion: A Unified Vision-Language Model for Multimodal Fashion Retrieval and Generation | Code | 1 |
| Fashion Image-to-Image Translation for Complementary Item Retrieval | | 0 |
| BrewCLIP: A Bifurcated Representation Learning Framework for Audio-Visual Retrieval | | 0 |
| Cross-Modal Denoising: A Novel Training Paradigm for Enhancing Speech-Image Retrieval | | 0 |
| Coarse-to-fine Alignment Makes Better Speech-image Retrieval | | 0 |
| DM2RM: Dual-Mode Multimodal Ranking for Target Objects and Receptacles Based on Open-Vocabulary Instructions | | 0 |
| Cross-view image geo-localization with Panorama-BEV Co-Retrieval Network | Code | 2 |
| AMES: Asymmetric and Memory-Efficient Similarity Estimation for Instance-level Retrieval | Code | 1 |
| On Validation of Search & Retrieval of Tissue Images in Digital Pathology | | 0 |
| Revolutionizing Text-to-Image Retrieval as Autoregressive Token-to-Voken Generation | | 0 |
| Visual Haystacks: A Vision-Centric Needle-In-A-Haystack Benchmark | Code | 1 |
| EndoFinder: Online Image Retrieval for Explainable Colorectal Polyp Diagnosis | | 0 |
| Addressing Image Hallucination in Text-to-Image Generation through Factual Image Retrieval | | 0 |
| No Train, all Gain: Self-Supervised Gradients Improve Deep Frozen Representations | Code | 1 |
| An experimental evaluation of Siamese Neural Networks for robot localization using omnidirectional imaging in indoor environments | | 0 |
| Are They the Same Picture? Adapting Concept Bottleneck Models for Human-AI Collaboration in Image Retrieval | Code | 0 |
| LDRE: LLM-based Divergent Reasoning and Ensemble for Zero-Shot Composed Image Retrieval | Code | 2 |
| Lifelong Histopathology Whole Slide Image Retrieval via Distance Consistency Rehearsal | Code | 0 |
| Multi-Group Proportional Representation in Retrieval | Code | 0 |
| CEIA: CLIP-Based Event-Image Alignment for Open-World Event-Based Understanding | | 0 |
| HyCIR: Boosting Zero-Shot Composed Image Retrieval with Synthetic Labels | | 0 |
| Pseudo-triplet Guided Few-shot Composed Image Retrieval | | 0 |
| Elevating All Zero-Shot Sketch-Based Image Retrieval Through Multimodal Prompt Learning | Code | 0 |
| Visualizing Dialogues: Enhancing Image Selection through Dialogue Understanding with Large Language Models | Code | 0 |
| Learning from Memory: Non-Parametric Memory Augmented Self-Supervised Learning of Visual Features | Code | 0 |
| Celeb-FBI: A Benchmark Dataset on Human Full Body Images and Age, Gender, Height and Weight Estimation using Deep Learning Approach | | 0 |
| Cross-Modal Attention Alignment Network with Auxiliary Text Description for zero-shot sketch-based image retrieval | | 0 |
| Freeview Sketching: View-Aware Fine-Grained Sketch-Based Image Retrieval | | 0 |
| PathAlign: A vision-language model for whole slide images in histopathology | | 0 |
| Zero-shot Composed Image Retrieval Considering Query-target Relationship Leveraging Masked Image-text Pairs | | 0 |
| WV-Net: A foundation model for SAR WV-mode satellite imagery trained using contrastive self-supervised learning on 10 million images | Code | 0 |
| Breaking the Frame: Visual Place Recognition by Overlap Prediction | Code | 1 |
| CLIP-Branches: Interactive Fine-Tuning for Text-Image Retrieval | Code | 0 |

Benchmark Results
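
The leaderboards in this section report mAP and Recall@K style metrics (R@1, Recall@10, and so on). The sketch below shows one common way such scores are computed from ranked result lists. It is an assumption-laden illustration rather than the protocol of any specific benchmark: individual datasets may handle details such as ignored images or the exact recall definition differently, and the function names and toy data are hypothetical.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Hit-style Recall@K: 1.0 if any relevant item is in the top k, else 0.0."""
    return 1.0 if any(i in relevant_ids for i in ranked_ids[:k]) else 0.0

def average_precision(ranked_ids, relevant_ids):
    """Average of precision values taken at the rank of each relevant item."""
    if not relevant_ids:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, image_id in enumerate(ranked_ids, start=1):
        if image_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant_ids)

def mean_over_queries(per_query_scores):
    """Mean of per-query scores, e.g. AP values to obtain mAP."""
    scores = list(per_query_scores)
    return sum(scores) / len(scores) if scores else 0.0

# Hypothetical runs: (ranked results for a query, set of relevant ids).
runs = [
    (["a", "b", "c", "d"], {"a", "c"}),
    (["x", "y", "z"], {"z"}),
]
mean_ap = mean_over_queries(average_precision(r, rel) for r, rel in runs)
r_at_1 = mean_over_queries(recall_at_k(r, rel, 1) for r, rel in runs)
print(f"mAP = {100 * mean_ap:.2f}, R@1 = {100 * r_at_1:.2f}")
```

Scores are multiplied by 100 in the printout only to match the percentage-style values shown in the tables.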

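| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | SuperGlobal | mAP | 80.2 | | Unverified |
| 2 | AMES | mAP | 80 | | Unverified |
| 3 | Hypergraph propagation+community selection | mAP | 73 | | Unverified |
| 4 | Token | mAP | 66.57 | | Unverified |
| 5 | DELG+ α QE reranking+ RRT reranking | mAP | 64 | | Unverified |
| 6 | FIRe | mAP | 61.2 | | Unverified |
| 7 | HOW | mAP | 56.9 | | Unverified |
| 8 | ResNet101+ArcFace GLDv2-train-clean | mAP | 51.6 | | Unverified |
| 9 | DELF–HQE+SP | mAP | 50.3 | | Unverified |
| 10 | HesAff–rSIFT–HQE+SP | mAP | 49.7 | | Unverified |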
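
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | AMES | mAP | 90.7 | | Unverified |
| 2 | Hypergraph propagation+Community selection | mAP | 88.4 | | Unverified |
| 3 | Token | mAP | 82.28 | | Unverified |
| 4 | FIRe | mAP | 81.8 | | Unverified |
| 5 | DELG+ α QE reranking + RRT reranking | mAP | 80.4 | | Unverified |
| 6 | HOW | mAP | 79.4 | | Unverified |
| 7 | ResNet101+ArcFace GLDv2-train-clean | mAP | 74.2 | | Unverified |
| 8 | DELF–HQE+SP | mAP | 73.4 | | Unverified |
| 9 | HesAff–rSIFT–HQE+SP | mAP | 71.3 | | Unverified |
| 10 | DELF–ASMK*+SP | mAP | 67.8 | | Unverified |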
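
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | AMES | mAP | 89.7 | | Unverified |
| 2 | SuperGlobal | mAP | 86.7 | | Unverified |
| 3 | Hypergraph propagation | mAP | 83.3 | | Unverified |
| 4 | Token | mAP | 78.56 | | Unverified |
| 5 | DELG+ α QE reranking + RRT reranking | mAP | 77.7 | | Unverified |
| 6 | ResNet101+ArcFace GLDv2-train-clean | mAP | 70.3 | | Unverified |
| 7 | FIRe | mAP | 70 | | Unverified |
| 8 | DELF–HQE+SP | mAP | 69.3 | | Unverified |
| 9 | HOW | mAP | 62.4 | | Unverified |
| 10 | R–R-MAC | mAP | 59.4 | | Unverified |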
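
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | AMES | mAP | 94.9 | | Unverified |
| 2 | Hypergraph propagation | mAP | 92.6 | | Unverified |
| 3 | Token | mAP | 89.34 | | Unverified |
| 4 | DELG+ α QE reranking + RRT reranking | mAP | 88.5 | | Unverified |
| 5 | FIRe | mAP | 85.3 | | Unverified |
| 6 | ResNet101+ArcFace GLDv2-train-clean | mAP | 84.9 | | Unverified |
| 7 | DELF–HQE+SP | mAP | 84 | | Unverified |
| 8 | HOW | mAP | 81.6 | | Unverified |
| 9 | R–R-MAC | mAP | 78.9 | | Unverified |
| 10 | R–GeM | mAP | 77.2 | | Unverified |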

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Swin-T (MosaiCLIP, CC-12M) | Recall@1 (HN-Atom, UC) | 44.5 | | Unverified |
| 2 | RN-50 (MosaiCLIP, CC-12M) | Recall@1 (HN-Atom, UC) | 44.4 | | Unverified |
| 3 | MosaiCLIP (YFCC-FT) | Recall@1 (HN-Atom, UC) | 41.5 | | Unverified |
| 4 | RN-50 (NegCLIP, CC-12M) | Recall@1 (HN-Atom, UC) | 41.4 | | Unverified |
| 5 | MosaiCLIP (CC-FT) | Recall@1 (HN-Atom, UC) | 40.9 | | Unverified |
| 6 | Swin-T (NegCLIP, CC-12M) | Recall@1 (HN-Atom, UC) | 39.6 | | Unverified |
| 7 | CLIP (YFCC-FT) | Recall@1 (HN-Atom, UC) | 39.5 | | Unverified |
| 8 | ViT-L-14 (LAION400M) | Recall@1 (HN-Atom + HN-Comp, SC) | 39.44 | | Unverified |
| 9 | NegCLIP (YFCC-FT) | Recall@1 (HN-Atom, UC) | 39 | | Unverified |
| 10 | CLIP-FT (YFCC-FT) | Recall@1 (HN-Atom, UC) | 38.3 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | DQU-CIR | (Recall@10+Recall@50)/2 | 71.77 | | Unverified |
| 2 | TMCIR | (Recall@10+Recall@50)/2 | 66.56 | | Unverified |
| 3 | SPN4CIR (SPRC) | (Recall@10+Recall@50)/2 | 66.41 | | Unverified |
| 4 | SPRC | (Recall@10+Recall@50)/2 | 64.85 | | Unverified |
| 5 | Candidate Set Re-ranking | (Recall@10+Recall@50)/2 | 62.15 | | Unverified |
| 6 | RUTIR (BLIP B/16) | (Recall@10+Recall@50)/2 | 61.32 | | Unverified |
| 7 | CASE | (Recall@10+Recall@50)/2 | 59.73 | | Unverified |
| 8 | CaLa | (Recall@10+Recall@50)/2 | 57.96 | | Unverified |
| 9 | BLIP4CIR+Bi | (Recall@10+Recall@50)/2 | 55.4 | | Unverified |
| 10 | CLIP4Cir (v3) | (Recall@10+Recall@50)/2 | 55.36 | | Unverified |
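
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | X-VLM (base) | R@1 | 86.9 | | Unverified |
| 2 | RCAR | R@1 | 62.6 | | Unverified |
| 3 | SGRAF | R@1 | 58.5 | | Unverified |
| 4 | LGSGM | R@1 | 57.4 | | Unverified |
| 5 | VisualSparta | R@1 | 57.4 | | Unverified |
| 6 | TERAN MrSw | R@1 | 56.5 | | Unverified |
| 7 | TERAN Symm. | R@1 | 55.7 | | Unverified |
| 8 | VSRN | R@1 | 54.7 | | Unverified |
| 9 | CAMP | R@1 | 51.5 | | Unverified |
| 10 | SCAN i-t | R@1 | 44 | | Unverified |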

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | TMCIR | (Recall@5+Recall_subset@1)/2 | 83.46 | | Unverified |
| 2 | SPN4CIR (SPRC) | (Recall@5+Recall_subset@1)/2 | 82.69 | | Unverified |
| 3 | SPRC2 | (Recall@5+Recall_subset@1)/2 | 82.66 | | Unverified |
| 4 | SPRC | (Recall@5+Recall_subset@1)/2 | 81.39 | | Unverified |
| 5 | Candidate Set Re-ranking | (Recall@5+Recall_subset@1)/2 | 80.9 | | Unverified |
| 6 | CaLa | (Recall@5+Recall_subset@1)/2 | 78.74 | | Unverified |
| 7 | CASE (Pre-trained on LaSCo.Ca) | (Recall@5+Recall_subset@1)/2 | 78.25 | | Unverified |
| 8 | CASE | (Recall@5+Recall_subset@1)/2 | 77.5 | | Unverified |
| 9 | VISTA (base) | (Recall@5+Recall_subset@1)/2 | 75.9 | | Unverified |
| 10 | MMRet-MLLM | (Recall@5+Recall_subset@1)/2 | 75.7 | | Unverified |
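
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Unicom+ViT-L@336px | R@1 | 91.2 | | Unverified |
| 2 | ROADMAP (DeiT-B) | R@1 | 86 | | Unverified |
| 3 | CGD (SG/GS) | R@1 | 84.2 | | Unverified |
| 4 | ROADMAP (ResNet-50) | R@1 | 83.1 | | Unverified |
| 5 | ProxyNCA++ | R@1 | 81.4 | | Unverified |
| 6 | PNP Loss | R@1 | 81.1 | | Unverified |
| 7 | Cross-Batch Memory | R@1 | 80.6 | | Unverified |
| 8 | Smooth-AP | R@1 | 80.1 | | Unverified |
| 9 | NormSoftmax2048 (ResNet-50) | R@1 | 79.5 | | Unverified |
| 10 | EPSHN512 | R@1 | 78.3 | | Unverified |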
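
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | InternVL-G-FT | R@1 | 85.9 | | Unverified |
| 2 | InternVL-C-FT | R@1 | 85.2 | | Unverified |
| 3 | CN-CLIP (ViT-L/14@336px) | R@1 | 84.4 | | Unverified |
| 4 | R2D2 (ViT-L/14) | R@1 | 84.4 | | Unverified |
| 5 | CN-CLIP (ViT-H/14) | R@1 | 83.8 | | Unverified |
| 6 | CN-CLIP (ViT-L/14) | R@1 | 82.7 | | Unverified |
| 7 | CN-CLIP (ViT-B/16) | R@1 | 79.1 | | Unverified |
| 8 | R2D2 (ViT-B) | R@1 | 78.3 | | Unverified |
| 9 | Wukong (ViT-L/14) | R@1 | 77.4 | | Unverified |
| 10 | Wukong (ViT-B/32) | R@1 | 67.6 | | Unverified |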
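
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Offline Diffusion | MAP | 96.2 | | Unverified |
| 2 | CNN+IME layer | MAP | 92 | | Unverified |
| 3 | DELF+FT+ATT+DIR+QE | MAP | 90 | | Unverified |
| 4 | DIR+QE* | MAP | 89 | | Unverified |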