SOTAVerified

Image Retrieval

Image retrieval is a fundamental, long-standing computer vision task: given a query, find the most similar images in a large database. It is often treated as a form of fine-grained, instance-level classification, and it sits alongside classification and cross-modal retrieval as a core part of image recognition. By ranking database images by visual similarity and other criteria, image retrieval lets users discover relevant images efficiently, which makes it central to applications such as search and recommendation.
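
As a rough illustration of the query-to-database matching described above, the sketch below ranks database images by cosine similarity between feature vectors. The three-dimensional descriptors, the file names, and the helper names `cosine_similarity` and `retrieve` are all hypothetical; cosine similarity is only one common choice of similarity measure, and real systems typically use learned descriptors with hundreds or thousands of dimensions.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query: Sequence[float],
             database: Dict[str, Sequence[float]],
             top_k: int = 3) -> List[Tuple[str, float]]:
    """Return the top_k (name, score) pairs, most similar first."""
    scored = [(name, cosine_similarity(query, vec))
              for name, vec in database.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy descriptors (assumption: not taken from any real model).
database = {
    "tower_day.jpg": [0.9, 0.1, 0.0],
    "tower_night.jpg": [0.8, 0.3, 0.1],
    "beach.jpg": [0.0, 0.2, 0.9],
}
query = [1.0, 0.2, 0.0]

results = retrieve(query, database, top_k=2)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

With these toy vectors the two tower images rank ahead of the beach image. At database scale, the exhaustive scan in `retrieve` would usually be replaced by an approximate nearest-neighbour index.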

Extending CLIP for Category-to-image Retrieval in E-commerce

(Image credit: DELF)

Papers

Showing 1–50 of 2239 papers

| Title | Status | Hype |
|---|---|---|
| MCoT-RE: Multi-Faceted Chain-of-Thought and Re-Ranking for Training-Free Zero-Shot Composed Image Retrieval | | 0 |
| FAR-Net: Multi-Stage Fusion Network with Enhanced Semantic Alignment and Adaptive Reconciliation for Composed Image Retrieval | | 0 |
| RadiomicsRetrieval: A Customizable Framework for Medical Image Retrieval Using Radiomics Features | Code | 1 |
| Orchestrator-Agent Trust: A Modular Agentic AI Visual Classification System with Trust-Aware Orchestration and RAG-Based Reasoning | Code | 0 |
| MS-DPPs: Multi-Source Determinantal Point Processes for Contextual Diversity Refinement of Composite Attributes in Text to Image Retrieval | Code | 0 |
| Automatic Synthesis of High-Quality Triplet Data for Composed Image Retrieval | | 0 |
| Llama Nemoretriever Colembed: Top-Performing Text-Image Retrieval Model | | 0 |
| An analysis of vision-language models for fabric retrieval | | 0 |
| Mask-aware Text-to-Image Retrieval: Referring Expression Segmentation Meets Cross-modal Retrieval | | 0 |
| On the Burstiness of Faces in Set | | 0 |
| Referring Expression Instance Retrieval and A Strong End-to-End Baseline | | 0 |
| Class Agnostic Instance-level Descriptor for Visual Instance Search | | 0 |
| Fine-grained Image Retrieval via Dual-Vision Adaptation | | 0 |
| Hierarchical Multi-Positive Contrastive Learning for Patent Image Retrieval | | 0 |
| A Semantically-Aware Relevance Measure for Content-Based Medical Image Retrieval Evaluation | | 0 |
| Improving Personalized Search with Regularized Low-Rank Parameter Updates | Code | 0 |
| Hierarchical Image Matching for UAV Absolute Visual Localization via Semantic and Structural Constraints | | 0 |
| Hidden Bias in the Machine: Stereotypes in Text-to-Image Models | | 0 |
| Quantization-based Bounds on the Wasserstein Metric | | 0 |
| SORCE: Small Object Retrieval in Complex Environments | Code | 0 |
| Sketch Down the FLOPs: Towards Efficient Networks for Human Sketch | | 0 |
| Fast Feature Matching of UAV Images via Matrix Band Reduction-based GPU Data Schedule | | 0 |
| ConText-CIR: Learning from Concepts in Text for Composed Image Retrieval | Code | 1 |
| Can Visual Encoder Learn to See Arrows? | | 0 |
| MLLM-Guided VLM Fine-Tuning with Joint Inference for Zero-Shot Composed Image Retrieval | | 0 |
| Multimodal Reasoning Agent for Zero-Shot Composed Image Retrieval | | 0 |
| Visualized Text-to-Image Retrieval | Code | 1 |
| One Surrogate to Fool Them All: Universal, Transferable, and Targeted Adversarial Attacks with CLIP | Code | 1 |
| TNG-CLIP: Training-Time Negation Data Generation for Negation Awareness of CLIP | | 0 |
| DetailFusion: A Dual-branch Framework with Detail Enhancement for Composed Image Retrieval | | 0 |
| DART^3: Leveraging Distance for Test Time Adaptation in Person Re-Identification | | 0 |
| Highlighting What Matters: Promptable Embeddings for Attribute-Focused Image Retrieval | | 0 |
| SCENIR: Visual Semantic Clarity through Unsupervised Scene Graph Retrieval | Code | 0 |
| IA-T2I: Internet-Augmented Text-to-Image Generation | | 0 |
| Multimodal RAG-driven Anomaly Detection and Classification in Laser Powder Bed Fusion using Large Language Models | | 0 |
| Non-planar Object Detection and Identification by Features Matching and Triangulation Growth | | 0 |
| Improved Bag-of-Words Image Retrieval with Geometric Constraints for Ground Texture Localization | | 0 |
| Redundancy-Aware Pretraining of Vision-Language Foundation Models in Remote Sensing | | 0 |
| Seeing the Abstract: Translating the Abstract Language for Vision Language Models | Code | 0 |
| OBD-Finder: Explainable Coarse-to-Fine Text-Centric Oracle Bone Duplicates Discovery | Code | 0 |
| Geolocating Earth Imagery from ISS: Integrating Machine Learning with Astronaut Photography for Enhanced Geographic Mapping | Code | 0 |
| From Mapping to Composing: A Two-Stage Framework for Zero-shot Composed Image Retrieval | | 0 |
| CLIPSE -- a minimalistic CLIP-based image search engine for research | Code | 0 |
| A Multimodal Recaptioning Framework to Account for Perceptual Diversity in Multilingual Vision-Language Modeling | | 0 |
| SemCORE: A Semantic-Enhanced Generative Cross-Modal Retrieval Framework with MLLMs | | 0 |
| Generalized Visual Relation Detection with Diffusion Models | | 0 |
| TMCIR: Token Merge Benefits Composed Image Retrieval | | 0 |
| Visual Re-Ranking with Non-Visual Side Information | Code | 0 |
| Focus on Local: Finding Reliable Discriminative Regions for Visual Place Recognition | Code | 1 |
| FocalLens: Instruction Tuning Enables Zero-Shot Conditional Image Representations | | 0 |

Benchmark Results

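| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | SuperGlobal | mAP | 80.2 | | Unverified |
| 2 | AMES | mAP | 80 | | Unverified |
| 3 | Hypergraph propagation+community selection | mAP | 73 | | Unverified |
| 4 | Token | mAP | 66.57 | | Unverified |
| 5 | DELG+ α QE reranking + RRT reranking | mAP | 64 | | Unverified |
| 6 | FIRe | mAP | 61.2 | | Unverified |
| 7 | HOW | mAP | 56.9 | | Unverified |
| 8 | ResNet101+ArcFace GLDv2-train-clean | mAP | 51.6 | | Unverified |
| 9 | DELF–HQE+SP | mAP | 50.3 | | Unverified |
| 10 | HesAff–rSIFT–HQE+SP | mAP | 49.7 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | AMES | mAP | 90.7 | | Unverified |
| 2 | Hypergraph propagation+Community selection | mAP | 88.4 | | Unverified |
| 3 | Token | mAP | 82.28 | | Unverified |
| 4 | FIRe | mAP | 81.8 | | Unverified |
| 5 | DELG+ α QE reranking + RRT reranking | mAP | 80.4 | | Unverified |
| 6 | HOW | mAP | 79.4 | | Unverified |
| 7 | ResNet101+ArcFace GLDv2-train-clean | mAP | 74.2 | | Unverified |
| 8 | DELF–HQE+SP | mAP | 73.4 | | Unverified |
| 9 | HesAff–rSIFT–HQE+SP | mAP | 71.3 | | Unverified |
| 10 | DELF–ASMK*+SP | mAP | 67.8 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | AMES | mAP | 89.7 | | Unverified |
| 2 | SuperGlobal | mAP | 86.7 | | Unverified |
| 3 | Hypergraph propagation | mAP | 83.3 | | Unverified |
| 4 | Token | mAP | 78.56 | | Unverified |
| 5 | DELG+ α QE reranking + RRT reranking | mAP | 77.7 | | Unverified |
| 6 | ResNet101+ArcFace GLDv2-train-clean | mAP | 70.3 | | Unverified |
| 7 | FIRe | mAP | 70 | | Unverified |
| 8 | DELF–HQE+SP | mAP | 69.3 | | Unverified |
| 9 | HOW | mAP | 62.4 | | Unverified |
| 10 | R–R-MAC | mAP | 59.4 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | AMES | mAP | 94.9 | | Unverified |
| 2 | Hypergraph propagation | mAP | 92.6 | | Unverified |
| 3 | Token | mAP | 89.34 | | Unverified |
| 4 | DELG+ α QE reranking + RRT reranking | mAP | 88.5 | | Unverified |
| 5 | FIRe | mAP | 85.3 | | Unverified |
| 6 | ResNet101+ArcFace GLDv2-train-clean | mAP | 84.9 | | Unverified |
| 7 | DELF–HQE+SP | mAP | 84 | | Unverified |
| 8 | HOW | mAP | 81.6 | | Unverified |
| 9 | R–R-MAC | mAP | 78.9 | | Unverified |
| 10 | R–GeM | mAP | 77.2 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Swin-T (MosaiCLIP, CC-12M) | Recall@1 (HN-Atom, UC) | 44.5 | | Unverified |
| 2 | RN-50 (MosaiCLIP, CC-12M) | Recall@1 (HN-Atom, UC) | 44.4 | | Unverified |
| 3 | MosaiCLIP (YFCC-FT) | Recall@1 (HN-Atom, UC) | 41.5 | | Unverified |
| 4 | RN-50 (NegCLIP, CC-12M) | Recall@1 (HN-Atom, UC) | 41.4 | | Unverified |
| 5 | MosaiCLIP (CC-FT) | Recall@1 (HN-Atom, UC) | 40.9 | | Unverified |
| 6 | Swin-T (NegCLIP, CC-12M) | Recall@1 (HN-Atom, UC) | 39.6 | | Unverified |
| 7 | CLIP (YFCC-FT) | Recall@1 (HN-Atom, UC) | 39.5 | | Unverified |
| 8 | ViT-L-14 (LAION400M) | Recall@1 (HN-Atom + HN-Comp, SC) | 39.44 | | Unverified |
| 9 | NegCLIP (YFCC-FT) | Recall@1 (HN-Atom, UC) | 39 | | Unverified |
| 10 | CLIP-FT (YFCC-FT) | Recall@1 (HN-Atom, UC) | 38.3 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | DQU-CIR | (Recall@10+Recall@50)/2 | 71.77 | | Unverified |
| 2 | TMCIR | (Recall@10+Recall@50)/2 | 66.56 | | Unverified |
| 3 | SPN4CIR (SPRC) | (Recall@10+Recall@50)/2 | 66.41 | | Unverified |
| 4 | SPRC | (Recall@10+Recall@50)/2 | 64.85 | | Unverified |
| 5 | Candidate Set Re-ranking | (Recall@10+Recall@50)/2 | 62.15 | | Unverified |
| 6 | RUTIR (BLIP B/16) | (Recall@10+Recall@50)/2 | 61.32 | | Unverified |
| 7 | CASE | (Recall@10+Recall@50)/2 | 59.73 | | Unverified |
| 8 | CaLa | (Recall@10+Recall@50)/2 | 57.96 | | Unverified |
| 9 | BLIP4CIR+Bi | (Recall@10+Recall@50)/2 | 55.4 | | Unverified |
| 10 | CLIP4Cir (v3) | (Recall@10+Recall@50)/2 | 55.36 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | X-VLM (base) | R@1 | 86.9 | | Unverified |
| 2 | RCAR | R@1 | 62.6 | | Unverified |
| 3 | SGRAF | R@1 | 58.5 | | Unverified |
| 4 | LGSGM | R@1 | 57.4 | | Unverified |
| 5 | VisualSparta | R@1 | 57.4 | | Unverified |
| 6 | TERAN MrSw | R@1 | 56.5 | | Unverified |
| 7 | TERAN Symm. | R@1 | 55.7 | | Unverified |
| 8 | VSRN | R@1 | 54.7 | | Unverified |
| 9 | CAMP | R@1 | 51.5 | | Unverified |
| 10 | SCAN i-t | R@1 | 44 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | TMCIR | (Recall@5+Recall_subset@1)/2 | 83.46 | | Unverified |
| 2 | SPN4CIR (SPRC) | (Recall@5+Recall_subset@1)/2 | 82.69 | | Unverified |
| 3 | SPRC2 | (Recall@5+Recall_subset@1)/2 | 82.66 | | Unverified |
| 4 | SPRC | (Recall@5+Recall_subset@1)/2 | 81.39 | | Unverified |
| 5 | Candidate Set Re-ranking | (Recall@5+Recall_subset@1)/2 | 80.9 | | Unverified |
| 6 | CaLa | (Recall@5+Recall_subset@1)/2 | 78.74 | | Unverified |
| 7 | CASE (Pre-trained on LaSCo.Ca) | (Recall@5+Recall_subset@1)/2 | 78.25 | | Unverified |
| 8 | CASE | (Recall@5+Recall_subset@1)/2 | 77.5 | | Unverified |
| 9 | VISTA (base) | (Recall@5+Recall_subset@1)/2 | 75.9 | | Unverified |
| 10 | MMRet-MLLM | (Recall@5+Recall_subset@1)/2 | 75.7 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Unicom+ViT-L@336px | R@1 | 91.2 | | Unverified |
| 2 | ROADMAP (DeiT-B) | R@1 | 86 | | Unverified |
| 3 | CGD (SG/GS) | R@1 | 84.2 | | Unverified |
| 4 | ROADMAP (ResNet-50) | R@1 | 83.1 | | Unverified |
| 5 | ProxyNCA++ | R@1 | 81.4 | | Unverified |
| 6 | PNP Loss | R@1 | 81.1 | | Unverified |
| 7 | Cross-Batch Memory | R@1 | 80.6 | | Unverified |
| 8 | Smooth-AP | R@1 | 80.1 | | Unverified |
| 9 | NormSoftmax2048 (ResNet-50) | R@1 | 79.5 | | Unverified |
| 10 | EPSHN512 | R@1 | 78.3 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | InternVL-G-FT | R@1 | 85.9 | | Unverified |
| 2 | InternVL-C-FT | R@1 | 85.2 | | Unverified |
| 3 | CN-CLIP (ViT-L/14@336px) | R@1 | 84.4 | | Unverified |
| 4 | R2D2 (ViT-L/14) | R@1 | 84.4 | | Unverified |
| 5 | CN-CLIP (ViT-H/14) | R@1 | 83.8 | | Unverified |
| 6 | CN-CLIP (ViT-L/14) | R@1 | 82.7 | | Unverified |
| 7 | CN-CLIP (ViT-B/16) | R@1 | 79.1 | | Unverified |
| 8 | R2D2 (ViT-B) | R@1 | 78.3 | | Unverified |
| 9 | Wukong (ViT-L/14) | R@1 | 77.4 | | Unverified |
| 10 | Wukong (ViT-B/32) | R@1 | 67.6 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Offline Diffusion | MAP | 96.2 | | Unverified |
| 2 | CNN+IME layer | MAP | 92 | | Unverified |
| 3 | DELF+FT+ATT+DIR+QE | MAP | 90 | | Unverified |
| 4 | DIR+QE* | MAP | 89 | | Unverified |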
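
The leaderboards above report mAP, Recall@K (R@1), and averages of two recall values such as (Recall@10+Recall@50)/2. The sketch below shows one common way to compute these from ranked result lists. It assumes the "at least one relevant item in the top K" definition of Recall@K and plain average precision over binary relevance labels. The evaluation protocols behind the listed numbers may differ in details, and the queries, item names, and function names here are hypothetical.

```python
from typing import Dict, List, Sequence, Set


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """1.0 if any relevant item appears in the top k results, else 0.0."""
    return 1.0 if any(item in relevant for item in ranked[:k]) else 0.0


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Mean of precision values taken at the rank of each relevant hit."""
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0


def mean_average_precision(rankings: Dict[str, List[str]],
                           ground_truth: Dict[str, Set[str]]) -> float:
    """Average of per-query average precision (mAP)."""
    aps = [average_precision(rankings[q], ground_truth[q]) for q in rankings]
    return sum(aps) / len(aps)


def mean_recall_at_k(rankings: Dict[str, List[str]],
                     ground_truth: Dict[str, Set[str]],
                     k: int) -> float:
    """Average of per-query Recall@K."""
    scores = [recall_at_k(rankings[q], ground_truth[q], k) for q in rankings]
    return sum(scores) / len(scores)


# Hypothetical ranked results and relevance labels for two queries.
rankings = {"q1": ["a", "b", "c", "d"], "q2": ["x", "y", "z"]}
ground_truth = {"q1": {"a", "c"}, "q2": {"y"}}

map_score = mean_average_precision(rankings, ground_truth)
r_at_1 = mean_recall_at_k(rankings, ground_truth, k=1)
r_at_2 = mean_recall_at_k(rankings, ground_truth, k=2)
# Same shape as the averaged metrics in the tables, with smaller K values.
combined = (r_at_1 + r_at_2) / 2

print(f"mAP={map_score:.4f} R@1={r_at_1:.2f} combined={combined:.2f}")
```

These functions return fractions between 0 and 1. The scores in the tables appear to be percentages, so multiply by 100 to put the results on the same scale.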