SOTAVerified

Image Retrieval

Image Retrieval is a fundamental and long-standing computer vision task that involves finding images similar to a given query from a large database. It is often considered a form of fine-grained, instance-level classification. The task is integral to image recognition alongside classification and cross-modal retrieval. By leveraging visual similarity and other criteria, image retrieval enables users to efficiently discover relevant images, making it a crucial tool in applications such as search and recommendation.

Extending CLIP for Category-to-image Retrieval in E-commerce

( Image credit: DELF )

Papers

Showing 150 of 2239 papers

TitleStatusHype
Visual Instruction TuningCode6
DINOv2: Learning Robust Visual Features without SupervisionCode6
GPT-4 Technical ReportCode6
Chinese CLIP: Contrastive Vision-Language Pretraining in ChineseCode5
CogVLM: Visual Expert for Pretrained Language ModelsCode5
Evaluating Pre-trained Convolutional Neural Networks and Foundation Models as Feature Extractors for Content-based Medical Image RetrievalCode4
Long-CLIP: Unlocking the Long-Text Capability of CLIPCode4
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language ModelsCode4
mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and VideoCode4
AltCLIP: Altering the Language Encoder in CLIP for Extended Language CapabilitiesCode4
Efficient LoFTR: Semi-Dense Local Feature Matching with Sparse-Like SpeedCode4
MegaPairs: Massive Data Synthesis For Universal Multimodal RetrievalCode3
MagicLens: Self-Supervised Image Retrieval with Open-Ended InstructionsCode3
A Comprehensive Survey on Composed Image RetrievalCode3
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of contextCode3
MindEye2: Shared-Subject Models Enable fMRI-To-Image With 1 Hour of DataCode2
Linguistic-Aware Patch Slimming Framework for Fine-grained Cross-Modal AlignmentCode2
MixVPR: Feature Mixing for Visual Place RecognitionCode2
LDRE: LLM-based Divergent Reasoning and Ensemble for Zero-Shot Composed Image RetrievalCode2
INQUIRE: A Natural World Text-to-Image Retrieval BenchmarkCode2
iSEARLE: Improving Textual Inversion for Zero-Shot Composed Image RetrievalCode2
Learning Embeddings with Centroid Triplet Loss for Object Identification in Robotic GraspingCode2
Global Features are All You Need for Image Retrieval and RerankingCode2
Investigating the Role of Image Retrieval for Visual Localization -- An exhaustive benchmarkCode2
Bridging the Vision-Brain Gap with an Uncertainty-Aware Blur PriorCode2
Language-only Training of Zero-shot Composed Image RetrievalCode2
Fine-grained Image Captioning with CLIP RewardCode2
Local Feature Matching Using Deep Learning: A SurveyCode2
FastReID: A Pytorch Toolbox for General Instance Re-identificationCode2
Exploring a Fine-Grained Multiscale Method for Cross-Modal Remote Sensing Image RetrievalCode2
EMR-Merging: Tuning-Free High-Performance Model MergingCode2
Exploring the best way for UAV visual localization under Low-altitude Multi-view Observation Condition: a BenchmarkCode2
Grounding Language Models to Images for Multimodal Inputs and OutputsCode2
D3still: Decoupled Differential Distillation for Asymmetric Image RetrievalCode2
All You Need to Know About Training Image Retrieval ModelsCode2
Cross-view image geo-localization with Panorama-BEV Co-Retrieval NetworkCode2
EarthLoc: Astronaut Photography Localization by Indexing Earth from SpaceCode2
Composed Image Retrieval for Remote SensingCode2
Beyond Text: Optimizing RAG with Multimodal Inputs for Industrial ApplicationsCode2
Efficient Remote Sensing with Harmonized Transfer Learning and Modality AlignmentCode2
Encrypted Vector Similarity Computations Using Partially Homomorphic Encryption: Applications and Performance AnalysisCode2
Enhancing Diagnostic Accuracy in Rare and Common Fundus Diseases with a Knowledge-Rich Vision-Language ModelCode2
Composed Multi-modal Retrieval: A Survey of Approaches and ApplicationsCode2
CLEAR: A Fully User-side Image Search SystemCode2
Generating Images with Multimodal Language ModelsCode2
GeoCLIP: Clip-Inspired Alignment between Locations and Images for Effective Worldwide Geo-localizationCode2
AnyLoc: Towards Universal Visual Place RecognitionCode2
Cross the Gap: Exposing the Intra-modal Misalignment in CLIP via Modality InversionCode2
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction TuningCode2
Image Textualization: An Automatic Framework for Creating Accurate and Detailed Image DescriptionsCode2
Show:102550
← PrevPage 1 of 45Next →

Benchmark Results

#ModelMetricClaimedVerifiedStatus
1SuperGlobalmAP80.2Unverified
2AMESmAP80Unverified
3Hypergraph propagation+community selectionmAP73Unverified
4TokenmAP66.57Unverified
5DELG+ α QE reranking+ RRT rerankingmAP64Unverified
6FIRemAP61.2Unverified
7HOWmAP56.9Unverified
8ResNet101+ArcFace GLDv2-train-cleanmAP51.6Unverified
9DELF–HQE+SPmAP50.3Unverified
10HesAff–rSIFT–HQE+SPmAP49.7Unverified
#ModelMetricClaimedVerifiedStatus
1AMESmAP90.7Unverified
2Hypergraph propagation+Community selectionmAP88.4Unverified
3TokenmAP82.28Unverified
4FIRemAP81.8Unverified
5DELG+ α QE reranking + RRT rerankingmAP80.4Unverified
6HOWmAP79.4Unverified
7ResNet101+ArcFace GLDv2-train-cleanmAP74.2Unverified
8DELF–HQE+SPmAP73.4Unverified
9HesAff–rSIFT–HQE+SPmAP71.3Unverified
10DELF–ASMK*+SPmAP67.8Unverified
#ModelMetricClaimedVerifiedStatus
1AMESmAP89.7Unverified
2SuperGlobalmAP86.7Unverified
3Hypergraph propagationmAP83.3Unverified
4TokenmAP78.56Unverified
5DELG+ α QE reranking + RRT rerankingmAP77.7Unverified
6ResNet101+ArcFace GLDv2-train-cleanmAP70.3Unverified
7FIRemAP70Unverified
8DELF–HQE+SPmAP69.3Unverified
9HOWmAP62.4Unverified
10R–R-MACmAP59.4Unverified
#ModelMetricClaimedVerifiedStatus
1AMESmAP94.9Unverified
2Hypergraph propagationmAP92.6Unverified
3TokenmAP89.34Unverified
4DELG+ α QE reranking + RRT rerankingmAP88.5Unverified
5FIRemAP85.3Unverified
6ResNet101+ArcFace GLDv2-train-cleanmAP84.9Unverified
7DELF–HQE+SPmAP84Unverified
8HOWmAP81.6Unverified
9R–R-MACmAP78.9Unverified
10R–GeMmAP77.2Unverified
#ModelMetricClaimedVerifiedStatus
1Swin-T (MosaiCLIP, CC-12M)Recall@1 (HN-Atom, UC)44.5Unverified
2RN-50 (MosaiCLIP, CC-12M)Recall@1 (HN-Atom, UC)44.4Unverified
3MosaiCLIP (YFCC-FT)Recall@1 (HN-Atom, UC)41.5Unverified
4RN-50 (NegCLIP, CC-12M)Recall@1 (HN-Atom, UC)41.4Unverified
5MosaiCLIP (CC-FT)Recall@1 (HN-Atom, UC)40.9Unverified
6Swin-T (NegCLIP, CC-12M)Recall@1 (HN-Atom, UC)39.6Unverified
7CLIP (YFCC-FT)Recall@1 (HN-Atom, UC)39.5Unverified
8ViT-L-14 (LAION400M)Recall@1 (HN-Atom + HN-Comp, SC)39.44Unverified
9NegCLIP (YFCC-FT)Recall@1 (HN-Atom, UC)39Unverified
10CLIP-FT (YFCC-FT)Recall@1 (HN-Atom, UC)38.3Unverified
#ModelMetricClaimedVerifiedStatus
1DQU-CIR(Recall@10+Recall@50)/271.77Unverified
2TMCIR(Recall@10+Recall@50)/266.56Unverified
3SPN4CIR (SPRC)(Recall@10+Recall@50)/266.41Unverified
4SPRC(Recall@10+Recall@50)/264.85Unverified
5Candidate Set Re-ranking(Recall@10+Recall@50)/262.15Unverified
6RUTIR (BLIP B/16)(Recall@10+Recall@50)/261.32Unverified
7CASE(Recall@10+Recall@50)/259.73Unverified
8CaLa(Recall@10+Recall@50)/257.96Unverified
9BLIP4CIR+Bi(Recall@10+Recall@50)/255.4Unverified
10CLIP4Cir (v3)(Recall@10+Recall@50)/255.36Unverified
#ModelMetricClaimedVerifiedStatus
1X-VLM (base)R@186.9Unverified
2RCARR@162.6Unverified
3SGRAFR@158.5Unverified
4VisualSpartaR@157.4Unverified
5LGSGMR@157.4Unverified
6TERAN MrSwR@156.5Unverified
7TERAN Symm.R@155.7Unverified
8VSRNR@154.7Unverified
9CAMPR@151.5Unverified
10SCAN i-tR@144Unverified
#ModelMetricClaimedVerifiedStatus
1TMCIR(Recall@5+Recall_subset@1)/283.46Unverified
2SPN4CIR (SPRC)(Recall@5+Recall_subset@1)/282.69Unverified
3SPRC2(Recall@5+Recall_subset@1)/282.66Unverified
4SPRC(Recall@5+Recall_subset@1)/281.39Unverified
5Candidate Set Re-ranking(Recall@5+Recall_subset@1)/280.9Unverified
6CaLa(Recall@5+Recall_subset@1)/278.74Unverified
7CASE (Pre-trained on LaSCo.Ca)(Recall@5+Recall_subset@1)/278.25Unverified
8CASE(Recall@5+Recall_subset@1)/277.5Unverified
9VISTA (base)(Recall@5+Recall_subset@1)/275.9Unverified
10MMRet-MLLM(Recall@5+Recall_subset@1)/275.7Unverified
#ModelMetricClaimedVerifiedStatus
1Unicom+ViT-L@336pxR@191.2Unverified
2ROADMAP (DeiT-B)R@186Unverified
3CGD (SG/GS)R@184.2Unverified
4ROADMAP (ResNet-50)R@183.1Unverified
5ProxyNCA++R@181.4Unverified
6PNP LossR@181.1Unverified
7Cross-Batch MemoryR@180.6Unverified
8Smooth-APR@180.1Unverified
9NormSoftmax2048 (ResNet-50)R@179.5Unverified
10EPSHN512R@178.3Unverified
#ModelMetricClaimedVerifiedStatus
1InternVL-G-FTR@185.9Unverified
2InternVL-C-FTR@185.2Unverified
3R2D2 (ViT-L/14)R@184.4Unverified
4CN-CLIP (ViT-L/14@336px)R@184.4Unverified
5CN-CLIP (ViT-H/14)R@183.8Unverified
6CN-CLIP (ViT-L/14)R@182.7Unverified
7CN-CLIP (ViT-B/16)R@179.1Unverified
8R2D2 (ViT-B)R@178.3Unverified
9Wukong (ViT-L/14)R@177.4Unverified
10Wukong (ViT-B/32)R@167.6Unverified
#ModelMetricClaimedVerifiedStatus
1Offline DiffusionMAP96.2Unverified
2CNN+IME layerMAP92Unverified
3DELF+FT+ATT+DIR+QEMAP90Unverified
4DIR+QE*MAP89Unverified