SOTAVerified

Image Classification

Image Classification is a fundamental task in visual recognition that aims to understand and categorize an image as a whole under a specific label. Unlike object detection, which involves both classifying and localizing multiple objects within an image, image classification typically pertains to single-object images. When the classification becomes highly detailed or reaches the instance level, it is often referred to as image retrieval, which also involves finding similar images in a large database.

Source: Metamorphic Testing for Object Detection Systems

Papers

Showing 601-650 of 10419 papers

Title | Status | Hype
MCUFormer: Deploying Vision Transformers on Microcontrollers with Limited Memory | Code | 1
Interpretable Medical Image Classification using Prototype Learning and Privileged Information | Code | 1
Vision-Language Pseudo-Labels for Single-Positive Multi-Label Learning | Code | 1
Multi‑camera trajectory matching based on hierarchical clustering and constraints | Code | 1
SalUn: Empowering Machine Unlearning via Gradient-based Weight Saliency in Both Image Classification and Generation | Code | 1
Image Clustering with External Guidance | Code | 1
Real-Fake: Effective Training Data Synthesis Through Distribution Matching | Code | 1
RefConv: Re-parameterized Refocusing Convolution for Powerful ConvNets | Code | 1
PaLI-3 Vision Language Models: Smaller, Faster, Stronger | Code | 1
Leveraging Vision-Language Models for Improving Domain Generalization in Image Classification | Code | 1
AutoVP: An Automated Visual Prompting Framework and Benchmark | Code | 1
EViT: An Eagle Vision Transformer with Bi-Fovea Self-Attention | Code | 1
Efficient Adaptation of Large Vision Transformer via Adapter Re-Composing | Code | 1
Transformer Fusion with Optimal Transport | Code | 1
Progressive Neural Compression for Adaptive Image Offloading under Timing Constraints | Code | 1
TiC: Exploring Vision Transformer in Convolution | Code | 1
Why Do We Need Weight Decay in Modern Deep Learning? | Code | 1
SemiReward: A General Reward Model for Semi-supervised Learning | Code | 1
ViT-ReciproCAM: Gradient and Attention-Free Visual Explanations for Vision Transformer | Code | 1
Trainable Noise Model as an XAI evaluation method: application on Sobol for remote sensing image segmentation | Code | 1
Extending CAM-based XAI methods for Remote Sensing Imagery Segmentation | Code | 1
A Framework for Inference Inspired by Human Memory Mechanisms | Code | 1
Enhancing Sharpness-Aware Optimization Through Variance Suppression | Code | 1
ClusterFormer: Clustering As A Universal Visual Learner | Code | 1
MUSTANG: Multi-Stain Self-Attention Graph Multiple Instance Learning Pipeline for Histopathology Whole Slide Images | Code | 1
NoisyNN: Exploring the Impact of Information Entropy Change in Learning Systems | Code | 1
Long-Tail Learning with Foundation Model: Heavy Fine-Tuning Hurts | Code | 1
Interpretability-Aware Vision Transformer | Code | 1
Keep It SimPool: Who Said Supervised Transformers Suffer from Attention Deficit? | Code | 1
Language Models as Black-Box Optimizers for Vision-Language Models | Code | 1
SparseSwin: Swin Transformer with Sparse Transformer Block | Code | 1
Share Your Representation Only: Guaranteed Improvement of the Privacy-Utility Tradeoff in Federated Learning | Code | 1
Class-Incremental Grouping Network for Continual Audio-Visual Learning | Code | 1
Divergences in Color Perception between Deep Neural Networks and Humans | Code | 1
DiffAug: Enhance Unsupervised Contrastive Learning with Domain-Knowledge-Free Diffusion-based Data Augmentation | Code | 1
When to Learn What: Model-Adaptive Data Augmentation Curriculum | Code | 1
Locality-Aware Hyperspectral Classification | Code | 1
Traveling Waves Encode the Recent Past and Enhance Sequence Learning | Code | 1
Fine-grained Recognition with Learnable Semantic Data Augmentation | Code | 1
CEFHRI: A Communication Efficient Federated Learning Framework for Recognizing Industrial Human-Robot Interaction | Code | 1
A Dual-Direction Attention Mixed Feature Network for Facial Expression Recognition | Code | 1
FedSOL: Stabilized Orthogonal Learning with Proximal Restrictions in Federated Learning | Code | 1
Masking Strategies for Background Bias Removal in Computer Vision Models | Code | 1
DR-Tune: Improving Fine-tuning of Pretrained Visual Models by Distribution Regularization with Semantic Calibration | Code | 1
Integrated Image and Location Analysis for Wound Classification: A Deep Learning Approach | Code | 1
Image-free Classifier Injection for Zero-Shot Classification | Code | 1
Diffusion Model as Representation Learner | Code | 1
Unlocking Accuracy and Fairness in Differentially Private Image Classification | Code | 1
A Comprehensive Empirical Evaluation on Online Continual Learning | Code | 1
Which Transformer to Favor: A Comparative Analysis of Efficiency in Vision Transformers | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | CoCa (finetuned) | Top 1 Accuracy | 91 | | Unverified
2 | Model soups (BASIC-L) | Top 1 Accuracy | 90.98 | | Unverified
3 | Model soups (ViT-G/14) | Top 1 Accuracy | 90.94 | | Unverified
4 | DaViT-G | Top 1 Accuracy | 90.4 | | Unverified
5 | DaViT-H | Top 1 Accuracy | 90.2 | | Unverified
6 | Meta Pseudo Labels (EfficientNet-L2) | Top 1 Accuracy | 90.2 | | Unverified
7 | SwinV2-G | Top 1 Accuracy | 90.17 | | Unverified
8 | MAWS (ViT-6.5B) | Top 1 Accuracy | 90.1 | | Unverified
9 | Florence-CoSwin-H | Top 1 Accuracy | 90.05 | | Unverified
10 | Meta Pseudo Labels (EfficientNet-B6-Wide) | Top 1 Accuracy | 90 | | Unverified
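
The Top 1 Accuracy metric used in the benchmark results counts a prediction as correct only when the single highest-scoring class matches the ground-truth label. The following is a minimal sketch of that computation, assuming per-image class scores are available as plain Python lists and that the result is reported as a percentage. The function name `top1_accuracy` and the toy data are hypothetical and are not taken from any of the listed papers.

```python
from typing import Sequence


def top1_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Return the percentage of samples whose highest-scoring class equals the label."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")

    correct = 0
    for row, label in zip(scores, labels):
        # Index of the largest score; ties resolve to the lowest class index.
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == label:
            correct += 1
    return 100.0 * correct / len(labels)


# Hypothetical toy data: 4 images, 3 classes.
toy_scores = [
    [0.1, 0.7, 0.2],  # predicted class 1
    [0.8, 0.1, 0.1],  # predicted class 0
    [0.3, 0.3, 0.4],  # predicted class 2
    [0.2, 0.5, 0.3],  # predicted class 1
]
toy_labels = [1, 0, 1, 1]

toy_result = top1_accuracy(toy_scores, toy_labels)
print(f"Top 1 Accuracy: {toy_result:.2f}")
```

In this toy example three of the four predictions match their labels, so the function returns 75.0. Real evaluations compute the same quantity over a full held-out test set.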