SOTAVerified

Image Classification

Image classification is a fundamental task in visual recognition that aims to understand an image as a whole and assign it a specific label. Unlike object detection, which involves classifying and localizing multiple objects within an image, image classification typically applies to single-object images. When the classification becomes highly fine-grained or reaches instance level, it is often referred to as image retrieval, which also involves finding similar images in a large database.

Source: Metamorphic Testing for Object Detection Systems

Papers

Showing 1601–1650 of 10419 papers

| Title | Status | Hype |
|---|---|---|
| Causal Transportability for Visual Recognition | Code | 1 |
| AutoAssist: A Framework to Accelerate Training of Deep Neural Networks | Code | 1 |
| Deep CNNs Meet Global Covariance Pooling: Better Representation and Generalization | Code | 1 |
| Meta-Adapter: An Online Few-shot Learner for Vision-Language Model | Code | 1 |
| Meta-Album: Multi-domain Meta-Dataset for Few-Shot Image Classification | Code | 1 |
| MetaAudio: A Few-Shot Audio Classification Benchmark | Code | 1 |
| Deep CORAL: Correlation Alignment for Deep Domain Adaptation | Code | 1 |
| Deep Fried Convnets | Code | 1 |
| Meta-Learning with Fewer Tasks through Task Interpolation | Code | 1 |
| DUET: A Tuning-Free Device-Cloud Collaborative Parameters Generation Framework for Efficient Device Model Generalization | Code | 1 |
| Deep Reinforcement Learning for Band Selection in Hyperspectral Image Classification | Code | 1 |
| M-FLAG: Medical Vision-Language Pre-training with Frozen Language Models and Latent Space Geometry Optimization | Code | 1 |
| Dense Contrastive Learning for Self-Supervised Visual Pre-Training | Code | 1 |
| Disentangled Ontology Embedding for Zero-shot Learning | Code | 1 |
| Decision Stream: Cultivating Deep Decision Trees | Code | 1 |
| A New Semi-supervised Learning Benchmark for Classifying View and Diagnosing Aortic Stenosis from Echocardiograms | Code | 1 |
| DecAug: Out-of-Distribution Generalization via Decomposed Feature Representation and Semantic Augmentation | Code | 1 |
| Fcaformer: Forward Cross Attention in Hybrid Vision Transformer | Code | 1 |
| Cached Transformers: Improving Transformers with Differentiable Memory Cache | Code | 1 |
| Hybrid Supervision Learning for Pathology Whole Slide Image Classification | Code | 1 |
| MixMAE: Mixed and Masked Autoencoder for Efficient Pretraining of Hierarchical Vision Transformers | Code | 1 |
| MixMo: Mixing Multiple Inputs for Multiple Outputs via Deep Subnetworks | Code | 1 |
| Mixture of Gaussian-distributed Prototypes with Generative Modelling for Interpretable and Trustworthy Image Recognition | Code | 1 |
| mixup: Beyond Empirical Risk Minimization | Code | 1 |
| ML-Decoder: Scalable and Versatile Classification Head | Code | 1 |
| Complementary-Label Learning for Arbitrary Losses and Models | Code | 1 |
| Can An Image Classifier Suffice For Action Recognition? | Code | 1 |
| MLRSNet: A Multi-label High Spatial Resolution Remote Sensing Dataset for Semantic Scene Understanding | Code | 1 |
| Decoupled Dynamic Filter Networks | Code | 1 |
| DeBiFormer: Vision Transformer with Deformable Agent Bi-level Routing Attention | Code | 1 |
| A General Regret Bound of Preconditioned Gradient Method for DNN Training | Code | 1 |
| Deblurring Masked Autoencoder is Better Recipe for Ultrasound Image Recognition | Code | 1 |
| Compounding the Performance Improvements of Assembled Techniques in a Convolutional Neural Network | Code | 1 |
| CamDiff: Camouflage Image Augmentation via Diffusion Model | Code | 1 |
| DCN-T: Dual Context Network with Transformer for Hyperspectral Image Classification | Code | 1 |
| CAMIL: Context-Aware Multiple Instance Learning for Cancer Detection and Subtyping in Whole Slide Images | Code | 1 |
| DC3DCD: unsupervised learning for multiclass 3D point cloud change detection | Code | 1 |
| DCT-CryptoNets: Scaling Private Inference in the Frequency Domain | Code | 1 |
| DEAL: Deep Evidential Active Learning for Image Classification | Code | 1 |
| Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks | Code | 1 |
| Decoupled Weight Decay Regularization | Code | 1 |
| Modeling Uncertain Feature Representation for Domain Generalization | Code | 1 |
| Data-Efficient Deep Learning Method for Image Classification Using Data Augmentation, Focal Cosine Loss, and Ensemble | Code | 1 |
| Data Feedback Loops: Model-driven Amplification of Dataset Biases | Code | 1 |
| Data Determines Distributional Robustness in Contrastive Language Image Pre-training (CLIP) | Code | 1 |
| Can Language Understand Depth? | Code | 1 |
| A Dual-Direction Attention Mixed Feature Network for Facial Expression Recognition | Code | 1 |
| Compressing Features for Learning with Noisy Labels | Code | 1 |
| MoPro: Webly Supervised Learning with Momentum Prototypes | Code | 1 |
| Data Augmentation with norm-VAE for Unsupervised Domain Adaptation | Code | 1 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | CoCa (finetuned) | Top 1 Accuracy | 91 | | Unverified |
| 2 | Model soups (BASIC-L) | Top 1 Accuracy | 90.98 | | Unverified |
| 3 | Model soups (ViT-G/14) | Top 1 Accuracy | 90.94 | | Unverified |
| 4 | DaViT-G | Top 1 Accuracy | 90.4 | | Unverified |
| 5 | Meta Pseudo Labels (EfficientNet-L2) | Top 1 Accuracy | 90.2 | | Unverified |
| 6 | DaViT-H | Top 1 Accuracy | 90.2 | | Unverified |
| 7 | SwinV2-G | Top 1 Accuracy | 90.17 | | Unverified |
| 8 | MAWS (ViT-6.5B) | Top 1 Accuracy | 90.1 | | Unverified |
| 9 | Florence-CoSwin-H | Top 1 Accuracy | 90.05 | | Unverified |
| 10 | RevCol-H | Top 1 Accuracy | 90 | | Unverified |
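
The metric reported above, Top 1 Accuracy, is commonly computed as the percentage of test images whose highest-scoring predicted class equals the ground-truth label. The following is a minimal sketch of that calculation in plain Python. The function name `top1_accuracy` and the example scores and labels are hypothetical and used only for illustration; they do not come from any of the listed papers or their evaluation code.

```python
from typing import Sequence


def top1_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Return the percentage of samples whose highest-scoring class matches the label."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")
    correct = 0
    for row, label in zip(scores, labels):
        # Index of the largest score is the predicted class (first index wins on ties).
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == label:
            correct += 1
    return 100.0 * correct / len(labels)


# Hypothetical class scores for 4 images over 3 classes (illustrative data only).
example_scores = [
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
    [0.3, 0.3, 0.4],
    [0.2, 0.5, 0.3],
]
example_labels = [1, 0, 1, 1]

print(top1_accuracy(example_scores, example_labels))
```

In this toy example three of the four predictions match their labels, so the function reports 75.0. Published leaderboard numbers are computed the same way in principle, but over a full benchmark test or validation set.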