SOTAVerified

Image Classification

Image Classification is a fundamental task in visual recognition that aims to understand an image as a whole and assign it a specific label. Unlike object detection, which classifies and localizes multiple objects within an image, image classification typically deals with single-object images. When the classification becomes highly fine-grained or reaches the instance level, it is often referred to as image retrieval, which also involves finding similar images in a large database.

Source: Metamorphic Testing for Object Detection Systems
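
As a rough illustration of the definition above, the sketch below assigns one label to a whole image by taking the highest-probability class. The class names, the raw scores, and the helper names `softmax` and `classify` are hypothetical and stand in for whatever a real model would produce.

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(scores, labels):
    """Return the single most probable label for the whole image."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]

# Hypothetical scores a model might emit for one image (assumption).
labels = ["cat", "dog", "car"]
label, confidence = classify([2.0, 0.5, -1.0], labels)
print(label, round(confidence, 3))
```

An object detector would instead return several label and bounding-box pairs per image; here the output is exactly one label for the image as a whole.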

Papers

Showing 1251-1300 of 10419 papers

Title | Status | Hype
Augmenting Convolutional networks with attention-based aggregation | Code | 1
ELSA: Enhanced Local Self-Attention for Vision Transformer | Code | 1
RepMLPNet: Hierarchical Vision MLP with Re-parameterized Locality | Code | 1
Learned Queries for Efficient Local Attention | Code | 1
Transformers Can Do Bayesian Inference | Code | 1
HarmoFL: Harmonizing Local and Global Drifts in Federated Learning on Heterogeneous Medical Images | Code | 1
UniMiSS: Universal Medical Self-Supervised Learning via Breaking Dimensionality Barrier | Code | 1
Towards End-to-End Image Compression and Analysis with Transformers | Code | 1
Pixel Distillation: A New Knowledge Distillation Scheme for Low-Resolution Image Recognition | Code | 1
An Empirical Investigation of the Role of Pre-training in Lifelong Learning | Code | 1
RegionCLIP: Region-based Language-Image Pretraining | Code | 1
Pure Noise to the Rescue of Insufficient Data: Improving Imbalanced Classification by Training on Random Noise Images | Code | 1
Learning to Prompt for Continual Learning | Code | 1
Towards General and Efficient Active Learning | Code | 1
Heuristic Hyperparameter Optimization for Convolutional Neural Networks using Genetic Algorithm | Code | 1
AdaViT: Adaptive Tokens for Efficient Vision Transformer | Code | 1
WOOD: Wasserstein-based Out-of-Distribution Detection | Code | 1
Simple and Robust Loss Design for Multi-Label Learning with Missing Labels | Code | 1
Boosting Active Learning via Improving Test Performance | Code | 1
Visual Transformers with Primal Object Queries for Multi-Label Image Classification | Code | 1
The Large Labelled Logo Dataset (L3D): A Multipurpose and Hand-Labelled Continuously Growing Dataset | Code | 1
Locally Shifted Attention With Early Global Integration | Code | 1
PE-former: Pose Estimation Transformer | Code | 1
Obtaining Calibrated Probabilities with Personalized Ranking Models | Code | 1
A Contrastive Distillation Approach for Incremental Semantic Segmentation in Aerial Images | Code | 1
Dilated convolution with learnable spacings | Code | 1
Interpretable Image Classification with Differentiable Prototypes Assignment | Code | 1
Scaling Up Influence Functions | Code | 1
Hard Sample Aware Noise Robust Learning for Histopathology Image Classification | Code | 1
Novel Class Discovery in Semantic Segmentation | Code | 1
A Survey: Deep Learning for Hyperspectral Image Classification with Few Labeled Samples | Code | 1
A Fast Knowledge Distillation Framework for Visual Recognition | Code | 1
FIBA: Frequency-Injection based Backdoor Attack in Medical Image Analysis | Code | 1
Sample Prior Guided Robust Model Learning to Suppress Noisy Labels | Code | 1
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection | Code | 1
Is RobustBench/AutoAttack a suitable Benchmark for Adversarial Robustness? | Code | 1
Revitalizing CNN Attention via Transformers in Self-Supervised Visual Representation Learning | Code | 1
Pooling by Sliced-Wasserstein Embedding | Code | 1
The Majority Can Help The Minority: Context-rich Minority Oversampling for Long-tailed Classification | Code | 1
Focal Attention for Long-Range Interactions in Vision Transformers | Code | 1
Semi-supervised music emotion recognition using noisy student training and harmonic pitch class profiles | Code | 1
The Devil is in the Margin: Margin-based Label Smoothing for Network Calibration | Code | 1
Sound-Guided Semantic Image Manipulation | Code | 1
MapReader: A Computer Vision Pipeline for the Semantic Exploration of Maps at Scale | Code | 1
Adaptive Token Sampling For Efficient Vision Transformers | Code | 1
Learning with Noisy Labels by Efficient Transition Matrix Estimation to Combat Label Miscorrection | Code | 1
Deformable ProtoPNet: An Interpretable Image Classifier Using Deformable Prototypes | Code | 1
ExCon: Explanation-driven Supervised Contrastive Learning for Image Classification | Code | 1
TDAM: Top-Down Attention Module for Contextually Guided Feature Selection in CNNs | Code | 1
VL-LTR: Learning Class-wise Visual-Linguistic Representation for Long-Tailed Visual Recognition | Code | 1
Page 26 of 209

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | CoCa (finetuned) | Top 1 Accuracy | 91 | - | Unverified
2 | Model soups (BASIC-L) | Top 1 Accuracy | 90.98 | - | Unverified
3 | Model soups (ViT-G/14) | Top 1 Accuracy | 90.94 | - | Unverified
4 | DaViT-G | Top 1 Accuracy | 90.4 | - | Unverified
5 | Meta Pseudo Labels (EfficientNet-L2) | Top 1 Accuracy | 90.2 | - | Unverified
6 | DaViT-H | Top 1 Accuracy | 90.2 | - | Unverified
7 | SwinV2-G | Top 1 Accuracy | 90.17 | - | Unverified
8 | MAWS (ViT-6.5B) | Top 1 Accuracy | 90.1 | - | Unverified
9 | Florence-CoSwin-H | Top 1 Accuracy | 90.05 | - | Unverified
10 | RevCol-H | Top 1 Accuracy | 90 | - | Unverified
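
The Top 1 Accuracy metric in the table can be sketched as follows, assuming the Claimed values are percentages of test images whose highest-scoring predicted label matches the ground truth. The function name `top1_accuracy` and the toy labels are hypothetical and are not taken from any of the listed papers.

```python
def top1_accuracy(predicted, actual):
    """Percentage of images whose top predicted label equals the true label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one example is required")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)

# Toy data (assumption): four images, one of them misclassified.
predicted = ["cat", "dog", "car", "dog"]
actual = ["cat", "dog", "dog", "dog"]
score = top1_accuracy(predicted, actual)
print(score)  # 75.0
```

Under this reading, a claimed score of 91 would mean the model's top prediction is correct on 91 of every 100 evaluation images.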