SOTAVerified

Image Classification

Image Classification is a fundamental task in visual recognition that aims to understand and categorize an image as a whole under a specific label. Unlike object detection, which involves both classifying and localizing multiple objects within an image, image classification typically pertains to single-object images. When the classification becomes highly detailed or reaches instance level, it is often referred to as image retrieval, which also involves finding similar images in a large database.

Source: Metamorphic Testing for Object Detection Systems
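
The distinction drawn in the description can be pictured with a small sketch: classification returns one label for the whole image, while retrieval ranks database images by similarity to a query. This is only an illustration under stated assumptions. The function names (`classify`, `retrieve`, `cosine`), the toy scores, and the two-dimensional "embeddings" are all hypothetical and do not come from any paper or model listed on this page.

```python
import math
from typing import Dict, List, Sequence


def classify(class_scores: Dict[str, float]) -> str:
    """Whole-image classification: return the single best-scoring label."""
    return max(class_scores, key=class_scores.get)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query: Sequence[float],
             database: Dict[str, Sequence[float]],
             k: int = 2) -> List[str]:
    """Retrieval: rank database images by similarity to the query embedding."""
    ranked = sorted(database,
                    key=lambda name: cosine(query, database[name]),
                    reverse=True)
    return ranked[:k]


# Toy inputs (hypothetical): per-class scores for one image,
# and tiny 2-D embeddings standing in for image features.
label = classify({"cat": 0.10, "dog": 0.85, "bird": 0.05})
neighbors = retrieve(
    [1.0, 0.0],
    {"img_a": [0.9, 0.1], "img_b": [0.0, 1.0], "img_c": [0.5, 0.5]},
    k=2,
)
print(label)      # one label for the whole image
print(neighbors)  # the k most similar database images
```

In practice the scores and embeddings would come from a trained model; here they are hard-coded so the sketch stays self-contained.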

Papers

Showing 2401-2450 of 10419 papers

Title | Status | Hype
Boosting High Resolution Image Classification with Scaling-up Transformers | Code | 0
Applications of Sequential Learning for Medical Image Classification | - | 0
Multi-Label Feature Selection Using Adaptive and Transformed Relevance | Code | 0
ZiCo-BC: A Bias Corrected Zero-Shot NAS for Vision Tasks | - | 0
Efficient Post-training Quantization with FP8 Formats | Code | 4
Noise-Tolerant Few-Shot Unsupervised Adapter for Vision-Language Models | - | 0
CWCL: Cross-Modal Transfer with Continuously Weighted Contrastive Loss | - | 0
Masked Image Residual Learning for Scaling Deeper Vision Transformers | Code | 0
Single Image Test-Time Adaptation for Segmentation | - | 0
PARTICLE: Part Discovery and Contrastive Learning for Fine-grained Recognition | Code | 0
Convolutional autoencoder-based multimodal one-class classification | - | 0
Combining Two Adversarial Attacks Against Person Re-Identification Systems | - | 0
DFRD: Data-Free Robustness Distillation for Heterogeneous Federated Learning | - | 0
A Unified Scheme of ResNet and Softmax | - | 0
Multi-modal Domain Adaptation for REG via Relation Transfer | - | 0
ClusterFormer: Clustering As A Universal Visual Learner | Code | 1
Understanding Calibration of Deep Neural Networks for Medical Image Classification | - | 0
Bridging Sensor Gaps via Attention Gated Tuning for Hyperspectral Image Classification | Code | 0
Associative Transformer | Code | 0
DualToken-ViT: Position-aware Efficient Vision Transformer with Dual Token Fusion | - | 0
H²O: Heatmap by Hierarchical Occlusion | Code | 0
COSE: A Consistency-Sensitivity Metric for Saliency on Image Classification | Code | 0
Causality-Driven One-Shot Learning for Prostate Cancer Grading from MRI | - | 0
MUSTANG: Multi-Stain Self-Attention Graph Multiple Instance Learning Pipeline for Histopathology Whole Slide Images | Code | 1
NoisyNN: Exploring the Impact of Information Entropy Change in Learning Systems | Code | 1
Investigating the Catastrophic Forgetting in Multimodal Large Language Models | - | 0
Long-Tail Learning with Foundation Model: Heavy Fine-Tuning Hurts | Code | 1
Heterogeneous Generative Knowledge Distillation with Masked Image Modeling | - | 0
The role of causality in explainable artificial intelligence | - | 0
MVP: Meta Visual Prompt Tuning for Few-Shot Remote Sensing Image Scene Classification | - | 0
Personalized Food Image Classification: Benchmark Datasets and New Baseline | - | 0
Biased Attention: Do Vision Transformers Amplify Gender Bias More than Convolutional Neural Networks? | Code | 0
Learning by Self-Explaining | Code | 0
Continual Learning with Deep Streaming Regularized Discriminant Analysis | Code | 0
Interpretability-Aware Vision Transformer | Code | 1
Mitigating Group Bias in Federated Learning for Heterogeneous Devices | - | 0
Dynamic Spectrum Mixer for Visual Recognition | - | 0
Deep Nonparametric Convexified Filtering for Computational Photography, Image Synthesis and Adversarial Defense | - | 0
Keep It SimPool: Who Said Supervised Transformers Suffer from Attention Deficit? | Code | 1
Contrastive Deep Encoding Enables Uncertainty-aware Machine-learning-assisted Histopathology | - | 0
LCReg: Long-Tailed Image Classification with Latent Categories based Recognition | - | 0
OLID I: an open leaf image dataset for plant stress recognition | - | 0
Language Models as Black-Box Optimizers for Vision-Language Models | Code | 1
Padding-free Convolution based on Preservation of Differential Characteristics of Kernels | - | 0
Accelerating Deep Neural Networks via Semi-Structured Activation Sparsity | - | 0
Strong-Weak Integrated Semi-supervision for Unsupervised Single and Multi Target Domain Adaptation | - | 0
Computer Vision Pipeline for Automated Antarctic Krill Analysis | - | 0
GlobalDoc: A Cross-Modal Vision-Language Framework for Real-World Document Image Retrieval and Classification | - | 0
Divergences in Color Perception between Deep Neural Networks and Humans | Code | 1
SparseSwin: Swin Transformer with Sparse Transformer Block | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | CoCa (finetuned) | Top 1 Accuracy | 91 | - | Unverified
2 | Model soups (BASIC-L) | Top 1 Accuracy | 90.98 | - | Unverified
3 | Model soups (ViT-G/14) | Top 1 Accuracy | 90.94 | - | Unverified
4 | DaViT-G | Top 1 Accuracy | 90.4 | - | Unverified
5 | Meta Pseudo Labels (EfficientNet-L2) | Top 1 Accuracy | 90.2 | - | Unverified
6 | DaViT-H | Top 1 Accuracy | 90.2 | - | Unverified
7 | SwinV2-G | Top 1 Accuracy | 90.17 | - | Unverified
8 | MAWS (ViT-6.5B) | Top 1 Accuracy | 90.1 | - | Unverified
9 | Florence-CoSwin-H | Top 1 Accuracy | 90.05 | - | Unverified
10 | RevCol-H | Top 1 Accuracy | 90 | - | Unverified
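
The Top 1 Accuracy metric in the table above is conventionally the percentage of test images whose highest-scoring predicted class matches the ground-truth label. A minimal sketch of that computation follows. The function name `top1_accuracy` and the toy scores and labels are assumptions made for illustration, and they are unrelated to the models or figures listed here.

```python
from typing import Sequence


def top1_accuracy(scores: Sequence[Sequence[float]],
                  labels: Sequence[int]) -> float:
    """Percentage of samples whose highest-scoring class equals the true label."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("need at least one sample")
    correct = 0
    for row, label in zip(scores, labels):
        # Index of the largest score is the top-1 prediction.
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == label:
            correct += 1
    return 100.0 * correct / len(labels)


# Toy data (hypothetical): 4 samples, 3 classes.
scores = [
    [0.1, 0.7, 0.2],  # predicts class 1
    [0.8, 0.1, 0.1],  # predicts class 0
    [0.3, 0.3, 0.4],  # predicts class 2
    [0.2, 0.5, 0.3],  # predicts class 1
]
labels = [1, 0, 1, 1]

accuracy = top1_accuracy(scores, labels)
print(f"Top-1 accuracy: {accuracy:.2f}")  # 3 of 4 correct -> 75.00
```

Reported leaderboard numbers depend on the evaluation dataset and protocol, which this page does not state, so the sketch covers only the arithmetic of the metric.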