SOTAVerified

Object Recognition

Object recognition is a computer vision technique for detecting and classifying objects in images or videos. Because it combines object detection with image classification, state-of-the-art results are tracked separately under each of those two component tasks.
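The detection-plus-classification decomposition can be sketched as a two-stage pipeline: a detector proposes bounding boxes, then a classifier labels the crop inside each box. This is a minimal sketch under stated assumptions. The box convention, the `recognize` function, and the toy detector and classifier are all hypothetical stand-ins for trained models. Many modern detectors predict boxes and classes jointly in one network, so the two stages are separated here only to make the decomposition explicit.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical conventions: a box is (row_min, col_min, row_max, col_max), end-exclusive,
# and an image is a toy grayscale grid of pixel intensities.
Box = Tuple[int, int, int, int]
Image = List[List[int]]


@dataclass
class Recognition:
    box: Box
    label: str
    score: float


def crop(image: Image, box: Box) -> Image:
    """Return the sub-image inside `box`."""
    r0, c0, r1, c1 = box
    return [row[c0:c1] for row in image[r0:r1]]


def recognize(
    image: Image,
    detect: Callable[[Image], List[Box]],
    classify: Callable[[Image], Tuple[str, float]],
) -> List[Recognition]:
    """Stage 1 localizes candidate objects; stage 2 labels each cropped region."""
    results = []
    for box in detect(image):
        label, score = classify(crop(image, box))
        results.append(Recognition(box, label, score))
    return results


# Hypothetical stand-in for a trained detector: always proposes two fixed boxes.
def toy_detector(image: Image) -> List[Box]:
    return [(0, 0, 2, 2), (2, 2, 4, 4)]


# Hypothetical stand-in for a trained classifier: labels a patch by mean brightness.
def toy_classifier(patch: Image) -> Tuple[str, float]:
    pixels = [p for row in patch for p in row]
    mean = sum(pixels) / len(pixels)
    return ("bright", mean / 255) if mean > 127 else ("dark", 1 - mean / 255)


image = [
    [255, 255, 0, 0],
    [255, 255, 0, 0],
    [0, 0, 10, 10],
    [0, 0, 10, 10],
]
for r in recognize(image, toy_detector, toy_classifier):
    print(r.box, r.label, round(r.score, 2))
```

Swapping in a real detector or classifier only requires matching the two callable signatures.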

( Image credit: Tensorflow Object Detection API )

Papers

Showing 151–200 of 2042 papers

Title | Status | Hype
Egoshots, an ego-vision life-logging dataset and semantic fidelity metric to evaluate diversity in image captioning models | Code | 1
Empirical Upper Bound, Error Diagnosis and Invariance Analysis of Modern Object Detectors | Code | 1
Equalization Loss for Long-Tailed Object Recognition | Code | 1
EvDistill: Asynchronous Events to End-task Learning via Bidirectional Reconstruction-guided Cross-modal Knowledge Distillation | Code | 1
EventRPG: Event Data Augmentation with Relevance Propagation Guidance | Code | 1
Evolving Deep Neural Networks | Code | 1
Expanding Event Modality Applications through a Robust CLIP-Based Encoder | Code | 1
Explainability-Aware One Point Attack for Point Cloud Neural Networks | Code | 1
Compact Generalized Non-local Network | Code | 1
Forest R-CNN: Large-Vocabulary Long-Tailed Object Detection and Instance Segmentation | Code | 1
FSD: Fast Self-Supervised Single RGB-D to Categorical 3D Objects | Code | 1
F-SIOL-310: A Robotic Dataset and Benchmark for Few-Shot Incremental Object Learning | Code | 1
Generalizable Data-free Objective for Crafting Universal Adversarial Perturbations | Code | 1
Learning what and where to attend | Code | 1
Comics Datasets Framework: Mix of Comics datasets for detection benchmarking | Code | 1
Comprehensive Multi-Modal Prototypes are Simple and Effective Classifiers for Vast-Vocabulary Object Detection | Code | 1
CLIP-guided Federated Learning on Heterogeneous and Long-Tailed Data | Code | 1
Causal Transportability for Visual Recognition | Code | 1
Implicit Feature Refinement for Instance Segmentation | Code | 1
Comparison of semi-supervised deep learning algorithms for audio classification | Code | 1
CLoVe: Encoding Compositional Language in Contrastive Vision-Language Models | Code | 1
Computing the Testing Error without a Testing Set | Code | 1
Brain-Score: Which Artificial Neural Network for Object Recognition is most Brain-Like? | Code | 1
Joint Distribution Matters: Deep Brownian Distance Covariance for Few-Shot Classification | Code | 1
Learning Counterfactually Invariant Predictors | Code | 1
Learning Dense Object Descriptors from Multiple Views for Low-shot Category Generalization | Code | 1
Billion-scale semi-supervised learning for image classification | Code | 1
Learning Semi-supervised Gaussian Mixture Models for Generalized Category Discovery | Code | 1
BURST: A Benchmark for Unifying Object Recognition, Segmentation and Tracking in Video | Code | 1
Attribution in Scale and Space | Code | 1
A Study of Face Obfuscation in ImageNet | Code | 1
LMC: Large Model Collaboration with Cross-assessment for Training-Free Open-Set Object Recognition | Code | 1
Bilateral Event Mining and Complementary for Event Stream Super-Resolution | Code | 1
Category-Prompt Refined Feature Learning for Long-Tailed Multi-Label Image Classification | Code | 1
Microsoft COCO: Common Objects in Context | Code | 1
MiKASA: Multi-Key-Anchor & Scene-Aware Transformer for 3D Visual Grounding | Code | 1
Are Convolutional Neural Networks or Transformers more like human vision? | Code | 1
Hebbian learning with gradients: Hebbian convolutional neural networks with modern deep learning frameworks | Code | 1
Natural Language Rationales with Full-Stack Visual Reasoning: From Pixels to Semantic Frames to Commonsense Graphs | Code | 1
Neural Regression, Representational Similarity, Model Zoology & Neural Taskonomy at Scale in Rodent Visual Cortex | Code | 1
Computing the Testing Error Without a Testing Set | Code | 1
Offline Meta-Reinforcement Learning with Advantage Weighting | Code | 1
When and how CNNs generalize to out-of-distribution category-viewpoint combinations | Code | 1
On the Challenges of Open World Recognition under Shifting Visual Domains | Code | 1
ORBIT: A Real-World Few-Shot Dataset for Teachable Object Recognition | Code | 1
Overcoming Classifier Imbalance for Long-tail Object Detection with Balanced Group Softmax | Code | 1
Deep Subdomain Adaptation Network for Image Classification | Code | 1
Part-guided Relational Transformers for Fine-grained Visual Recognition | Code | 1
3D ShapeNets: A Deep Representation for Volumetric Shapes | Code | 1
FAIR1M: A Benchmark Dataset for Fine-grained Object Recognition in High-Resolution Remote Sensing Imagery | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Imagen | shape bias | 98.7 |  | Unverified
2 | Stable Diffusion | shape bias | 92.7 |  | Unverified
3 | Parti | shape bias | 91.7 |  | Unverified
4 | ViT-22B-384 | shape bias | 86.4 |  | Unverified
5 | ViT-22B-560 | shape bias | 83.8 |  | Unverified
6 | CLIP (ViT-B) | shape bias | 79.9 |  | Unverified
7 | ViT-22B-224 | shape bias | 78 |  | Unverified
8 | ResNet-50 (L2 eps 5.0 adv trained) | shape bias | 69.5 |  | Unverified
9 | ResNet-50 (with strong augmentations) | shape bias | 62.2 |  | Unverified
10 | SWSL (ResNeXt-101) | shape bias | 49.8 |  | Unverified
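The shape bias scores above are usually computed on cue-conflict images, whose shape comes from one class and whose texture comes from another. Assuming that common definition (the table does not state the exact protocol), shape bias is the share of shape decisions among the trials where the model's prediction matches either the shape class or the texture class. The sketch below implements that ratio as a percentage; the `shape_bias` function, the trial tuple layout, and the choice to skip images whose shape and texture classes coincide are assumptions for illustration.

```python
from typing import Iterable, Tuple

# Hypothetical trial layout: (predicted_label, shape_label, texture_label) for one image.
Trial = Tuple[str, str, str]


def shape_bias(trials: Iterable[Trial]) -> float:
    """Percentage of shape decisions among trials decided by either shape or texture."""
    shape_hits = 0
    texture_hits = 0
    for predicted, shape, texture in trials:
        if shape == texture:
            # No conflict between the two cues; skipped here (assumed protocol).
            continue
        if predicted == shape:
            shape_hits += 1
        elif predicted == texture:
            texture_hits += 1
        # Predictions matching neither cue do not enter the ratio.
    decided = shape_hits + texture_hits
    if decided == 0:
        raise ValueError("no trial was decided by shape or texture")
    return 100.0 * shape_hits / decided


trials = [
    ("cat", "cat", "elephant"),       # shape decision
    ("elephant", "cat", "elephant"),  # texture decision
    ("clock", "clock", "bottle"),     # shape decision
    ("car", "bottle", "clock"),       # matches neither cue: ignored
]
print(round(shape_bias(trials), 1))  # 66.7
```

Under this definition, 50 means no preference and higher values mean the model relies more on shape than on texture.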

# | Model | Metric | Claimed | Verified | Status
1 | Spike-VGG11 | Accuracy (%) | 85.55 |  | Unverified
2 | SSNN | Accuracy (%) | 78.57 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Spike-VGG11 | Accuracy (%) | 85.62 |  | Unverified
2 | SSNN | Accuracy (%) | 79.25 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ObjectNet-Baseline | Top-5 Accuracy | 18.75 |  | Unverified
2 | yun | Top-5 Accuracy | 14.75 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ObjectNet-Baseline | Top-5 Accuracy | 52.24 |  | Unverified
2 | DY | Top-5 Accuracy | 0.08 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ObjectNet-Baseline | Top-5 Accuracy | 52.24 |  | Unverified
2 | AJ2021 | Top-5 Accuracy | 27.68 |  | Unverified
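Top-5 accuracy counts a prediction as correct when the true class appears among the five highest-scoring classes. The sketch below shows that generic definition, reported as a percentage. The `top_k_accuracy` name and its inputs are hypothetical. Any benchmark-specific details behind the entries above, such as how a dataset's classes are mapped onto the model's label set, are not stated here and are not modeled. Ties in score are broken in favor of the lower class index, which is one arbitrary but deterministic choice.

```python
from typing import List, Sequence


def top_k_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int], k: int = 5) -> float:
    """Percentage of samples whose true label is among the k highest-scoring classes."""
    if len(scores) != len(labels) or not labels:
        raise ValueError("need one score row per label and at least one sample")
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores; sorted() is stable, so ties keep the lower index first.
        top_k: List[int] = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if label in top_k:
            hits += 1
    return 100.0 * hits / len(labels)


scores = [
    [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],  # true class 0 ranks last of six: miss at k=5
    [0.6, 0.5, 0.4, 0.3, 0.2, 0.1],  # true class 0 ranks first: hit
]
labels = [0, 0]
print(top_k_accuracy(scores, labels, k=5))  # 50.0
print(top_k_accuracy(scores, labels, k=6))  # 100.0
```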

# | Model | Metric | Claimed | Verified | Status
1 | SSNN | Accuracy (%) | 94.91 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Faster-RCNN | mAP | 30.39 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Spike-VGG11 | Accuracy (%) | 96 |  | Unverified