SOTAVerified

Object Recognition

Object recognition is a computer vision technique for detecting and classifying objects in images or videos. Because the task combines object detection and image classification, the state-of-the-art tables are recorded separately for each component task.

(Image credit: TensorFlow Object Detection API)

Papers

Showing 201-250 of 2042 papers

Title | Status | Hype
Dual-Hybrid Attention Network for Specular Highlight Removal | Code | 1
Doubly Right Object Recognition: A Why Prompt for Visual Rationales | Code | 1
Bilateral Event Mining and Complementary for Event Stream Super-Resolution | Code | 1
Egoshots, an ego-vision life-logging dataset and semantic fidelity metric to evaluate diversity in image captioning models | Code | 1
E2PNet: Event to Point Cloud Registration with Spatio-Temporal Representation Learning | Code | 1
TactileSGNet: A Spiking Graph Neural Network for Event-based Tactile Object Recognition | Code | 1
ImageNet Large Scale Visual Recognition Challenge | Code | 1
Equalization Loss for Long-Tailed Object Recognition | Code | 1
Enriching ImageNet with Human Similarity Judgments and Psychological Embeddings | Code | 1
Improving neural networks by preventing co-adaptation of feature detectors | Code | 1
Leveraging MLLM Embeddings and Attribute Smoothing for Compositional Zero-Shot Learning | Code | 1
EvDistill: Asynchronous Events to End-task Learning via Bidirectional Reconstruction-guided Cross-modal Knowledge Distillation | Code | 1
Event-based Asynchronous Sparse Convolutional Networks | Code | 1
EventCLIP: Adapting CLIP for Event-based Object Recognition | Code | 1
Adapting Self-Supervised Vision Transformers by Probing Attention-Conditioned Masking Consistency | Code | 1
Expanding Event Modality Applications through a Robust CLIP-Based Encoder | Code | 1
Explainability-Aware One Point Attack for Point Cloud Neural Networks | Code | 1
The Norm Must Go On: Dynamic Unsupervised Domain Adaptation by Normalization | Code | 1
Noise or Signal: The Role of Image Backgrounds in Object Recognition | Code | 1
A Study of Face Obfuscation in ImageNet | Code | 1
Exploring the Transferability of Visual Prompting for Multimodal Large Language Models | Code | 1
Human Eyes Inspired Recurrent Neural Networks are More Robust Against Adversarial Noises | Code | 0
A Dataset for Crucial Object Recognition in Blind and Low-Vision Individuals' Navigation | Code | 0
Human-like Clustering with Deep Convolutional Neural Networks | Code | 0
A Multi-viewpoint Outdoor Dataset for Human Action Recognition | Code | 0
Hierarchical Superpixel Segmentation via Structural Information Theory | Code | 0
How much human-like visual experience do current self-supervised learning algorithms need in order to achieve human-level object recognition? | Code | 0
Human Pose Estimation for Real-World Crowded Scenarios | Code | 0
Handwritten Bangla Character Recognition Using The State-of-Art Deep Convolutional Neural Networks | Code | 0
Grid-augmented vision: A simple yet effective approach for enhanced spatial understanding in multi-modal agents | Code | 0
Grid Cell Path Integration For Movement-Based Visual Object Recognition | Code | 0
Global Second-order Pooling Convolutional Networks | Code | 0
Geometry-Based Region Proposals for Real-Time Robot Detection of Tabletop Objects | Code | 0
Grasp Pre-shape Selection by Synthetic Training: Eye-in-hand Shared Control on the Hannes Prosthesis | Code | 0
HD-CNN: Hierarchical Deep Convolutional Neural Network for Large Scale Visual Recognition | Code | 0
A comparison between humans and AI at recognizing objects in unusual poses | Code | 0
Generalisation in humans and deep neural networks | Code | 0
Ambient Sound Provides Supervision for Visual Learning | Code | 0
Optimizing Spatio-Temporal Information Processing in Spiking Neural Networks via Unconstrained Leaky Integrate-and-Fire Neurons and Hybrid Coding | Code | 0
Geometric and Textural Augmentation for Domain Gap Reduction | Code | 0
GAANet: Ghost Auto Anchor Network for Detecting Varying Size Drones in Dark | Code | 0
Generalized Relevance Learning Grassmann Quantization | Code | 0
Foveation in the Era of Deep Learning | Code | 0
Foveated Instance Segmentation | Code | 0
Grounded Human-Object Interaction Hotspots from Video | Code | 0
FPNN: Field Probing Neural Networks for 3D Data | Code | 0
Generate To Adapt: Aligning Domains using Generative Adversarial Networks | Code | 0
Hidden in Plain Sight: Evaluating Abstract Shape Recognition in Vision-Language Models | Code | 0
Improving Out-of-Distribution Detection with Disentangled Foreground and Background Features | Code | 0
Attention Based Pruning for Shift Networks | Code | 0
Page 5 of 41

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Imagen | shape bias | 98.7 | | Unverified
2 | Stable Diffusion | shape bias | 92.7 | | Unverified
3 | Parti | shape bias | 91.7 | | Unverified
4 | ViT-22B-384 | shape bias | 86.4 | | Unverified
5 | ViT-22B-560 | shape bias | 83.8 | | Unverified
6 | CLIP (ViT-B) | shape bias | 79.9 | | Unverified
7 | ViT-22B-224 | shape bias | 78 | | Unverified
8 | ResNet-50 (L2 eps 5.0 adv trained) | shape bias | 69.5 | | Unverified
9 | ResNet-50 (with strong augmentations) | shape bias | 62.2 | | Unverified
10 | SWSL (ResNeXt-101) | shape bias | 49.8 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Spike-VGG11 | Accuracy (%) | 85.55 | | Unverified
2 | SSNN | Accuracy (%) | 78.57 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Spike-VGG11 | Accuracy (%) | 85.62 | | Unverified
2 | SSNN | Accuracy (%) | 79.25 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ObjectNet-Baseline | Top 5 Accuracy | 18.75 | | Unverified
2 | yun | Top 5 Accuracy | 14.75 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ObjectNet-Baseline | Top 5 Accuracy | 52.24 | | Unverified
2 | DY | Top 5 Accuracy | 0.08 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ObjectNet-Baseline | Top 5 Accuracy | 52.24 | | Unverified
2 | AJ2021 | Top 5 Accuracy | 27.68 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SSNN | Accuracy (%) | 94.91 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Faster-RCNN | mAP | 30.39 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Spike-VGG11 | Accuracy (%) | 96 | | Unverified