SOTAVerified

Object Recognition

Object recognition is a computer vision technique for detecting and classifying objects in images or videos. Because it combines object detection with image classification, the state-of-the-art tables are recorded separately under each component task (object detection and image classification).

(Image credit: TensorFlow Object Detection API)

Papers

Showing 201-250 of 2042 papers

Title | Status | Hype
Learning what and where to attend | Code | 1
Wavelet Convolutional Neural Networks | Code | 1
Dynamic Few-Shot Visual Learning without Forgetting | Code | 1
DeepScores -- A Dataset for Segmentation, Detection and Classification of Tiny Objects | Code | 1
Generalizable Data-free Objective for Crafting Universal Adversarial Perturbations | Code | 1
Relation Networks for Object Detection | Code | 1
Distributed Deep Neural Networks over the Cloud, the Edge and End Devices | Code | 1
Multiple Instance Detection Network with Online Instance Classifier Refinement | Code | 1
Evolving Deep Neural Networks | Code | 1
Densely Connected Convolutional Networks | Code | 1
Deep Predictive Coding Networks for Video Prediction and Unsupervised Learning | Code | 1
Domain Generalization for Object Recognition with Multi-task Autoencoders | Code | 1
Training Deep Neural Networks on Noisy Labels with Bootstrapping | Code | 1
Deep Gaze I: Boosting Saliency Prediction with Feature Maps Trained on ImageNet | Code | 1
Going Deeper with Convolutions | Code | 1
ImageNet Large Scale Visual Recognition Challenge | Code | 1
3D ShapeNets: A Deep Representation for Volumetric Shapes | Code | 1
Microsoft COCO: Common Objects in Context | Code | 1
OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks | Code | 1
Describing Textures in the Wild | Code | 1
Improving neural networks by preventing co-adaptation of feature detectors | Code | 1
GeoMag: A Vision-Language Model for Pixel-level Fine-Grained Remote Sensing Image Parsing | | 0
Out-of-distribution detection in 3D applications: a review | | 0
SASep: Saliency-Aware Structured Separation of Geometry and Feature for Open Set Learning on Point Clouds | Code | 0
Continual Hyperbolic Learning of Instances and Classes | | 0
DCIRNet: Depth Completion with Iterative Refinement for Dexterous Grasping of Transparent and Reflective Objects | | 0
Aligning Text, Images, and 3D Structure Token-by-Token | | 0
Feature-Based Lie Group Transformer for Real-World Applications | | 0
EV-Flying: an Event-based Dataset for In-The-Wild Recognition of Flying Objects | | 0
Explicitly Modeling Subcortical Vision with a Neuro-Inspired Front-End Improves CNN Robustness | | 0
Efficient Estimation of Regularized Tyler's M-Estimator Using Approximate LOOCV | | 0
TrackVLA: Embodied Visual Tracking in the Wild | | 0
SHTOcc: Effective 3D Occupancy Prediction with Sparse Head and Tail Voxels | Code | 0
ADD-SLAM: Adaptive Dynamic Dense SLAM with Gaussian Splatting | | 0
Detailed Evaluation of Modern Machine Learning Approaches for Optic Plastics Sorting | | 0
Refining Neural Activation Patterns for Layer-Level Concept Discovery in Neural Network-Based Receivers | | 0
RAZER: Robust Accelerated Zero-Shot 3D Open-Vocabulary Panoptic Reconstruction with Spatio-Temporal Aggregation | | 0
PLAICraft: Large-Scale Time-Aligned Vision-Speech-Action Dataset for Embodied AI | | 0
Model alignment using inter-modal bridges | | 0
ViEEG: Hierarchical Neural Coding with Cross-Modal Progressive Enhancement for EEG-Based Visual Decoding | | 0
A Light and Smart Wearable Platform with Multimodal Foundation Model for Enhanced Spatial Reasoning in People with Blindness and Low Vision | | 0
AW-GATCN: Adaptive Weighted Graph Attention Convolutional Network for Event Camera Data Joint Denoising and Object Recognition | | 0
MIRAGE: A Multi-modal Benchmark for Spatial Perception, Reasoning, and Intelligence | | 0
Improving Unsupervised Task-driven Models of Ventral Visual Stream via Relative Position Predictivity | Code | 0
Topology-Guided Knowledge Distillation for Efficient Point Cloud Processing | Code | 0
Visually Interpretable Subtask Reasoning for Visual Question Answering | Code | 0
ArtRAG: Retrieval-Augmented Generation with Structured Context for Visual Art Understanding | | 0
Beyond Recognition: Evaluating Visual Perspective Taking in Vision Language Models | | 0
Transferable Adversarial Attacks on Black-Box Vision-Language Models | | 0
Zoomer: Adaptive Image Focus Optimization for Black-box MLLM | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Imagen | shape bias | 98.7 | | Unverified
2 | Stable Diffusion | shape bias | 92.7 | | Unverified
3 | Parti | shape bias | 91.7 | | Unverified
4 | ViT-22B-384 | shape bias | 86.4 | | Unverified
5 | ViT-22B-560 | shape bias | 83.8 | | Unverified
6 | CLIP (ViT-B) | shape bias | 79.9 | | Unverified
7 | ViT-22B-224 | shape bias | 78 | | Unverified
8 | ResNet-50 (L2 eps 5.0 adv trained) | shape bias | 69.5 | | Unverified
9 | ResNet-50 (with strong augmentations) | shape bias | 62.2 | | Unverified
10 | SWSL (ResNeXt-101) | shape bias | 49.8 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Spike-VGG11 | Accuracy (%) | 85.55 | | Unverified
2 | SSNN | Accuracy (%) | 78.57 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Spike-VGG11 | Accuracy (%) | 85.62 | | Unverified
2 | SSNN | Accuracy (%) | 79.25 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ObjectNet-Baseline | Top 5 Accuracy | 18.75 | | Unverified
2 | yun | Top 5 Accuracy | 14.75 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ObjectNet-Baseline | Top 5 Accuracy | 52.24 | | Unverified
2 | DY | Top 5 Accuracy | 0.08 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ObjectNet-Baseline | Top 5 Accuracy | 52.24 | | Unverified
2 | AJ2021 | Top 5 Accuracy | 27.68 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SSNN | Accuracy (%) | 94.91 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Faster-RCNN | mAP | 30.39 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Spike-VGG11 | Accuracy (%) | 96 | | Unverified