SOTAVerified

Object Recognition

Object recognition is a computer vision technique for detecting and classifying objects in images or videos. Because it combines object detection with image classification, the state-of-the-art tables are recorded separately for each of those component tasks.

(Image credit: TensorFlow Object Detection API)
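As a rough illustration of the "detection plus classification" framing above, the sketch below composes a detector and a classifier into one recognition step. Everything in it is hypothetical: `recognize`, `toy_detect`, and `toy_classify` are stand-in names invented for this example, and the toy image is a nested list of pixel values rather than the output of any real model or library.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]   # (x_min, y_min, x_max, y_max), max is exclusive
Image = Sequence[Sequence[int]]   # toy grayscale image: rows of pixel values


@dataclass
class Recognition:
    box: Box
    label: str


def crop(image: Image, box: Box) -> List[List[int]]:
    """Cut the region described by `box` out of the image."""
    x0, y0, x1, y1 = box
    return [list(row[x0:x1]) for row in image[y0:y1]]


def recognize(
    image: Image,
    detect: Callable[[Image], List[Box]],
    classify: Callable[[List[List[int]]], str],
) -> List[Recognition]:
    """Detection proposes boxes; classification labels each cropped box."""
    return [Recognition(box, classify(crop(image, box))) for box in detect(image)]


def toy_detect(image: Image) -> List[Box]:
    """Hypothetical detector: one box around all non-zero pixels, if any."""
    coords = [(x, y) for y, row in enumerate(image) for x, v in enumerate(row) if v]
    if not coords:
        return []
    xs, ys = zip(*coords)
    return [(min(xs), min(ys), max(xs) + 1, max(ys) + 1)]


def toy_classify(patch: List[List[int]]) -> str:
    """Hypothetical classifier: label a patch by its mean brightness."""
    mean = sum(map(sum, patch)) / sum(len(row) for row in patch)
    return "bright" if mean >= 128 else "dark"


image = [
    [0, 0, 0, 0],
    [0, 200, 220, 0],
    [0, 210, 0, 0],
    [0, 0, 0, 0],
]
results = recognize(image, toy_detect, toy_classify)
print(results)
```

In a real system the two callables would be replaced by trained models; the point of the sketch is only that the two component tasks can be evaluated separately, which is why their leaderboards are kept apart.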

Papers

Showing 1-50 of 2042 papers

Title | Status | Hype
GeoMag: A Vision-Language Model for Pixel-level Fine-Grained Remote Sensing Image Parsing |  | 0
Out-of-distribution detection in 3D applications: a review |  | 0
SASep: Saliency-Aware Structured Separation of Geometry and Feature for Open Set Learning on Point Clouds | Code | 0
Continual Hyperbolic Learning of Instances and Classes |  | 0
DCIRNet: Depth Completion with Iterative Refinement for Dexterous Grasping of Transparent and Reflective Objects |  | 0
Aligning Text, Images, and 3D Structure Token-by-Token |  | 0
STSBench: A Spatio-temporal Scenario Benchmark for Multi-modal Large Language Models in Autonomous Driving | Code | 1
Feature-Based Lie Group Transformer for Real-World Applications |  | 0
EV-Flying: an Event-based Dataset for In-The-Wild Recognition of Flying Objects |  | 0
Explicitly Modeling Subcortical Vision with a Neuro-Inspired Front-End Improves CNN Robustness |  | 0
Efficient Estimation of Regularized Tyler's M-Estimator Using Approximate LOOCV |  | 0
TrackVLA: Embodied Visual Tracking in the Wild |  | 0
SHTOcc: Effective 3D Occupancy Prediction with Sparse Head and Tail Voxels | Code | 0
ADD-SLAM: Adaptive Dynamic Dense SLAM with Gaussian Splatting |  | 0
Detailed Evaluation of Modern Machine Learning Approaches for Optic Plastics Sorting |  | 0
RAZER: Robust Accelerated Zero-Shot 3D Open-Vocabulary Panoptic Reconstruction with Spatio-Temporal Aggregation |  | 0
InstructSAM: A Training-Free Framework for Instruction-Oriented Remote Sensing Object Recognition | Code | 2
Refining Neural Activation Patterns for Layer-Level Concept Discovery in Neural Network-Based Receivers |  | 0
PLAICraft: Large-Scale Time-Aligned Vision-Speech-Action Dataset for Embodied AI |  | 0
ViEEG: Hierarchical Neural Coding with Cross-Modal Progressive Enhancement for EEG-Based Visual Decoding |  | 0
Model alignment using inter-modal bridges |  | 0
AW-GATCN: Adaptive Weighted Graph Attention Convolutional Network for Event Camera Data Joint Denoising and Object Recognition |  | 0
A Light and Smart Wearable Platform with Multimodal Foundation Model for Enhanced Spatial Reasoning in People with Blindness and Low Vision |  | 0
MIRAGE: A Multi-modal Benchmark for Spatial Perception, Reasoning, and Intelligence |  | 0
Improving Unsupervised Task-driven Models of Ventral Visual Stream via Relative Position Predictivity | Code | 0
Topology-Guided Knowledge Distillation for Efficient Point Cloud Processing | Code | 0
Visually Interpretable Subtask Reasoning for Visual Question Answering | Code | 0
ArtRAG: Retrieval-Augmented Generation with Structured Context for Visual Art Understanding |  | 0
Beyond Recognition: Evaluating Visual Perspective Taking in Vision Language Models |  | 0
Transferable Adversarial Attacks on Black-Box Vision-Language Models |  | 0
Zoomer: Adaptive Image Focus Optimization for Black-box MLLM |  | 0
LM-MCVT: A Lightweight Multi-modal Multi-view Convolutional-Vision Transformer Approach for 3D Object Recognition |  | 0
Benchmarking Multimodal Mathematical Reasoning with Explicit Visual Dependency | Code | 1
Disaggregated Deep Learning via In-Physics Computing at Radio Frequency |  | 0
V^2R-Bench: Holistically Evaluating LVLM Robustness to Fundamental Visual Variations |  | 0
Naturally Computed Scale Invariance in the Residual Stream of ResNet18 | Code | 0
Quantum Doubly Stochastic Transformers |  | 0
Taccel: Scaling Up Vision-based Tactile Robotics via High-performance GPU Simulation | Code | 2
DVLTA-VQA: Decoupled Vision-Language Modeling with Text-Guided Adaptation for Blind Video Quality Assessment |  | 0
Visual Language Models show widespread visual deficits on neuropsychological tests |  | 0
MASSeg: 2nd Technical Report for 4th PVUW MOSE Track | Code | 0
Hardware, Algorithms, and Applications of the Neuromorphic Vision Sensor: a Review |  | 0
P2Object: Single Point Supervised Object Detection and Instance Segmentation | Code | 2
D-Feat Occlusions: Diffusion Features for Robustness to Partial Visual Occlusions in Object Recognition |  | 0
Advancing Egocentric Video Question Answering with Multimodal Large Language Models |  | 0
ForcePose: A Deep Learning Approach for Force Calculation Based on Action Recognition Using MediaPipe Pose Estimation Combined with Object Detection |  | 0
Evaluating Multimodal Language Models as Visual Assistants for Visually Impaired Users |  | 0
Foveated Instance Segmentation | Code | 0
DuckSegmentation: A segmentation model based on the AnYue Hemp Duck Dataset |  | 0
Leveraging 3D Geometric Priors in 2D Rotation Symmetry Detection |  | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Imagen | shape bias | 98.7 |  | Unverified
2 | Stable Diffusion | shape bias | 92.7 |  | Unverified
3 | Parti | shape bias | 91.7 |  | Unverified
4 | ViT-22B-384 | shape bias | 86.4 |  | Unverified
5 | ViT-22B-560 | shape bias | 83.8 |  | Unverified
6 | CLIP (ViT-B) | shape bias | 79.9 |  | Unverified
7 | ViT-22B-224 | shape bias | 78 |  | Unverified
8 | ResNet-50 (L2 eps 5.0 adv trained) | shape bias | 69.5 |  | Unverified
9 | ResNet-50 (with strong augmentations) | shape bias | 62.2 |  | Unverified
10 | SWSL (ResNeXt-101) | shape bias | 49.8 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Spike-VGG11 | Accuracy (%) | 85.55 |  | Unverified
2 | SSNN | Accuracy (%) | 78.57 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Spike-VGG11 | Accuracy (%) | 85.62 |  | Unverified
2 | SSNN | Accuracy (%) | 79.25 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ObjectNet-Baseline | Top 5 Accuracy | 18.75 |  | Unverified
2 | yun | Top 5 Accuracy | 14.75 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ObjectNet-Baseline | Top 5 Accuracy | 52.24 |  | Unverified
2 | DY | Top 5 Accuracy | 0.08 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ObjectNet-Baseline | Top 5 Accuracy | 52.24 |  | Unverified
2 | AJ2021 | Top 5 Accuracy | 27.68 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SSNN | Accuracy (%) | 94.91 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Faster-RCNN | mAP | 30.39 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Spike-VGG11 | Accuracy (%) | 96 |  | Unverified