SOTAVerified

Object Recognition

Object recognition is a computer vision technique for detecting and classifying objects in images or videos. Because it combines object detection with image classification, the state-of-the-art tables are recorded separately under each component task (object detection and image classification).

(Image credit: TensorFlow Object Detection API)
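As a rough illustration of the "detection plus classification" framing above, the sketch below composes a detector and a classifier into one recognition step. It is a minimal sketch under stated assumptions, not how any listed model works: `recognize`, `toy_detect`, `toy_classify`, and the `(x_min, y_min, x_max, y_max)` box convention are all hypothetical, and many real systems predict boxes and labels jointly rather than in two stages.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical conventions for this sketch only.
Box = Tuple[int, int, int, int]   # (x_min, y_min, x_max, y_max), max edges exclusive
Image = List[List[int]]           # grayscale image as rows of pixel values


@dataclass
class Recognition:
    box: Box
    label: str


def crop(image: Image, box: Box) -> Image:
    """Cut the region described by `box` out of `image`."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


def recognize(
    image: Image,
    detect: Callable[[Image], List[Box]],
    classify: Callable[[Image], str],
) -> List[Recognition]:
    """Detect candidate boxes, then classify the crop inside each box."""
    return [Recognition(box, classify(crop(image, box))) for box in detect(image)]


def toy_detect(image: Image) -> List[Box]:
    """Stand-in detector: one box around all nonzero pixels, or no boxes."""
    coords = [(x, y) for y, row in enumerate(image) for x, v in enumerate(row) if v]
    if not coords:
        return []
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return [(min(xs), min(ys), max(xs) + 1, max(ys) + 1)]


def toy_classify(patch: Image) -> str:
    """Stand-in classifier: label a crop by its mean brightness."""
    mean = sum(map(sum, patch)) / (len(patch) * len(patch[0]))
    return "bright" if mean > 127 else "dark"


image = [
    [0, 0, 0, 0],
    [0, 200, 200, 0],
    [0, 200, 200, 0],
    [0, 0, 0, 0],
]
results = recognize(image, toy_detect, toy_classify)
```

In practice the two callables would be replaced by trained models; the point of the sketch is only that each output pairs a location with a class label.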

Papers

Showing 251-300 of 2042 papers

Title | Status | Hype
LM-MCVT: A Lightweight Multi-modal Multi-view Convolutional-Vision Transformer Approach for 3D Object Recognition | - | 0
Disaggregated Deep Learning via In-Physics Computing at Radio Frequency | - | 0
V^2R-Bench: Holistically Evaluating LVLM Robustness to Fundamental Visual Variations | - | 0
Naturally Computed Scale Invariance in the Residual Stream of ResNet18 | Code | 0
Quantum Doubly Stochastic Transformers | - | 0
DVLTA-VQA: Decoupled Vision-Language Modeling with Text-Guided Adaptation for Blind Video Quality Assessment | - | 0
Visual Language Models show widespread visual deficits on neuropsychological tests | - | 0
MASSeg : 2nd Technical Report for 4th PVUW MOSE Track | Code | 0
Hardware, Algorithms, and Applications of the Neuromorphic Vision Sensor: a Review | - | 0
D-Feat Occlusions: Diffusion Features for Robustness to Partial Visual Occlusions in Object Recognition | - | 0
Advancing Egocentric Video Question Answering with Multimodal Large Language Models | - | 0
Evaluating Multimodal Language Models as Visual Assistants for Visually Impaired Users | - | 0
ForcePose: A Deep Learning Approach for Force Calculation Based on Action Recognition Using MediaPipe Pose Estimation Combined with Object Detection | - | 0
Foveated Instance Segmentation | Code | 0
DuckSegmentation: A segmentation model based on the AnYue Hemp Duck Dataset | - | 0
Leveraging 3D Geometric Priors in 2D Rotation Symmetry Detection | - | 0
MATT-GS: Masked Attention-based 3DGS for Robot Perception and Object Detection | - | 0
Predicting the Road Ahead: A Knowledge Graph based Foundation Model for Scene Understanding in Autonomous Driving | - | 0
Beyond Semantics: Rediscovering Spatial Awareness in Vision-Language Models | - | 0
TULIP: Towards Unified Language-Image Pretraining | - | 0
Augmenting Image Annotation: A Human-LMM Collaborative Framework for Efficient Object Selection and Label Generation | - | 0
OSMa-Bench: Evaluating Open Semantic Mapping Under Varying Lighting Conditions | - | 0
Seeing What's Not There: Spurious Correlation in Multimodal LLMs | - | 0
Object-Centric World Model for Language-Guided Manipulation | - | 0
Afford-X: Generalizable and Slim Affordance Reasoning for Task-oriented Manipulation | - | 0
Identity documents recognition and detection using semantic segmentation with convolutional neural network | - | 0
Deep learning based infrared small object segmentation: Challenges and future directions | - | 0
RAPTOR: Refined Approach for Product Table Object Recognition | - | 0
"See the World, Discover Knowledge": A Chinese Factuality Evaluation for Large Vision Language Models | - | 0
Revealing Bias Formation in Deep Neural Networks Through the Geometric Mechanisms of Human Visual Decoupling | - | 0
Occlusion-aware Text-Image-Point Cloud Pretraining for Open-World 3D Object Recognition | - | 0
DCENWCNet: A Deep CNN Ensemble Network for White Blood Cell Classification with LIME-Based Explainability | - | 0
Unveiling the Potential of iMarkers: Invisible Fiducial Markers for Advanced Robotics | - | 0
Evaluating Hallucination in Large Vision-Language Models based on Context-Aware Object Similarities | - | 0
Development of an Inclusive Educational Platform Using Open Technologies and Machine Learning: A Case Study on Accessibility Enhancement | - | 0
RL-RC-DoT: A Block-level RL agent for Task-Aware Video Compression | - | 0
AI-Powered Assistive Technologies for Visual Impairment | - | 0
Towards Zero-Shot & Explainable Video Description by Reasoning over Graphs of Events in Space and Time | - | 0
Guided SAM: Label-Efficient Part Segmentation | - | 0
Hierarchical Superpixel Segmentation via Structural Information Theory | Code | 0
Perceptual Inductive Bias Is What You Need Before Contrastive Learning | - | 0
Spatial457: A Diagnostic Benchmark for 6D Spatial Reasoning of Large Mutimodal Models | - | 0
Enhanced Multimodal RAG-LLM for Accurate Visual Question Answering | - | 0
Sample Correlation for Fingerprinting Deep Face Recognition | Code | 0
AI-based Wearable Vision Assistance System for the Visually Impaired: Integrating Real-Time Object Recognition and Contextual Understanding Using Large Vision-Language Models | - | 0
The same but different: impact of animal facility sanitary status on a transgenic mouse model of Alzheimer's disease | - | 0
SilVar: Speech Driven Multimodal Model for Reasoning Visual Question Answering and Object Localization | Code | 0
Real Classification by Description: Extending CLIP's Limits of Part Attributes Recognition | Code | 0
Targeted View-Invariant Adversarial Perturbations for 3D Object Recognition | Code | 0
Efficient Oriented Object Detection with Enhanced Small Object Recognition in Aerial Images | - | 0
Page 6 of 41

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Imagen | shape bias | 98.7 | - | Unverified
2 | Stable Diffusion | shape bias | 92.7 | - | Unverified
3 | Parti | shape bias | 91.7 | - | Unverified
4 | ViT-22B-384 | shape bias | 86.4 | - | Unverified
5 | ViT-22B-560 | shape bias | 83.8 | - | Unverified
6 | CLIP (ViT-B) | shape bias | 79.9 | - | Unverified
7 | ViT-22B-224 | shape bias | 78 | - | Unverified
8 | ResNet-50 (L2 eps 5.0 adv trained) | shape bias | 69.5 | - | Unverified
9 | ResNet-50 (with strong augmentations) | shape bias | 62.2 | - | Unverified
10 | SWSL (ResNeXt-101) | shape bias | 49.8 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Spike-VGG11 | Accuracy (%) | 85.55 | - | Unverified
2 | SSNN | Accuracy (%) | 78.57 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Spike-VGG11 | Accuracy (%) | 85.62 | - | Unverified
2 | SSNN | Accuracy (%) | 79.25 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ObjectNet-Baseline | Top 5 Accuracy | 18.75 | - | Unverified
2 | yun | Top 5 Accuracy | 14.75 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ObjectNet-Baseline | Top 5 Accuracy | 52.24 | - | Unverified
2 | DY | Top 5 Accuracy | 0.08 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ObjectNet-Baseline | Top 5 Accuracy | 52.24 | - | Unverified
2 | AJ2021 | Top 5 Accuracy | 27.68 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SSNN | Accuracy (%) | 94.91 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Faster-RCNN | mAP | 30.39 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Spike-VGG11 | Accuracy (%) | 96 | - | Unverified