SOTAVerified

Object Recognition

Object recognition is a computer vision technique for detecting and classifying objects in images or videos. Because it combines object detection with image classification, the state-of-the-art tables are recorded separately for each of those component tasks.

(Image credit: TensorFlow Object Detection API)

Papers

Showing 201-250 of 2042 papers

| Title | Status | Hype |
| --- | --- | --- |
| ALGO: Object-Grounded Visual Commonsense Reasoning for Open-World Egocentric Action Recognition | | 0 |
| Flexible ViG: Learning the Self-Saliency for Flexible Object Recognition | | 0 |
| Haptic in-sensor computing device made of carbon nanotube-polydimethylsiloxane nanocomposites | | 0 |
| The 3D-PC: a benchmark for visual perspective taking in humans and machines | Code | 1 |
| A Review of Pulse-Coupled Neural Network Applications in Computer Vision and Image Processing | | 0 |
| Face processing emerges from object-trained convolutional neural networks | | 0 |
| MM-Mixing: Multi-Modal Mixing Alignment for 3D Understanding | | 0 |
| Enhancing Pollinator Conservation towards Agriculture 4.0: Monitoring of Bees through Object Recognition | Code | 0 |
| Transformer in Touch: A Survey | | 0 |
| BiomedParse: a biomedical foundation model for image parsing of everything everywhere all at once | | 0 |
| Zero-shot counting with a dual-stream neural network model | | 0 |
| Bilateral Event Mining and Complementary for Event Stream Super-Resolution | Code | 1 |
| AIris: An AI-powered Wearable Assistive Device for the Visually Impaired | | 0 |
| ADLDA: A Method to Reduce the Harm of Data Distribution Shift in Data Augmentation | | 0 |
| UnSegGNet: Unsupervised Image Segmentation using Graph Neural Networks | Code | 0 |
| Probing Human Visual Robustness with Neurally-Guided Deep Neural Networks | Code | 0 |
| Imagine2touch: Predictive Tactile Sensing for Robotic Manipulation using Efficient Low-Dimensional Signals | Code | 0 |
| SOAR: Advancements in Small Body Object Detection for Aerial Imagery Using State Space Models and Programmable Gradients | | 0 |
| Open-Set 3D Semantic Instance Maps for Vision Language Navigation -- O3D-SIM | Code | 0 |
| Deep Models for Multi-View 3D Object Recognition: A Review | | 0 |
| CloudFort: Enhancing Robustness of 3D Point Cloud Classification Against Backdoor Attacks via Spatial Partitioning and Ensemble Prediction | | 0 |
| On-board classification of underwater images using hybrid classical-quantum CNN based method | | 0 |
| ECOR: Explainable CLIP for Object Recognition | | 0 |
| Exploring the Transferability of Visual Prompting for Multimodal Large Language Models | Code | 1 |
| How to deal with glare for improved perception of Autonomous Vehicles | | 0 |
| Achieving Rotation Invariance in Convolution Operations: Shifting from Data-Driven to Mechanism-Assured | | 0 |
| A Diffusion-based Data Generator for Training Object Recognition Models in Ultra-Range Distance | | 0 |
| Two Effects, One Trigger: On the Modality Gap, Object Bias, and Information Imbalance in Contrastive Vision-Language Models | Code | 1 |
| A Dataset and Framework for Learning State-invariant Object Representations | Code | 0 |
| MindSet: Vision. A toolbox for testing DNNs on key psychological experiments | Code | 0 |
| GLCM-Based Feature Combination for Extraction Model Optimization in Object Detection Using Machine Learning | | 0 |
| Is CLIP the main roadblock for fine-grained open-world perception? | Code | 2 |
| One Noise to Rule Them All: Multi-View Adversarial Attacks with Universal Perturbation | Code | 0 |
| Object-conditioned Bag of Instances for Few-Shot Personalized Instance Recognition | | 0 |
| SUGAR: Pre-training 3D Visual Representations for Robotics | | 0 |
| Constructing Multilingual Visual-Text Datasets Revealing Visual Multilingual Ability of Vision Language Models | | 0 |
| Efficient Multi-Band Temporal Video Filter for Reducing Human-Robot Interaction | | 0 |
| PseudoTouch: Efficiently Imaging the Surface Feel of Objects for Robotic Manipulation | | 0 |
| ParFormer: A Vision Transformer with Parallel Mixer and Sparse Channel Attention Patch Embedding | | 0 |
| Improving Robustness to Model Inversion Attacks via Sparse Coding Architectures | Code | 0 |
| EventDance: Unsupervised Source-free Cross-modal Adaptation for Event-based Object Recognition | | 0 |
| Lifting Multi-View Detection and Tracking to the Bird's Eye View | Code | 2 |
| Towards Real-Time Fast Unmanned Aerial Vehicle Detection Using Dynamic Vision Sensors | | 0 |
| Latent Object Characteristics Recognition with Visual to Haptic-Audio Cross-modal Transfer Learning | | 0 |
| ViTCN: Vision Transformer Contrastive Network For Reasoning | | 0 |
| MARVIS: Motion & Geometry Aware Real and Virtual Image Segmentation | Code | 0 |
| Don't Judge by the Look: Towards Motion Coherent Video Representation | Code | 0 |
| Generalized Relevance Learning Grassmann Quantization | Code | 0 |
| EventRPG: Event Data Augmentation with Relevance Propagation Guidance | Code | 1 |
| Learn and Search: An Elegant Technique for Object Lookup using Contrastive Learning | | 0 |

Benchmark Results

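| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Imagen | shape bias | 98.7 | | Unverified |
| 2 | Stable Diffusion | shape bias | 92.7 | | Unverified |
| 3 | Parti | shape bias | 91.7 | | Unverified |
| 4 | ViT-22B-384 | shape bias | 86.4 | | Unverified |
| 5 | ViT-22B-560 | shape bias | 83.8 | | Unverified |
| 6 | CLIP (ViT-B) | shape bias | 79.9 | | Unverified |
| 7 | ViT-22B-224 | shape bias | 78 | | Unverified |
| 8 | ResNet-50 (L2 eps 5.0 adv trained) | shape bias | 69.5 | | Unverified |
| 9 | ResNet-50 (with strong augmentations) | shape bias | 62.2 | | Unverified |
| 10 | SWSL (ResNeXt-101) | shape bias | 49.8 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Spike-VGG11 | Accuracy (%) | 85.55 | | Unverified |
| 2 | SSNN | Accuracy (%) | 78.57 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Spike-VGG11 | Accuracy (%) | 85.62 | | Unverified |
| 2 | SSNN | Accuracy (%) | 79.25 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | ObjectNet-Baseline | Top 5 Accuracy | 18.75 | | Unverified |
| 2 | yun | Top 5 Accuracy | 14.75 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | ObjectNet-Baseline | Top 5 Accuracy | 52.24 | | Unverified |
| 2 | DY | Top 5 Accuracy | 0.08 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | ObjectNet-Baseline | Top 5 Accuracy | 52.24 | | Unverified |
| 2 | AJ2021 | Top 5 Accuracy | 27.68 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | SSNN | Accuracy (%) | 94.91 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Faster-RCNN | mAP | 30.39 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Spike-VGG11 | Accuracy (%) | 96 | | Unverified |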