SOTAVerified

Object Recognition

Object recognition is a computer vision technique for detecting and classifying objects in images or videos. Because it combines object detection with image classification, the state-of-the-art tables are recorded separately under each of those component tasks.

(Image credit: TensorFlow Object Detection API)
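
The two-stage framing in the description above (detect, then classify) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the names `recognize`, `toy_detect`, and `toy_classify`, the box convention, and the toy image are hypothetical stand-ins, not the API of any listed paper or of the TensorFlow Object Detection API.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical conventions for this sketch only.
Box = Tuple[int, int, int, int]   # (x_min, y_min, x_max, y_max), max exclusive
Image = Sequence[Sequence[int]]   # toy grayscale image: rows of pixel values


@dataclass
class Recognition:
    box: Box
    label: str


def crop(image: Image, box: Box) -> List[List[int]]:
    """Return the pixels that fall inside the box."""
    x0, y0, x1, y1 = box
    return [list(row[x0:x1]) for row in image[y0:y1]]


def recognize(
    image: Image,
    detect: Callable[[Image], List[Box]],
    classify: Callable[[List[List[int]]], str],
) -> List[Recognition]:
    """Stage 1 proposes boxes; stage 2 labels each cropped region."""
    return [Recognition(box, classify(crop(image, box))) for box in detect(image)]


def toy_detect(image: Image) -> List[Box]:
    """Stand-in detector: one box around all non-zero pixels, if any."""
    coords = [(x, y) for y, row in enumerate(image) for x, v in enumerate(row) if v]
    if not coords:
        return []
    xs, ys = zip(*coords)
    return [(min(xs), min(ys), max(xs) + 1, max(ys) + 1)]


def toy_classify(patch: List[List[int]]) -> str:
    """Stand-in classifier: label a patch by its mean brightness."""
    pixels = [v for row in patch for v in row]
    return "bright" if sum(pixels) / len(pixels) > 127 else "dark"


image = [
    [0, 0, 0, 0],
    [0, 200, 220, 0],
    [0, 210, 230, 0],
    [0, 0, 0, 0],
]
results = recognize(image, toy_detect, toy_classify)
print(results)
```

In a real system the two callables would be replaced by trained models; the sketch only shows how the detection output feeds the classification step.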

Papers

Showing 401–450 of 2042 papers

| Title | Status | Hype |
| --- | --- | --- |
| Enhancing Pollinator Conservation towards Agriculture 4.0: Monitoring of Bees through Object Recognition | Code | 0 |
| Transformer in Touch: A Survey | | 0 |
| BiomedParse: a biomedical foundation model for image parsing of everything everywhere all at once | | 0 |
| Zero-shot counting with a dual-stream neural network model | | 0 |
| AIris: An AI-powered Wearable Assistive Device for the Visually Impaired | | 0 |
| ADLDA: A Method to Reduce the Harm of Data Distribution Shift in Data Augmentation | | 0 |
| UnSegGNet: Unsupervised Image Segmentation using Graph Neural Networks | Code | 0 |
| Probing Human Visual Robustness with Neurally-Guided Deep Neural Networks | Code | 0 |
| Imagine2touch: Predictive Tactile Sensing for Robotic Manipulation using Efficient Low-Dimensional Signals | Code | 0 |
| SOAR: Advancements in Small Body Object Detection for Aerial Imagery Using State Space Models and Programmable Gradients | | 0 |
| Open-Set 3D Semantic Instance Maps for Vision Language Navigation -- O3D-SIM | Code | 0 |
| Deep Models for Multi-View 3D Object Recognition: A Review | | 0 |
| CloudFort: Enhancing Robustness of 3D Point Cloud Classification Against Backdoor Attacks via Spatial Partitioning and Ensemble Prediction | | 0 |
| ECOR: Explainable CLIP for Object Recognition | | 0 |
| On-board classification of underwater images using hybrid classical-quantum CNN based method | | 0 |
| How to deal with glare for improved perception of Autonomous Vehicles | | 0 |
| Achieving Rotation Invariance in Convolution Operations: Shifting from Data-Driven to Mechanism-Assured | | 0 |
| A Diffusion-based Data Generator for Training Object Recognition Models in Ultra-Range Distance | | 0 |
| A Dataset and Framework for Learning State-invariant Object Representations | Code | 0 |
| MindSet: Vision. A toolbox for testing DNNs on key psychological experiments | Code | 0 |
| GLCM-Based Feature Combination for Extraction Model Optimization in Object Detection Using Machine Learning | | 0 |
| One Noise to Rule Them All: Multi-View Adversarial Attacks with Universal Perturbation | Code | 0 |
| SUGAR: Pre-training 3D Visual Representations for Robotics | | 0 |
| Object-conditioned Bag of Instances for Few-Shot Personalized Instance Recognition | | 0 |
| Constructing Multilingual Visual-Text Datasets Revealing Visual Multilingual Ability of Vision Language Models | | 0 |
| Efficient Multi-Band Temporal Video Filter for Reducing Human-Robot Interaction | | 0 |
| PseudoTouch: Efficiently Imaging the Surface Feel of Objects for Robotic Manipulation | | 0 |
| ParFormer: A Vision Transformer with Parallel Mixer and Sparse Channel Attention Patch Embedding | | 0 |
| EventDance: Unsupervised Source-free Cross-modal Adaptation for Event-based Object Recognition | | 0 |
| Improving Robustness to Model Inversion Attacks via Sparse Coding Architectures | Code | 0 |
| Towards Real-Time Fast Unmanned Aerial Vehicle Detection Using Dynamic Vision Sensors | | 0 |
| Latent Object Characteristics Recognition with Visual to Haptic-Audio Cross-modal Transfer Learning | | 0 |
| ViTCN: Vision Transformer Contrastive Network For Reasoning | | 0 |
| MARVIS: Motion & Geometry Aware Real and Virtual Image Segmentation | Code | 0 |
| Don't Judge by the Look: Towards Motion Coherent Video Representation | Code | 0 |
| Generalized Relevance Learning Grassmann Quantization | Code | 0 |
| Learn and Search: An Elegant Technique for Object Lookup using Contrastive Learning | | 0 |
| Mapping High-level Semantic Regions in Indoor Environments without Object Recognition | | 0 |
| Textureless Object Recognition: An Edge-based Approach | | 0 |
| A spatiotemporal style transfer algorithm for dynamic visual stimulus generation | | 0 |
| LoDisc: Learning Global-Local Discriminative Features for Self-Supervised Fine-Grained Visual Recognition | | 0 |
| Dual Pose-invariant Embeddings: Learning Category and Object-specific Discriminative Representations for Recognition and Retrieval | | 0 |
| Unveiling Typographic Deceptions: Insights of the Typographic Vulnerability in Large Vision-Language Model | | 0 |
| DOZE: A Dataset for Open-Vocabulary Zero-Shot Object Navigation in Dynamic Environments | | 0 |
| Probing Multimodal Large Language Models for Global and Local Semantic Representations | Code | 0 |
| ISCUTE: Instance Segmentation of Cables Using Text Embedding | | 0 |
| SpikeNAS: A Fast Memory-Aware Neural Architecture Search Framework for Spiking Neural Network-based Autonomous Agents | | 0 |
| Leveraging Self-Supervised Instance Contrastive Learning for Radar Object Detection | | 0 |
| A Benchmark Grocery Dataset of Realworld Point Clouds From Single View | | 0 |
| Optimizing Sparse Convolution on GPUs with CUDA for 3D Point Cloud Processing in Embedded Systems | | 0 |

Benchmark Results

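| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Imagen | shape bias | 98.7 | | Unverified |
| 2 | Stable Diffusion | shape bias | 92.7 | | Unverified |
| 3 | Parti | shape bias | 91.7 | | Unverified |
| 4 | ViT-22B-384 | shape bias | 86.4 | | Unverified |
| 5 | ViT-22B-560 | shape bias | 83.8 | | Unverified |
| 6 | CLIP (ViT-B) | shape bias | 79.9 | | Unverified |
| 7 | ViT-22B-224 | shape bias | 78 | | Unverified |
| 8 | ResNet-50 (L2 eps 5.0 adv trained) | shape bias | 69.5 | | Unverified |
| 9 | ResNet-50 (with strong augmentations) | shape bias | 62.2 | | Unverified |
| 10 | SWSL (ResNeXt-101) | shape bias | 49.8 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Spike-VGG11 | Accuracy (%) | 85.55 | | Unverified |
| 2 | SSNN | Accuracy (%) | 78.57 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Spike-VGG11 | Accuracy (%) | 85.62 | | Unverified |
| 2 | SSNN | Accuracy (%) | 79.25 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | ObjectNet-Baseline | Top 5 Accuracy | 18.75 | | Unverified |
| 2 | yun | Top 5 Accuracy | 14.75 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | ObjectNet-Baseline | Top 5 Accuracy | 52.24 | | Unverified |
| 2 | DY | Top 5 Accuracy | 0.08 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | ObjectNet-Baseline | Top 5 Accuracy | 52.24 | | Unverified |
| 2 | AJ2021 | Top 5 Accuracy | 27.68 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | SSNN | Accuracy (%) | 94.91 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Faster-RCNN | mAP | 30.39 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Spike-VGG11 | Accuracy (%) | 96 | | Unverified |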