SOTAVerified

Object Recognition

Object recognition is a computer vision technique for detecting and classifying objects in images or videos. Because it combines object detection with image classification, the state-of-the-art tables are recorded separately under each of those component tasks.

(Image credit: TensorFlow Object Detection API)
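The description above frames object recognition as detection followed by classification. A minimal sketch of that composition is shown here, assuming a detector that returns bounding boxes and a classifier that labels each cropped region. The names `recognize`, `toy_detect`, and `toy_classify` are hypothetical stand-ins for illustration, not part of any library or of the papers listed on this page; a real system would plug in trained models.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), upper bounds exclusive
Image = List[List[int]]          # toy grayscale image as nested lists


@dataclass
class Recognition:
    box: Box
    label: str


def crop(image: Image, box: Box) -> Image:
    """Return the sub-image covered by box."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


def recognize(
    image: Image,
    detect: Callable[[Image], List[Box]],
    classify: Callable[[Image], str],
) -> List[Recognition]:
    """Detect candidate boxes, then classify the crop inside each box."""
    return [Recognition(box, classify(crop(image, box))) for box in detect(image)]


def toy_detect(image: Image) -> List[Box]:
    """Hypothetical detector: one box around all non-zero pixels."""
    coords = [
        (x, y)
        for y, row in enumerate(image)
        for x, value in enumerate(row)
        if value > 0
    ]
    if not coords:
        return []
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return [(min(xs), min(ys), max(xs) + 1, max(ys) + 1)]


def toy_classify(patch: Image) -> str:
    """Hypothetical classifier: thresholds the mean intensity of the patch."""
    pixels = [value for row in patch for value in row]
    return "bright" if sum(pixels) / len(pixels) > 128 else "dark"


demo_image = [
    [0, 0, 0, 0],
    [0, 200, 220, 0],
    [0, 210, 230, 0],
    [0, 0, 0, 0],
]
demo_result = recognize(demo_image, toy_detect, toy_classify)
```

Keeping the detector and classifier as separate callables mirrors the way the page tracks the two component tasks separately: either stage can be swapped without touching the other.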

Papers

Showing 1-50 of 2042 papers

Title | Status | Hype
Lightweight Pixel Difference Networks for Efficient Visual Representation Learning | Code | 4
RTMDet: An Empirical Study of Designing Real-Time Object Detectors | Code | 4
Detectron2 Object Detection & Manipulating Images using Cartoonization | Code | 4
Interactive Medical Image Segmentation: A Benchmark Dataset and Baseline | Code | 3
UniBench: Visual Reasoning Requires Rethinking Vision-Language Beyond Scaling | Code | 3
pix2gestalt: Amodal Segmentation by Synthesizing Wholes | Code | 3
DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models | Code | 3
Datasets: A Community Library for Natural Language Processing | Code | 3
InstructSAM: A Training-Free Framework for Instruction-Oriented Remote Sensing Object Recognition | Code | 2
Taccel: Scaling Up Vision-based Tactile Robotics via High-performance GPU Simulation | Code | 2
P2Object: Single Point Supervised Object Detection and Instance Segmentation | Code | 2
NUDT4MSTAR: A Large Dataset and Benchmark Towards Remote Sensing Object Recognition in the Wild | Code | 2
MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning | Code | 2
StableSemantics: A Synthetic Language-Vision Dataset of Semantic Representations in Naturalistic Images | Code | 2
Is CLIP the main roadblock for fine-grained open-world perception? | Code | 2
Lifting Multi-View Detection and Tracking to the Bird's Eye View | Code | 2
Local Feature Matching Using Deep Learning: A Survey | Code | 2
Seeing the roads through the trees: A benchmark for modeling spatial dependencies with aerial imagery | Code | 2
Towards Language Models That Can See: Computer Vision Through the LENS of Natural Language | Code | 2
Roboflow 100: A Rich, Multi-Domain Object Detection Benchmark | Code | 2
The Equalization Losses: Gradient-Driven Training for Long-tailed Object Recognition | Code | 2
Patchwork++: Fast and Robust Ground Segmentation Solving Partial Under-Segmentation Using 3D Point Cloud | Code | 2
Omni3D: A Large Benchmark and Model for 3D Object Detection in the Wild | Code | 2
HAKE: A Knowledge Engine Foundation for Human Activity Understanding | Code | 2
A Simple Episodic Linear Probe Improves Visual Recognition in the Wild | Code | 2
Learning Transferable Visual Models From Natural Language Supervision | Code | 2
Sparse R-CNN: End-to-End Object Detection with Learnable Proposals | Code | 2
A Simple Framework for Contrastive Learning of Visual Representations | Code | 2
SceneGraphNet: Neural Message Passing for 3D Indoor Scene Augmentation | Code | 2
GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond | Code | 2
Hypergraph Neural Networks | Code | 2
Some Improvements on Deep Convolutional Neural Network Based Image Classification | Code | 2
STSBench: A Spatio-temporal Scenario Benchmark for Multi-modal Large Language Models in Autonomous Driving | Code | 1
Benchmarking Multimodal Mathematical Reasoning with Explicit Visual Dependency | Code | 1
Spatial457: A Diagnostic Benchmark for 6D Spatial Reasoning of Large Multimodal Models | Code | 1
Comprehensive Multi-Modal Prototypes are Simple and Effective Classifiers for Vast-Vocabulary Object Detection | Code | 1
CREST: An Efficient Conjointly-trained Spike-driven Framework for Event-based Object Detection Exploiting Spatiotemporal Dynamics | Code | 1
WiseAD: Knowledge Augmented End-to-End Autonomous Driving with Vision-Language Model | Code | 1
Expanding Event Modality Applications through a Robust CLIP-Based Encoder | Code | 1
LRSAA: Large-scale Remote Sensing Image Target Recognition and Automatic Annotation | Code | 1
Leveraging MLLM Embeddings and Attribute Smoothing for Compositional Zero-Shot Learning | Code | 1
Large-scale Remote Sensing Image Target Recognition and Automatic Annotation | Code | 1
MomentumSMoE: Integrating Momentum into Sparse Mixture of Experts | Code | 1
DaWin: Training-free Dynamic Weight Interpolation for Robust Adaptation | Code | 1
CSIM: A Copula-based similarity index sensitive to local changes for Image quality assessment | Code | 1
Category-Prompt Refined Feature Learning for Long-Tailed Multi-Label Image Classification | Code | 1
On the Element-Wise Representation and Reasoning in Zero-Shot Image Recognition: A Systematic Survey | Code | 1
MarvelOVD: Marrying Object Recognition and Vision-Language Models for Robust Open-Vocabulary Object Detection | Code | 1
Dual-Hybrid Attention Network for Specular Highlight Removal | Code | 1
PartImageNet++ Dataset: Scaling up Part-based Models for Robust Recognition | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Imagen | shape bias | 98.7 | - | Unverified
2 | Stable Diffusion | shape bias | 92.7 | - | Unverified
3 | Parti | shape bias | 91.7 | - | Unverified
4 | ViT-22B-384 | shape bias | 86.4 | - | Unverified
5 | ViT-22B-560 | shape bias | 83.8 | - | Unverified
6 | CLIP (ViT-B) | shape bias | 79.9 | - | Unverified
7 | ViT-22B-224 | shape bias | 78 | - | Unverified
8 | ResNet-50 (L2 eps 5.0 adv trained) | shape bias | 69.5 | - | Unverified
9 | ResNet-50 (with strong augmentations) | shape bias | 62.2 | - | Unverified
10 | SWSL (ResNeXt-101) | shape bias | 49.8 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Spike-VGG11 | Accuracy (%) | 85.55 | - | Unverified
2 | SSNN | Accuracy (%) | 78.57 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Spike-VGG11 | Accuracy (%) | 85.62 | - | Unverified
2 | SSNN | Accuracy (%) | 79.25 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ObjectNet-Baseline | Top 5 Accuracy | 18.75 | - | Unverified
2 | yun | Top 5 Accuracy | 14.75 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ObjectNet-Baseline | Top 5 Accuracy | 52.24 | - | Unverified
2 | DY | Top 5 Accuracy | 0.08 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ObjectNet-Baseline | Top 5 Accuracy | 52.24 | - | Unverified
2 | AJ2021 | Top 5 Accuracy | 27.68 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SSNN | Accuracy (%) | 94.91 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Faster-RCNN | mAP | 30.39 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Spike-VGG11 | Accuracy (%) | 96 | - | Unverified