SOTAVerified

Object Recognition

Object recognition is a computer vision technique for detecting and classifying objects in images or videos. Because it combines object detection with image classification, state-of-the-art results are tracked separately under each of those two component tasks.

(Image credit: TensorFlow Object Detection API)
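Because the task is defined as detection plus classification, a minimal sketch of that composition may help. It assumes a two-stage design in which a localizer proposes boxes and a separate classifier labels each box. `recognize`, `detect`, `classify` and `Recognition` are hypothetical names, not an API from any paper listed below, and many detectors (including single-stage ones) predict boxes and labels jointly instead of in two passes.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


@dataclass
class Recognition:
    box: Box      # where the object is (detection output)
    label: str    # what the object is (classification output)
    score: float  # classifier confidence for that label


def recognize(
    image: Any,
    detect: Callable[[Any], Sequence[Box]],
    classify: Callable[[Any, Box], Tuple[str, float]],
) -> List[Recognition]:
    """Hypothetical two-stage pipeline: localize objects, then classify each region."""
    results = []
    for box in detect(image):                # detection: where are the objects?
        label, score = classify(image, box)  # classification: what is each one?
        results.append(Recognition(box, label, score))
    return results
```

Any callables with these shapes can be plugged in, for example toy lambdas in a test, or wrappers around a real detector and classifier.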

Papers

Showing 1-50 of 2042 papers

Title | Status | Hype
Detectron2 Object Detection & Manipulating Images using Cartoonization | Code | 4
RTMDet: An Empirical Study of Designing Real-Time Object Detectors | Code | 4
Lightweight Pixel Difference Networks for Efficient Visual Representation Learning | Code | 4
pix2gestalt: Amodal Segmentation by Synthesizing Wholes | Code | 3
Interactive Medical Image Segmentation: A Benchmark Dataset and Baseline | Code | 3
UniBench: Visual Reasoning Requires Rethinking Vision-Language Beyond Scaling | Code | 3
DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models | Code | 3
Datasets: A Community Library for Natural Language Processing | Code | 3
A Simple Framework for Contrastive Learning of Visual Representations | Code | 2
Seeing the roads through the trees: A benchmark for modeling spatial dependencies with aerial imagery | Code | 2
P2Object: Single Point Supervised Object Detection and Instance Segmentation | Code | 2
A Simple Episodic Linear Probe Improves Visual Recognition in the Wild | Code | 2
Local Feature Matching Using Deep Learning: A Survey | Code | 2
Is CLIP the main roadblock for fine-grained open-world perception? | Code | 2
The Equalization Losses: Gradient-Driven Training for Long-tailed Object Recognition | Code | 2
SceneGraphNet: Neural Message Passing for 3D Indoor Scene Augmentation | Code | 2
Towards Language Models That Can See: Computer Vision Through the LENS of Natural Language | Code | 2
MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning | Code | 2
Patchwork++: Fast and Robust Ground Segmentation Solving Partial Under-Segmentation Using 3D Point Cloud | Code | 2
Roboflow 100: A Rich, Multi-Domain Object Detection Benchmark | Code | 2
Some Improvements on Deep Convolutional Neural Network Based Image Classification | Code | 2
Taccel: Scaling Up Vision-based Tactile Robotics via High-performance GPU Simulation | Code | 2
StableSemantics: A Synthetic Language-Vision Dataset of Semantic Representations in Naturalistic Images | Code | 2
Sparse R-CNN: End-to-End Object Detection with Learnable Proposals | Code | 2
Lifting Multi-View Detection and Tracking to the Bird's Eye View | Code | 2
NUDT4MSTAR: A Large Dataset and Benchmark Towards Remote Sensing Object Recognition in the Wild | Code | 2
InstructSAM: A Training-Free Framework for Instruction-Oriented Remote Sensing Object Recognition | Code | 2
HAKE: A Knowledge Engine Foundation for Human Activity Understanding | Code | 2
GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond | Code | 2
Learning Transferable Visual Models From Natural Language Supervision | Code | 2
Hypergraph Neural Networks | Code | 2
Omni3D: A Large Benchmark and Model for 3D Object Detection in the Wild | Code | 2
Discover and Cure: Concept-aware Mitigation of Spurious Correlation | Code | 1
DetMatch: Two Teachers are Better Than One for Joint 2D and 3D Semi-Supervised Object Detection | Code | 1
Distributed Deep Neural Networks over the Cloud, the Edge and End Devices | Code | 1
Describing Textures in the Wild | Code | 1
DesCo: Learning Object Recognition with Rich Language Descriptions | Code | 1
Divergences in Color Perception between Deep Neural Networks and Humans | Code | 1
DeepScores -- A Dataset for Segmentation, Detection and Classification of Tiny Objects | Code | 1
Benchmarking Multimodal Mathematical Reasoning with Explicit Visual Dependency | Code | 1
Deep Subdomain Adaptation Network for Image Classification | Code | 1
Deep Learning for Event-based Vision: A Comprehensive Survey and Benchmarks | Code | 1
Deep Gaze I: Boosting Saliency Prediction with Feature Maps Trained on ImageNet | Code | 1
Deep Predictive Coding Networks for Video Prediction and Unsupervised Learning | Code | 1
Densely Connected Convolutional Networks | Code | 1
Do Adversarially Robust ImageNet Models Transfer Better? | Code | 1
Debiased Self-Training for Semi-Supervised Learning | Code | 1
Decoding Natural Images from EEG for Object Recognition | Code | 1
3D ShapeNets: A Deep Representation for Volumetric Shapes | Code | 1
CREST: An Efficient Conjointly-trained Spike-driven Framework for Event-based Object Detection Exploiting Spatiotemporal Dynamics | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Imagen | shape bias | 98.7 | - | Unverified
2 | Stable Diffusion | shape bias | 92.7 | - | Unverified
3 | Parti | shape bias | 91.7 | - | Unverified
4 | ViT-22B-384 | shape bias | 86.4 | - | Unverified
5 | ViT-22B-560 | shape bias | 83.8 | - | Unverified
6 | CLIP (ViT-B) | shape bias | 79.9 | - | Unverified
7 | ViT-22B-224 | shape bias | 78 | - | Unverified
8 | ResNet-50 (L2 eps 5.0 adv trained) | shape bias | 69.5 | - | Unverified
9 | ResNet-50 (with strong augmentations) | shape bias | 62.2 | - | Unverified
10 | SWSL (ResNeXt-101) | shape bias | 49.8 | - | Unverified
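The table above ranks models by shape bias. Under the widely used cue-conflict protocol (images whose shape comes from one class and whose texture comes from another), shape bias is commonly computed as the fraction of shape-consistent decisions among the decisions that match either cue. The sketch below assumes that definition and a hypothetical `(predicted, shape_class, texture_class)` input format; the exact evaluation behind these entries may differ.

```python
from typing import Iterable, Tuple


def shape_bias(trials: Iterable[Tuple[str, str, str]]) -> float:
    """Fraction of cue-conflict decisions that follow shape rather than texture.

    Each trial is (predicted_class, shape_class, texture_class).
    """
    shape_hits = texture_hits = 0
    for pred, shape_cls, texture_cls in trials:
        if shape_cls == texture_cls:
            continue  # not a genuine cue conflict, so it says nothing about bias
        if pred == shape_cls:
            shape_hits += 1
        elif pred == texture_cls:
            texture_hits += 1
        # predictions matching neither cue are ignored
    decided = shape_hits + texture_hits
    if decided == 0:
        raise ValueError("no trial matched either the shape or the texture class")
    return shape_hits / decided
```

Multiplying the result by 100 gives the percentage scale used in the table.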

# | Model | Metric | Claimed | Verified | Status
1 | Spike-VGG11 | Accuracy (%) | 85.55 | - | Unverified
2 | SSNN | Accuracy (%) | 78.57 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Spike-VGG11 | Accuracy (%) | 85.62 | - | Unverified
2 | SSNN | Accuracy (%) | 79.25 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ObjectNet-Baseline | Top 5 Accuracy | 18.75 | - | Unverified
2 | yun | Top 5 Accuracy | 14.75 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ObjectNet-Baseline | Top 5 Accuracy | 52.24 | - | Unverified
2 | DY | Top 5 Accuracy | 0.08 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ObjectNet-Baseline | Top 5 Accuracy | 52.24 | - | Unverified
2 | AJ2021 | Top 5 Accuracy | 27.68 | - | Unverified
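The three tables above report Top 5 Accuracy: the percentage of images whose true label is among the five classes the model scores highest. A minimal sketch, assuming one score per class per image and integer class labels; any class-mapping step a particular benchmark requires is not modeled here.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]], labels: Sequence[int], k: int = 5
) -> float:
    """Percentage of samples whose true label is among the k highest-scoring classes."""
    if not labels or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and the same length")
    hits = 0
    for row, label in zip(scores, labels):
        # indices of the k largest scores for this sample
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += int(label in top_k)
    return 100.0 * hits / len(labels)
```

With `k=1` the same function gives ordinary top-1 accuracy.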

# | Model | Metric | Claimed | Verified | Status
1 | SSNN | Accuracy (%) | 94.91 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Faster-RCNN | mAP | 30.39 | - | Unverified
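The Faster-RCNN entry above is scored by mAP (mean average precision). The sketch below assumes PASCAL VOC-style evaluation: detections are ranked by confidence, each is greedily matched to its best-overlapping ground-truth box and counted as a true positive only if IoU is at least 0.5 and that box is still unmatched, AP is the area under the interpolated precision-recall curve, and mAP is the mean of per-class APs expressed as a percentage. The benchmark behind this table may use a different protocol (for example COCO's averaging over several IoU thresholds), so treat this as an illustration rather than the table's exact metric; the function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(
    dets: Sequence[Tuple[str, float, Box]], gts: Dict[str, List[Box]], thr: float = 0.5
) -> float:
    """AP for one class. dets: (image_id, score, box); gts: image_id -> ground-truth boxes."""
    n_gt = sum(len(v) for v in gts.values())
    if n_gt == 0:
        return 0.0
    used = {img: [False] * len(boxes) for img, boxes in gts.items()}
    precisions: List[float] = []
    recalls: List[float] = []
    tp = fp = 0
    for img, _, box in sorted(dets, key=lambda d: d[1], reverse=True):
        best, best_j = 0.0, -1
        for j, g in enumerate(gts.get(img, [])):
            overlap = iou(box, g)
            if overlap > best:
                best, best_j = overlap, j
        if best_j >= 0 and best >= thr and not used[img][best_j]:
            used[img][best_j] = True
            tp += 1
        else:
            fp += 1  # duplicate, poorly localized, or unmatched detection
        precisions.append(tp / (tp + fp))
        recalls.append(tp / n_gt)
    # all-point interpolation: precision at each rank becomes the max to its right
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_r = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_r) * p  # area of each recall step
        prev_r = r
    return ap


def mean_average_precision(per_class_ap: Sequence[float]) -> float:
    """Mean of per-class APs, as a percentage."""
    if not per_class_ap:
        raise ValueError("need at least one class")
    return 100.0 * sum(per_class_ap) / len(per_class_ap)
```

The greedy, score-ordered matching is what penalizes duplicate boxes on the same object: only the first one can claim the ground truth.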

# | Model | Metric | Claimed | Verified | Status
1 | Spike-VGG11 | Accuracy (%) | 96 | - | Unverified