SOTAVerified

Object Recognition

Object recognition is a computer vision technique for detecting and classifying objects in images or videos. Because it combines object detection with image classification, state-of-the-art tables are recorded separately for each component task.

(Image credit: TensorFlow Object Detection API)
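
The description above treats object recognition as detection followed by classification. A minimal sketch of that composition is given below, under loud assumptions: `toy_detect` and `toy_classify` are hypothetical stand-ins for real models, the image is a toy grid of grayscale values, and none of the names come from a particular library or from any paper listed on this page.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]   # (x_min, y_min, x_max, y_max), max exclusive
Image = Sequence[Sequence[int]]   # toy grayscale image: rows of pixel values


@dataclass
class Recognition:
    box: Box
    label: str


def crop(image: Image, box: Box) -> List[List[int]]:
    """Return the pixels that fall inside the box."""
    x0, y0, x1, y1 = box
    return [list(row[x0:x1]) for row in image[y0:y1]]


def recognize(
    image: Image,
    detect: Callable[[Image], List[Box]],
    classify: Callable[[List[List[int]]], str],
) -> List[Recognition]:
    """Stage 1: propose boxes. Stage 2: label each cropped region."""
    return [Recognition(box, classify(crop(image, box))) for box in detect(image)]


# Hypothetical detector: always proposes two fixed regions (assumption).
def toy_detect(image: Image) -> List[Box]:
    return [(0, 0, 2, 2), (2, 2, 4, 4)]


# Hypothetical classifier: labels a patch by its mean intensity (assumption).
def toy_classify(patch: List[List[int]]) -> str:
    mean = sum(map(sum, patch)) / sum(len(row) for row in patch)
    return "bright" if mean >= 128 else "dark"


image = [
    [255, 255, 0, 0],
    [255, 255, 0, 0],
    [0, 0, 10, 10],
    [0, 0, 10, 10],
]
results = recognize(image, toy_detect, toy_classify)
for r in results:
    print(r.box, r.label)
```

In practice the two callables would be replaced by trained models. Many modern detectors predict boxes and labels jointly rather than in two separate passes, so this split is only one way to read the definition.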

Papers

Showing 1-50 of 2042 papers

Title | Status | Hype
Detectron2 Object Detection & Manipulating Images using Cartoonization | Code | 4
RTMDet: An Empirical Study of Designing Real-Time Object Detectors | Code | 4
Lightweight Pixel Difference Networks for Efficient Visual Representation Learning | Code | 4
UniBench: Visual Reasoning Requires Rethinking Vision-Language Beyond Scaling | Code | 3
pix2gestalt: Amodal Segmentation by Synthesizing Wholes | Code | 3
Interactive Medical Image Segmentation: A Benchmark Dataset and Baseline | Code | 3
Datasets: A Community Library for Natural Language Processing | Code | 3
DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models | Code | 3
NUDT4MSTAR: A Large Dataset and Benchmark Towards Remote Sensing Object Recognition in the Wild | Code | 2
Seeing the roads through the trees: A benchmark for modeling spatial dependencies with aerial imagery | Code | 2
InstructSAM: A Training-Free Framework for Instruction-Oriented Remote Sensing Object Recognition | Code | 2
SceneGraphNet: Neural Message Passing for 3D Indoor Scene Augmentation | Code | 2
Taccel: Scaling Up Vision-based Tactile Robotics via High-performance GPU Simulation | Code | 2
Lifting Multi-View Detection and Tracking to the Bird's Eye View | Code | 2
Sparse R-CNN: End-to-End Object Detection with Learnable Proposals | Code | 2
The Equalization Losses: Gradient-Driven Training for Long-tailed Object Recognition | Code | 2
Omni3D: A Large Benchmark and Model for 3D Object Detection in the Wild | Code | 2
Local Feature Matching Using Deep Learning: A Survey | Code | 2
Roboflow 100: A Rich, Multi-Domain Object Detection Benchmark | Code | 2
Some Improvements on Deep Convolutional Neural Network Based Image Classification | Code | 2
StableSemantics: A Synthetic Language-Vision Dataset of Semantic Representations in Naturalistic Images | Code | 2
Towards Language Models That Can See: Computer Vision Through the LENS of Natural Language | Code | 2
P2Object: Single Point Supervised Object Detection and Instance Segmentation | Code | 2
Is CLIP the main roadblock for fine-grained open-world perception? | Code | 2
A Simple Framework for Contrastive Learning of Visual Representations | Code | 2
A Simple Episodic Linear Probe Improves Visual Recognition in the Wild | Code | 2
GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond | Code | 2
Hypergraph Neural Networks | Code | 2
MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning | Code | 2
Learning Transferable Visual Models From Natural Language Supervision | Code | 2
Patchwork++: Fast and Robust Ground Segmentation Solving Partial Under-Segmentation Using 3D Point Cloud | Code | 2
HAKE: A Knowledge Engine Foundation for Human Activity Understanding | Code | 2
Attribution in Scale and Space | Code | 1
Discover and Cure: Concept-aware Mitigation of Spurious Correlation | Code | 1
Distributed Deep Neural Networks over the Cloud, the Edge and End Devices | Code | 1
A Study of Face Obfuscation in ImageNet | Code | 1
DetMatch: Two Teachers are Better Than One for Joint 2D and 3D Semi-Supervised Object Detection | Code | 1
Divergences in Color Perception between Deep Neural Networks and Humans | Code | 1
Benchmarking Multimodal Mathematical Reasoning with Explicit Visual Dependency | Code | 1
Densely Connected Convolutional Networks | Code | 1
Deep Subdomain Adaptation Network for Image Classification | Code | 1
DesCo: Learning Object Recognition with Rich Language Descriptions | Code | 1
Describing Textures in the Wild | Code | 1
Do Adversarially Robust ImageNet Models Transfer Better? | Code | 1
DaWin: Training-free Dynamic Weight Interpolation for Robust Adaptation | Code | 1
Debiased Self-Training for Semi-Supervised Learning | Code | 1
Deep Predictive Coding Networks for Video Prediction and Unsupervised Learning | Code | 1
DeepScores -- A Dataset for Segmentation, Detection and Classification of Tiny Objects | Code | 1
3D ShapeNets: A Deep Representation for Volumetric Shapes | Code | 1
CSIM: A Copula-based similarity index sensitive to local changes for Image quality assessment | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Imagen | shape bias | 98.7 | | Unverified
2 | Stable Diffusion | shape bias | 92.7 | | Unverified
3 | Parti | shape bias | 91.7 | | Unverified
4 | ViT-22B-384 | shape bias | 86.4 | | Unverified
5 | ViT-22B-560 | shape bias | 83.8 | | Unverified
6 | CLIP (ViT-B) | shape bias | 79.9 | | Unverified
7 | ViT-22B-224 | shape bias | 78 | | Unverified
8 | ResNet-50 (L2 eps 5.0 adv trained) | shape bias | 69.5 | | Unverified
9 | ResNet-50 (with strong augmentations) | shape bias | 62.2 | | Unverified
10 | SWSL (ResNeXt-101) | shape bias | 49.8 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Spike-VGG11 | Accuracy (%) | 85.55 | | Unverified
2 | SSNN | Accuracy (%) | 78.57 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Spike-VGG11 | Accuracy (%) | 85.62 | | Unverified
2 | SSNN | Accuracy (%) | 79.25 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ObjectNet-Baseline | Top 5 Accuracy | 18.75 | | Unverified
2 | yun | Top 5 Accuracy | 14.75 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ObjectNet-Baseline | Top 5 Accuracy | 52.24 | | Unverified
2 | DY | Top 5 Accuracy | 0.08 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ObjectNet-Baseline | Top 5 Accuracy | 52.24 | | Unverified
2 | AJ2021 | Top 5 Accuracy | 27.68 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SSNN | Accuracy (%) | 94.91 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Faster-RCNN | mAP | 30.39 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Spike-VGG11 | Accuracy (%) | 96 | | Unverified