Object Recognition

Object recognition is a computer vision technique for detecting and classifying objects in images or videos. Because it combines object detection with image classification, state-of-the-art results are recorded separately for each component task.

(Image credit: TensorFlow Object Detection API)

Papers

Showing 1–50 of 2042 papers

Title	Date	Tasks	Status	Hype
Lightweight Pixel Difference Networks for Efficient Visual Representation Learning	Feb 1, 2024	Edge Detection, Object Recognition	Code Available	4
RTMDet: An Empirical Study of Designing Real-Time Object Detectors	Dec 14, 2022	GPU, Instance Segmentation	Code Available	4
Detectron2 Object Detection & Manipulating Images using Cartoonization	Aug 1, 2021	Autonomous Vehicles, Data Visualization	Code Available	4
Interactive Medical Image Segmentation: A Benchmark Dataset and Baseline	Nov 19, 2024	Image Segmentation, Interactive Segmentation	Code Available	3
UniBench: Visual Reasoning Requires Rethinking Vision-Language Beyond Scaling	Aug 9, 2024	GPU, Language Modeling	Code Available	3
pix2gestalt: Amodal Segmentation by Synthesizing Wholes	Jan 25, 2024	3D Reconstruction, Object Recognition	Code Available	3
DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models	Feb 8, 2022	Diagnostic, Image Captioning	Code Available	3
Datasets: A Community Library for Natural Language Processing	Sep 7, 2021	Image Classification, Object Recognition	Code Available	3
InstructSAM: A Training-Free Framework for Instruction-Oriented Remote Sensing Object Recognition	May 21, 2025	Earth Observation, Object	Code Available	2
Taccel: Scaling Up Vision-based Tactile Robotics via High-performance GPU Simulation	Apr 17, 2025	GPU, Object Recognition	Code Available	2
P2Object: Single Point Supervised Object Detection and Instance Segmentation	Apr 10, 2025	Instance Segmentation, Multiple Instance Learning	Code Available	2
NUDT4MSTAR: A Large Dataset and Benchmark Towards Remote Sensing Object Recognition in the Wild	Jan 23, 2025	Earth Observation, Object Recognition	Code Available	2
MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning	Jun 25, 2024	Object, Object Recognition	Code Available	2
StableSemantics: A Synthetic Language-Vision Dataset of Semantic Representations in Naturalistic Images	Jun 19, 2024	Object Recognition, Scene Understanding	Code Available	2
Is CLIP the main roadblock for fine-grained open-world perception?	Apr 4, 2024	Autonomous Driving, Novel Concepts	Code Available	2
Lifting Multi-View Detection and Tracking to the Bird's Eye View	Mar 19, 2024	3D Object Recognition, Multi-Object Tracking	Code Available	2
Local Feature Matching Using Deep Learning: A Survey	Jan 31, 2024	3D Reconstruction, Deep Learning	Code Available	2
Seeing the roads through the trees: A benchmark for modeling spatial dependencies with aerial imagery	Jan 12, 2024	Object Recognition, Road Segmentation	Code Available	2
Towards Language Models That Can See: Computer Vision Through the LENS of Natural Language	Jun 28, 2023	Descriptive, Language Modeling	Code Available	2
Roboflow 100: A Rich, Multi-Domain Object Detection Benchmark	Nov 24, 2022	2D Object Detection, Image Retrieval	Code Available	2
The Equalization Losses: Gradient-Driven Training for Long-tailed Object Recognition	Oct 11, 2022	image-classification, Image Classification	Code Available	2
Patchwork++: Fast and Robust Ground Segmentation Solving Partial Under-Segmentation Using 3D Point Cloud	Jul 25, 2022	Object Recognition, Segmentation	Code Available	2
Omni3D: A Large Benchmark and Model for 3D Object Detection in the Wild	Jul 21, 2022	3D Object Detection, 3D Object Detection From Monocular Images	Code Available	2
HAKE: A Knowledge Engine Foundation for Human Activity Understanding	Feb 14, 2022	Action Recognition, Human-Object Interaction Detection	Code Available	2
A Simple Episodic Linear Probe Improves Visual Recognition in the Wild	Jan 1, 2022	Fine-Grained Image Classification, Image Classification	Code Available	2
Learning Transferable Visual Models From Natural Language Supervision	Feb 26, 2021	Action Recognition, Benchmarking	Code Available	2
Sparse R-CNN: End-to-End Object Detection with Learnable Proposals	Nov 25, 2020	2D Object Detection, Object	Code Available	2
A Simple Framework for Contrastive Learning of Visual Representations	Feb 13, 2020	Contrastive Learning, Image Classification	Code Available	2
SceneGraphNet: Neural Message Passing for 3D Indoor Scene Augmentation	Jul 25, 2019	3D Object Recognition, Object Recognition	Code Available	2
GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond	Apr 25, 2019	Instance Segmentation, Object Detection	Code Available	2
Hypergraph Neural Networks	Sep 25, 2018	Object Recognition, Representation Learning	Code Available	2
Some Improvements on Deep Convolutional Neural Network Based Image Classification	Dec 19, 2013	Classification, General Classification	Code Available	2
STSBench: A Spatio-temporal Scenario Benchmark for Multi-modal Large Language Models in Autonomous Driving	Jun 6, 2025	Autonomous Driving, Autonomous Vehicles	Code Available	1
Benchmarking Multimodal Mathematical Reasoning with Explicit Visual Dependency	Apr 24, 2025	Benchmarking, Math	Code Available	1
Spatial457: A Diagnostic Benchmark for 6D Spatial Reasoning of Large Multimodal Models	Feb 12, 2025	Attribute, Diagnostic	Code Available	1
Comprehensive Multi-Modal Prototypes are Simple and Effective Classifiers for Vast-Vocabulary Object Detection	Dec 23, 2024	object-detection, Object Detection	Code Available	1
CREST: An Efficient Conjointly-trained Spike-driven Framework for Event-based Object Detection Exploiting Spatiotemporal Dynamics	Dec 17, 2024	Object, object-detection	Code Available	1
WiseAD: Knowledge Augmented End-to-End Autonomous Driving with Vision-Language Model	Dec 13, 2024	Autonomous Driving, Decision Making	Code Available	1
Expanding Event Modality Applications through a Robust CLIP-Based Encoder	Dec 4, 2024	Few-Shot Learning, Object Recognition	Code Available	1
LRSAA: Large-scale Remote Sensing Image Target Recognition and Automatic Annotation	Nov 24, 2024	Ensemble Learning, Object	Code Available	1
Leveraging MLLM Embeddings and Attribute Smoothing for Compositional Zero-Shot Learning	Nov 18, 2024	Attribute, Compositional Zero-Shot Learning	Code Available	1
Large-scale Remote Sensing Image Target Recognition and Automatic Annotation	Nov 12, 2024	Ensemble Learning, Object	Code Available	1
MomentumSMoE: Integrating Momentum into Sparse Mixture of Experts	Oct 18, 2024	Language Modeling, Language Modelling	Code Available	1
DaWin: Training-free Dynamic Weight Interpolation for Robust Adaptation	Oct 3, 2024	Multi-Task Learning, Object Recognition	Code Available	1
CSIM: A Copula-based similarity index sensitive to local changes for Image quality assessment	Oct 2, 2024	Astronomy, Image Quality Assessment	Code Available	1
Category-Prompt Refined Feature Learning for Long-Tailed Multi-Label Image Classification	Aug 15, 2024	image-classification, Image Classification	Code Available	1
On the Element-Wise Representation and Reasoning in Zero-Shot Image Recognition: A Systematic Survey	Aug 9, 2024	Object Recognition	Code Available	1
MarvelOVD: Marrying Object Recognition and Vision-Language Models for Robust Open-Vocabulary Object Detection	Jul 31, 2024	Language Modelling, Object	Code Available	1
Dual-Hybrid Attention Network for Specular Highlight Removal	Jul 17, 2024	highlight removal, Object Recognition	Code Available	1
PartImageNet++ Dataset: Scaling up Part-based Models for Robust Recognition	Jul 15, 2024	Adversarial Robustness, Inductive Bias	Code Available	1

Datasets: shape bias; CIFAR10-DVS; N-Caltech 101; ObjectNet (All classes); ObjectNet (ImageNet classes); ObjectNet (ImageNet classes, trained on ImageNet); DVS128 Gesture; MECCANO; N-CARS

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Imagen	shape bias	98.7	—	Unverified
2	Stable Diffusion	shape bias	92.7	—	Unverified
3	Parti	shape bias	91.7	—	Unverified
4	ViT-22B-384	shape bias	86.4	—	Unverified
5	ViT-22B-560	shape bias	83.8	—	Unverified
6	CLIP (ViT-B)	shape bias	79.9	—	Unverified
7	ViT-22B-224	shape bias	78	—	Unverified
8	ResNet-50 (L2 eps 5.0 adv trained)	shape bias	69.5	—	Unverified
9	ResNet-50 (with strong augmentations)	shape bias	62.2	—	Unverified
10	SWSL (ResNeXt-101)	shape bias	49.8	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Spike-VGG11	Accuracy (%)	85.55	—	Unverified
2	SSNN	Accuracy (%)	78.57	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Spike-VGG11	Accuracy (%)	85.62	—	Unverified
2	SSNN	Accuracy (%)	79.25	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	ObjectNet-Baseline	Top 5 Accuracy	18.75	—	Unverified
2	yun	Top 5 Accuracy	14.75	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	ObjectNet-Baseline	Top 5 Accuracy	52.24	—	Unverified
2	DY	Top 5 Accuracy	0.08	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	ObjectNet-Baseline	Top 5 Accuracy	52.24	—	Unverified
2	AJ2021	Top 5 Accuracy	27.68	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	SSNN	Accuracy (%)	94.91	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Faster-RCNN	mAP	30.39	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Spike-VGG11	Accuracy (%)	96	—	Unverified