Object Recognition

Object recognition is a computer vision technique for detecting and classifying objects in images or videos. Because it combines object detection with image classification, state-of-the-art results are tracked separately for each component task.

(Image credit: TensorFlow Object Detection API)
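The detect-then-classify decomposition described above can be sketched in a few lines. This is only an illustration under stated assumptions: `toy_detect` and `toy_classify` are hypothetical stand-ins written for this example (a bounding box around non-zero pixels, and a brightness threshold), not components of any real library or of the TensorFlow Object Detection API. A real system would substitute trained models for both.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]   # (x0, y0, x1, y1), end-exclusive pixel coordinates
Image = Sequence[Sequence[int]]   # toy grayscale image: rows of pixel intensities


@dataclass
class Recognition:
    box: Box
    label: str


def crop(image: Image, box: Box) -> List[List[int]]:
    """Cut the region covered by `box` out of the image."""
    x0, y0, x1, y1 = box
    return [list(row[x0:x1]) for row in image[y0:y1]]


def recognize(
    image: Image,
    detect: Callable[[Image], List[Box]],
    classify: Callable[[List[List[int]]], str],
) -> List[Recognition]:
    """Object recognition as detection followed by per-box classification."""
    return [Recognition(box, classify(crop(image, box))) for box in detect(image)]


def toy_detect(image: Image) -> List[Box]:
    """Hypothetical detector: one box around all non-zero pixels, if any."""
    coords = [(x, y) for y, row in enumerate(image) for x, v in enumerate(row) if v]
    if not coords:
        return []
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return [(min(xs), min(ys), max(xs) + 1, max(ys) + 1)]


def toy_classify(patch: List[List[int]]) -> str:
    """Hypothetical classifier: label a patch by its mean intensity."""
    mean = sum(map(sum, patch)) / (len(patch) * len(patch[0]))
    return "bright" if mean > 127 else "dark"


image = [
    [0, 0, 0, 0],
    [0, 200, 220, 0],
    [0, 210, 230, 0],
    [0, 0, 0, 0],
]
results = recognize(image, toy_detect, toy_classify)
print(results)
```

Keeping the detector and classifier as separate callables mirrors the way the two component tasks are benchmarked separately: either stage can be replaced without touching the other.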

Papers


Showing 51–100 of 2042 papers

Title	Date	Tasks	Status	Hype
MATT-GS: Masked Attention-based 3DGS for Robot Perception and Object Detection	Mar 25, 2025	3DGS, object-detection	Unverified	0
Predicting the Road Ahead: A Knowledge Graph based Foundation Model for Scene Understanding in Autonomous Driving	Mar 24, 2025	Autonomous Driving, Knowledge Graphs	Unverified	0
Beyond Semantics: Rediscovering Spatial Awareness in Vision-Language Models	Mar 21, 2025	Diagnostic, Object Recognition	Unverified	0
TULIP: Towards Unified Language-Image Pretraining	Mar 19, 2025	Contrastive Learning, Data Augmentation	Unverified	0
Augmenting Image Annotation: A Human-LMM Collaborative Framework for Efficient Object Selection and Label Generation	Mar 14, 2025	Object Recognition	Unverified	0
OSMa-Bench: Evaluating Open Semantic Mapping Under Varying Lighting Conditions	Mar 13, 2025	Object Recognition, Semantic Segmentation	Unverified	0
Seeing What's Not There: Spurious Correlation in Multimodal LLMs	Mar 11, 2025	Hallucination, Object	Unverified	0
Object-Centric World Model for Language-Guided Manipulation	Mar 8, 2025	Autonomous Driving, model	Unverified	0
Afford-X: Generalizable and Slim Affordance Reasoning for Task-oriented Manipulation	Mar 5, 2025	Object, Object Recognition	Unverified	0
Identity documents recognition and detection using semantic segmentation with convolutional neural network	Mar 3, 2025	Object Recognition, Semantic Segmentation	Unverified	0
Deep learning based infrared small object segmentation: Challenges and future directions	Feb 20, 2025	Autonomous Vehicles, Object Recognition	Unverified	0
RAPTOR: Refined Approach for Product Table Object Recognition	Feb 19, 2025	Object, Object Recognition	Unverified	0
Revealing Bias Formation in Deep Neural Networks Through the Geometric Mechanisms of Human Visual Decoupling	Feb 17, 2025	Object, Object Recognition	Unverified	0
"See the World, Discover Knowledge": A Chinese Factuality Evaluation for Large Vision Language Models	Feb 17, 2025	Object Recognition, Question Answering	Unverified	0
Occlusion-aware Text-Image-Point Cloud Pretraining for Open-World 3D Object Recognition	Feb 15, 2025	3D Object Recognition, Object Recognition	Unverified	0
Spatial457: A Diagnostic Benchmark for 6D Spatial Reasoning of Large Multimodal Models	Feb 12, 2025	Attribute, Diagnostic	Code Available	1
DCENWCNet: A Deep CNN Ensemble Network for White Blood Cell Classification with LIME-Based Explainability	Feb 8, 2025	Data Augmentation, Object Recognition	Unverified	0
Unveiling the Potential of iMarkers: Invisible Fiducial Markers for Advanced Robotics	Jan 26, 2025	Object Recognition, Scene Understanding	Unverified	0
Evaluating Hallucination in Large Vision-Language Models based on Context-Aware Object Similarities	Jan 25, 2025	Hallucination, Object	Unverified	0
NUDT4MSTAR: A Large Dataset and Benchmark Towards Remote Sensing Object Recognition in the Wild	Jan 23, 2025	Earth Observation, Object Recognition	Code Available	2
Development of an Inclusive Educational Platform Using Open Technologies and Machine Learning: A Case Study on Accessibility Enhancement	Jan 22, 2025	Object Recognition, speech-recognition	Unverified	0
RL-RC-DoT: A Block-level RL agent for Task-Aware Video Compression	Jan 21, 2025	Autonomous Driving, Object Recognition	Unverified	0
AI-Powered Assistive Technologies for Visual Impairment	Jan 14, 2025	Object Recognition, text-to-speech	Unverified	0
Towards Zero-Shot & Explainable Video Description by Reasoning over Graphs of Events in Space and Time	Jan 14, 2025	Object Recognition, Text Generation	Unverified	0
Guided SAM: Label-Efficient Part Segmentation	Jan 13, 2025	Object, Object Recognition	Unverified	0
Hierarchical Superpixel Segmentation via Structural Information Theory	Jan 13, 2025	graph construction, graph partitioning	Code Available	0
Perceptual Inductive Bias Is What You Need Before Contrastive Learning	Jan 1, 2025	Contrastive Learning, Depth Estimation	Unverified	0
Spatial457: A Diagnostic Benchmark for 6D Spatial Reasoning of Large Multimodal Models	Jan 1, 2025	Attribute, Diagnostic	Unverified	0
Enhanced Multimodal RAG-LLM for Accurate Visual Question Answering	Dec 30, 2024	Image Captioning, Object Recognition	Unverified	0
Sample Correlation for Fingerprinting Deep Face Recognition	Dec 30, 2024	Adversarial Defense, Emotion Recognition	Code Available	0
AI-based Wearable Vision Assistance System for the Visually Impaired: Integrating Real-Time Object Recognition and Contextual Understanding Using Large Vision-Language Models	Dec 28, 2024	Object Recognition, Raspberry Pi 4	Unverified	0
The same but different: impact of animal facility sanitary status on a transgenic mouse model of Alzheimer's disease	Dec 24, 2024	Object Recognition	Unverified	0
Comprehensive Multi-Modal Prototypes are Simple and Effective Classifiers for Vast-Vocabulary Object Detection	Dec 23, 2024	object-detection, Object Detection	Code Available	1
SilVar: Speech Driven Multimodal Model for Reasoning Visual Question Answering and Object Localization	Dec 21, 2024	Image Captioning, Multimodal Reasoning	Code Available	0
Real Classification by Description: Extending CLIP's Limits of Part Attributes Recognition	Dec 18, 2024	Attribute, Descriptive	Code Available	0
Targeted View-Invariant Adversarial Perturbations for 3D Object Recognition	Dec 17, 2024	3D Object Recognition, Adversarial Robustness	Code Available	0
Efficient Oriented Object Detection with Enhanced Small Object Recognition in Aerial Images	Dec 17, 2024	Computational Efficiency, Object	Unverified	0
CREST: An Efficient Conjointly-trained Spike-driven Framework for Event-based Object Detection Exploiting Spatiotemporal Dynamics	Dec 17, 2024	Object, object-detection	Code Available	1
WiseAD: Knowledge Augmented End-to-End Autonomous Driving with Vision-Language Model	Dec 13, 2024	Autonomous Driving, Decision Making	Code Available	1
CogNav: Cognitive Process Modeling for Object Goal Navigation with LLMs	Dec 11, 2024	Large Language Model, Object	Unverified	0
Proactive Adversarial Defense: Harnessing Prompt Tuning in Vision-Language Models to Detect Unseen Backdoored Images	Dec 11, 2024	Adversarial Defense, backdoor defense	Unverified	0
Enhancing 3D Object Detection in Autonomous Vehicles Based on Synthetic Virtual Environment Analysis	Dec 10, 2024	2D Object Detection, 3D Object Detection	Unverified	0
Can foundation models actively gather information in interactive environments to test hypotheses?	Dec 9, 2024	Object Recognition	Unverified	0
Expanding Event Modality Applications through a Robust CLIP-Based Encoder	Dec 4, 2024	Few-Shot Learning, Object Recognition	Code Available	1
Optimized CNNs for Rapid 3D Point Cloud Object Recognition	Dec 3, 2024	Computational Efficiency, object-detection	Unverified	0
LVLM-COUNT: Enhancing the Counting Ability of Large Vision-Language Models	Dec 1, 2024	Object Recognition	Code Available	0
Textured As-Is BIM via GIS-informed Point Cloud Segmentation	Nov 28, 2024	Object Recognition, Point Cloud Segmentation	Unverified	0
Verbalized Representation Learning for Interpretable Few-Shot Generalization	Nov 27, 2024	Language Modeling, Language Modelling	Code Available	0
Grid-augmented vision: A simple yet effective approach for enhanced spatial understanding in multi-modal agents	Nov 27, 2024	Autonomous Navigation, Object Recognition	Code Available	0
NEMO: Can Multimodal LLMs Identify Attribute-Modified Objects?	Nov 26, 2024	Attribute, Multiple-choice	Unverified	0



Datasets: shape bias, CIFAR10-DVS, N-Caltech 101, ObjectNet (All classes), ObjectNet (ImageNet classes), ObjectNet (ImageNet classes, trained on ImageNet), DVS128 Gesture, MECCANO, N-CARS

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Imagen	shape bias	98.7	—	Unverified
2	Stable Diffusion	shape bias	92.7	—	Unverified
3	Parti	shape bias	91.7	—	Unverified
4	ViT-22B-384	shape bias	86.4	—	Unverified
5	ViT-22B-560	shape bias	83.8	—	Unverified
6	CLIP (ViT-B)	shape bias	79.9	—	Unverified
7	ViT-22B-224	shape bias	78	—	Unverified
8	ResNet-50 (L2 eps 5.0 adv trained)	shape bias	69.5	—	Unverified
9	ResNet-50 (with strong augmentations)	shape bias	62.2	—	Unverified
10	SWSL (ResNeXt-101)	shape bias	49.8	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Spike-VGG11	Accuracy (%)	85.55	—	Unverified
2	SSNN	Accuracy (%)	78.57	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Spike-VGG11	Accuracy (%)	85.62	—	Unverified
2	SSNN	Accuracy (%)	79.25	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	ObjectNet-Baseline	Top 5 Accuracy	18.75	—	Unverified
2	yun	Top 5 Accuracy	14.75	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	ObjectNet-Baseline	Top 5 Accuracy	52.24	—	Unverified
2	DY	Top 5 Accuracy	0.08	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	ObjectNet-Baseline	Top 5 Accuracy	52.24	—	Unverified
2	AJ2021	Top 5 Accuracy	27.68	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	SSNN	Accuracy (%)	94.91	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Faster-RCNN	mAP	30.39	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Spike-VGG11	Accuracy (%)	96	—	Unverified