Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.
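
As a rough illustration of what "objects and their spatial relationships" can mean in practice, here is a minimal sketch that derives coarse pairwise relations from 2D bounding boxes. It is not taken from any paper listed here. The names `SceneObject`, `spatial_relation`, and `build_scene_graph` are hypothetical, and the center-comparison rule is a deliberately simplified assumption; real systems typically learn relations or reason in 3D.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SceneObject:
    """Hypothetical container for one detected object."""
    label: str
    # Axis-aligned box in image coordinates (x_min, y_min, x_max, y_max);
    # y grows downward, as is usual for images.
    box: Tuple[float, float, float, float]


def center(obj: SceneObject) -> Tuple[float, float]:
    """Return the center point of the object's bounding box."""
    x0, y0, x1, y1 = obj.box
    return ((x0 + x1) / 2, (y0 + y1) / 2)


def spatial_relation(a: SceneObject, b: SceneObject) -> str:
    """Coarse relation of `a` with respect to `b`, from box centers only.

    Assumption: the dominant axis of the center offset decides the relation.
    Identical centers fall through to "right of".
    """
    ax, ay = center(a)
    bx, by = center(b)
    dx, dy = ax - bx, ay - by
    if abs(dx) >= abs(dy):
        return "left of" if dx < 0 else "right of"
    return "above" if dy < 0 else "below"


def build_scene_graph(objects: List[SceneObject]) -> List[Tuple[str, str, str]]:
    """Return (subject, relation, object) triples for every ordered pair."""
    triples = []
    for i, a in enumerate(objects):
        for j, b in enumerate(objects):
            if i != j:
                triples.append((a.label, spatial_relation(a, b), b.label))
    return triples


# Made-up example scene, for illustration only.
scene = [
    SceneObject("cup", (40, 50, 60, 80)),
    SceneObject("table", (0, 80, 200, 160)),
    SceneObject("lamp", (150, 10, 180, 80)),
]
graph = build_scene_graph(scene)
for triple in graph:
    print(triple)
```

The resulting list of triples is a simple form of scene graph: it records how the objects are arranged relative to one another in addition to which objects are present.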

Papers

Showing 276–300 of 1723 papers

Title	Date	Tasks	Status	Hype	Score
LLaVA-SpaceSGG: Visual Instruct Tuning for Open-vocabulary Scene Graph Generation with Enhanced Spatial Relations	Dec 9, 2024	Language Modeling, Language Modelling	Code Available	1	5
Divide and Conquer: 3D Point Cloud Instance Segmentation With Point-Wise Binarization	Jul 22, 2022	3D Instance Segmentation, 3D Object Detection	Code Available	1	5
Dynamic Graph Message Passing Networks for Visual Recognition	Sep 20, 2022	image-classification, Image Classification	Code Available	1	5
LLM-Empowered Embodied Agent for Memory-Augmented Task Planning in Household Robotics	Apr 30, 2025	In-Context Learning, Object	Code Available	1	5
M3D-RPN: Monocular 3D Region Proposal Network for Object Detection	Jul 13, 2019	3D Object Detection, 3D Object Detection From Monocular Images	Code Available	1	5
Microsoft COCO: Common Objects in Context	May 1, 2014	Instance Segmentation, Object	Code Available	1	5
Deep learning for radar data exploitation of autonomous vehicle	Mar 15, 2022	Autonomous Driving, Deep Learning	Code Available	1	5
A Survey on Deep Learning Technique for Video Segmentation	Jul 2, 2021	Autonomous Driving, Deep Learning	Code Available	1	5
Leveraging Large (Visual) Language Models for Robot 3D Scene Understanding	Sep 12, 2022	Common Sense Reasoning, Scene Classification	Code Available	1	5
4D Panoptic LiDAR Segmentation	Feb 24, 2021	4D Panoptic Segmentation, Benchmarking	Code Available	1	5
DeepPanoContext: Panoramic 3D Scene Understanding with Holistic Scene Context Graph and Relation-based Optimization	Aug 24, 2021	Diversity, Graph Neural Network	Code Available	1	5
Query3D: LLM-Powered Open-Vocabulary Scene Segmentation with Language Embedded 3D Gaussian	Aug 7, 2024	Autonomous Driving, object-detection	Code Available	1	5
CAT-ViL: Co-Attention Gated Vision-Language Embedding for Visual Question Localized-Answering in Robotic Surgery	Jul 11, 2023	Question Answering, Scene Understanding	Code Available	1	5
A Survey on Deep Learning for Localization and Mapping: Towards the Age of Spatial Machine Intelligence	Jun 22, 2020	Deep Learning, Scene Understanding	Code Available	1	5
Collaborative Transformers for Grounded Situation Recognition	Mar 30, 2022	Grounded Situation Recognition, Image Classification	Code Available	1	5
Deep Learning for Event-based Vision: A Comprehensive Survey and Benchmarks	Feb 17, 2023	Deblurring, Deep Learning	Code Available	1	5
Dense Audio-Visual Event Localization under Cross-Modal Consistency and Multi-Temporal Granularity Collaboration	Dec 17, 2024	audio-visual event localization, audio-visual learning	Code Available	1	5
Complementary Random Masking for RGB-Thermal Semantic Segmentation	Mar 30, 2023	Scene Understanding, Semantic Segmentation	Code Available	1	5
Detecting Human-Object Interaction via Fabricated Compositional Learning	Mar 15, 2021	Affordance Recognition, Human-Object Interaction Detection	Code Available	1	5
3D Neural Embedding Likelihood: Probabilistic Inverse Graphics for Robust 6D Pose Estimation	Feb 7, 2023	6D Pose Estimation, 6D Pose Estimation using RGB	Code Available	1	5
Diffusion-SS3D: Diffusion Model for Semi-supervised 3D Object Detection	Dec 5, 2023	3D Object Detection, Denoising	Code Available	1	5
DIP: Unsupervised Dense In-Context Post-training of Visual Representations	Jun 23, 2025	GPU, Meta-Learning	Code Available	1	5
DI-V2X: Learning Domain-Invariant Representation for Vehicle-Infrastructure Collaborative 3D Object Detection	Dec 25, 2023	3D Object Detection, object-detection	Code Available	1	5
A Survey of World Models for Autonomous Driving	Jan 20, 2025	Anomaly Detection, Autonomous Driving	Code Available	1	5
Affect2MM: Affective Analysis of Multimedia Content Using Emotion Causality	Mar 11, 2021	Scene Understanding, Time Series	Code Available	1	5


Benchmark Results

Semantic Scene Understanding Challenge (passive actuation & ground-truth localisation)

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.44	—	Unverified
2	Team VGAI (TCS Research)	OMQ	0.37	—	Unverified
3	Demo_semantic_SLAM	OMQ	0.11	—	Unverified

ADE20K val

#	Model	Metric	Claimed	Verified	Status
1	CPN (ResNet-101)	Mean IoU	46.3	—	Unverified

Semantic Scene Understanding Challenge (active actuation & ground-truth localisation)

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.35	—	Unverified