Scene Understanding

Scene understanding is the task of interpreting the visual content of a scene: the objects present, their spatial relationships, and the overall layout. It goes beyond object recognition by accounting for context, that is, how objects relate to one another and to their environment.

Papers

Showing 476–500 of 1723 papers

Title	Date	Tasks	Status	Hype
UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios	Aug 30, 2024	Attribute, geo-localization	Code Available	1
MonteBoxFinder: Detecting and Filtering Primitives to Fit a Noisy Point Cloud	Jul 28, 2022	Scene Understanding	Code Available	1
VideoNavQA: Bridging the Gap between Visual and Embodied Question Answering	Aug 14, 2019	Embodied Question Answering, Question Answering	Code Available	1
PyTorchGeoNodes: Enabling Differentiable Shape Programs for 3D Shape Reconstruction	Apr 16, 2024	3D Reconstruction, 3D Shape Reconstruction	Code Available	1
Challenges for Monocular 6D Object Pose Estimation in Robotics	Jul 22, 2023	6D Pose Estimation using RGB, Object	Unverified	0
ArK: Augmented Reality with Knowledge Interactive Emergent Ability	May 1, 2023	AI Agent, Mixed Reality	Unverified	0
Argus: Leveraging Multiview Images for Improved 3-D Scene Understanding With Large Language Models	Jul 17, 2025	3D Point Cloud Reconstruction, Point cloud reconstruction	Unverified	0
Adversarial Attacks on Monocular Depth Estimation	Mar 23, 2020	Autonomous Driving, Depth Estimation	Unverified	0
Advancing the Understanding of Fine-Grained 3D Forest Structures using Digital Cousins and Simulation-to-Reality: Methods and Datasets	Jan 7, 2025	Data Augmentation, parameter estimation	Unverified	0
3D Vision-Language Gaussian Splatting	Oct 10, 2024	3D Reconstruction, Autonomous Driving	Unverified	0
Category-Level and Open-Set Object Pose Estimation for Robotics	Apr 28, 2025	6D Pose Estimation, 6D Pose Estimation using RGB	Unverified	0
Evaluation of Multimodal Semantic Segmentation using RGB-D Data	Mar 31, 2021	Scene Understanding, Semantic Segmentation	Unverified	0
Catch Me if You Can: A Novel Task for Detection of Covert Geo-Locations (CGL)	Feb 5, 2022	object-detection, Object Detection	Unverified	0
A Review on Visual-SLAM: Advancements from Geometric Modelling to Learning-based Semantic Scene Understanding	Sep 12, 2022	Scene Understanding	Unverified	0
GaussianBeV: 3D Gaussian Representation meets Perception Models for BeV Segmentation	Jul 19, 2024	BEV Segmentation, Scene Understanding	Unverified	0
Evaluating the Impact of Point Cloud Colorization on Semantic Segmentation Accuracy	Oct 9, 2024	Colorization, Point Cloud Segmentation	Unverified	0
Evaluating Multimodal Language Models as Visual Assistants for Visually Impaired Users	Mar 28, 2025	Object Recognition, Reading Comprehension	Unverified	0
Cataract-1K: Cataract Surgery Dataset for Scene Segmentation, Phase Recognition, and Irregularity Detection	Dec 11, 2023	Benchmarking, Domain Adaptation	Unverified	0
CASPNet++: Joint Multi-Agent Motion Prediction	Aug 15, 2023	Autonomous Driving, motion prediction	Unverified	0
GameVLM: A Decision-making Framework for Robotic Task Planning Based on Visual Language Models and Zero-sum Games	May 22, 2024	Code Generation, Decision Making	Unverified	0
Estimating Depth from Monocular Images as Classification Using Deep Fully Convolutional Residual Networks	May 8, 2016	Depth Estimation, General Classification	Unverified	0
Case-based Reasoning Augmented Large Language Model Framework for Decision Making in Realistic Safety-Critical Driving Scenarios	Jun 25, 2025	Autonomous Driving, Decision Making	Unverified	0
Event fields: Capturing light fields at high speed, resolution, and dynamic range	Dec 9, 2024	Depth Estimation, Scene Understanding	Unverified	0
Every SAM Drop Counts: Embracing Semantic Priors for Multi-Modality Image Fusion and Beyond	Mar 3, 2025	Infrared And Visible Image Fusion, Scene Understanding	Unverified	0
ESGNN: Towards Equivariant Scene Graph Neural Network for 3D Scene Understanding	Jun 30, 2024	Graph Generation, Graph Neural Network	Unverified	0

Benchmark Results

Semantic Scene Understanding Challenge (passive actuation & ground-truth localisation)

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.44	—	Unverified
2	Team VGAI (TCS Research)	OMQ	0.37	—	Unverified
3	Demo_semantic_SLAM	OMQ	0.11	—	Unverified

ADE20K val

#	Model	Metric	Claimed	Verified	Status
1	CPN(ResNet-101)	Mean IoU	46.3	—	Unverified

Semantic Scene Understanding Challenge (active actuation & ground-truth localisation)

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.35	—	Unverified