Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.

Papers

Showing 451–500 of 1723 papers

Title	Date	Tasks	Status	Hype
General Geometry-aware Weakly Supervised 3D Object Detection	Jul 18, 2024	3D Object Detection, Object	Code Available	1
Generating Visual Spatial Description via Holistic 3D Scene Understanding	May 19, 2023	Scene Understanding, Text Generation	Code Available	1
Towards Holistic Surgical Scene Understanding	Dec 8, 2022	Action Recognition, Atomic action recognition	Code Available	1
Towards In-context Scene Understanding	Jun 2, 2023	Depth Estimation, In-Context Learning	Code Available	1
Efficient Multi-Task RGB-D Scene Analysis for Indoor Environments	Jul 10, 2022	Instance Segmentation, Panoptic Segmentation	Code Available	1
CAKES: Channel-wise Automatic KErnel Shrinking for Efficient 3D Networks	Mar 28, 2020	3D Medical Imaging Segmentation, Action Recognition	Code Available	1
Global Aggregation then Local Distribution in Fully Convolutional Networks	Sep 16, 2019	Instance Segmentation, object-detection	Code Available	1
Towards Scene Understanding for Autonomous Operations on Airport Aprons	Dec 4, 2022	Autonomous Driving, Benchmarking	Code Available	1
ARKitScenes: A Diverse Real-World Dataset For 3D Indoor Scene Understanding Using Mobile RGB-D Data	Nov 17, 2021	3D Object Detection, object-detection	Code Available	1
3UR-LLM: An End-to-End Multimodal Large Language Model for 3D Scene Understanding	Jan 14, 2025	Language Modeling, Language Modelling	Code Available	1
CamContextI2V: Context-aware Controllable Video Generation	Apr 8, 2025	Diversity, Scene Understanding	Code Available	1
Traffic Scene Parsing through the TSP6K Dataset	Mar 6, 2023	Autonomous Driving, Decoder	Code Available	1
Channel-Wise Attention-Based Network for Self-Supervised Monocular Depth Estimation	Dec 24, 2021	Depth Estimation, Depth Prediction	Code Available	1
Explainable Object-induced Action Decision for Autonomous Vehicles	Mar 20, 2020	Autonomous Driving, Autonomous Vehicles	Code Available	1
Transformers in Self-Supervised Monocular Depth Estimation with Unknown Camera Intrinsics	Feb 7, 2022	Autonomous Driving, Depth Estimation	Code Available	1
Global-Reasoned Multi-Task Learning Model for Surgical Scene Understanding	Jan 28, 2022	Graph Attention, Knowledge Distillation	Code Available	1
TSP-Transformer: Task-Specific Prompts Boosted Transformer for Holistic Scene Understanding	Nov 6, 2023	Boundary Detection, Depth Estimation	Code Available	1
Uncertainty-aware Panoptic Segmentation	Jun 29, 2022	Panoptic Segmentation, Scene Understanding	Code Available	1
Campus3D: A Photogrammetry Point Cloud Benchmark for Hierarchical Understanding of Outdoor Scene	Aug 11, 2020	Instance Segmentation, Point Cloud Segmentation	Code Available	1
Understanding Bird's-Eye View of Road Semantics using an Onboard Camera	Dec 5, 2020	Autonomous Navigation, Autonomous Vehicles	Code Available	1
Holistic 3D Scene Understanding from a Single Image with Implicit Representation	Mar 11, 2021	3D Object Detection, 3D Shape Reconstruction	Code Available	1
Knowledge Distillation from 3D to Bird's-Eye-View for LiDAR Semantic Segmentation	Apr 22, 2023	Autonomous Driving, Knowledge Distillation	Code Available	1
UniM-OV3D: Uni-Modality Open-Vocabulary 3D Scene Understanding with Fine-Grained Feature Representation	Jan 21, 2024	Instance Segmentation, Scene Understanding	Code Available	1
Unleash the Potential of Image Branch for Cross-modal 3D Object Detection	Jan 22, 2023	3D Object Detection, Autonomous Vehicles	Code Available	1
EndoChat: Grounded Multimodal Large Language Model for Endoscopic Surgery	Jan 20, 2025	Language Modeling, Language Modelling	Code Available	1
UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios	Aug 30, 2024	Attribute, geo-localization	Code Available	1
MonteBoxFinder: Detecting and Filtering Primitives to Fit a Noisy Point Cloud	Jul 28, 2022	Scene Understanding	Code Available	1
VideoNavQA: Bridging the Gap between Visual and Embodied Question Answering	Aug 14, 2019	Embodied Question Answering, Question Answering	Code Available	1
PyTorchGeoNodes: Enabling Differentiable Shape Programs for 3D Shape Reconstruction	Apr 16, 2024	3D Reconstruction, 3D Shape Reconstruction	Code Available	1
Challenges for Monocular 6D Object Pose Estimation in Robotics	Jul 22, 2023	6D Pose Estimation using RGB, Object	Unverified	0
ArK: Augmented Reality with Knowledge Interactive Emergent Ability	May 1, 2023	AI Agent, Mixed Reality	Unverified	0
Argus: Leveraging Multiview Images for Improved 3-D Scene Understanding With Large Language Models	Jul 17, 2025	3D Point Cloud Reconstruction, Point cloud reconstruction	Unverified	0
Adversarial Attacks on Monocular Depth Estimation	Mar 23, 2020	Autonomous Driving, Depth Estimation	Unverified	0
Advancing the Understanding of Fine-Grained 3D Forest Structures using Digital Cousins and Simulation-to-Reality: Methods and Datasets	Jan 7, 2025	Data Augmentation, parameter estimation	Unverified	0
3D Vision-Language Gaussian Splatting	Oct 10, 2024	3D Reconstruction, Autonomous Driving	Unverified	0
Category-Level and Open-Set Object Pose Estimation for Robotics	Apr 28, 2025	6D Pose Estimation, 6D Pose Estimation using RGB	Unverified	0
Evaluation of Multimodal Semantic Segmentation using RGB-D Data	Mar 31, 2021	Scene Understanding, Semantic Segmentation	Unverified	0
Catch Me if You Can: A Novel Task for Detection of Covert Geo-Locations (CGL)	Feb 5, 2022	object-detection, Object Detection	Unverified	0
A Review on Visual-SLAM: Advancements from Geometric Modelling to Learning-based Semantic Scene Understanding	Sep 12, 2022	Scene Understanding	Unverified	0
GaussianBeV: 3D Gaussian Representation meets Perception Models for BeV Segmentation	Jul 19, 2024	BEV Segmentation, Scene Understanding	Unverified	0
Evaluating the Impact of Point Cloud Colorization on Semantic Segmentation Accuracy	Oct 9, 2024	Colorization, Point Cloud Segmentation	Unverified	0
Evaluating Multimodal Language Models as Visual Assistants for Visually Impaired Users	Mar 28, 2025	Object Recognition, Reading Comprehension	Unverified	0
Cataract-1K: Cataract Surgery Dataset for Scene Segmentation, Phase Recognition, and Irregularity Detection	Dec 11, 2023	Benchmarking, Domain Adaptation	Unverified	0
CASPNet++: Joint Multi-Agent Motion Prediction	Aug 15, 2023	Autonomous Driving, motion prediction	Unverified	0
GameVLM: A Decision-making Framework for Robotic Task Planning Based on Visual Language Models and Zero-sum Games	May 22, 2024	Code Generation, Decision Making	Unverified	0
Estimating Depth from Monocular Images as Classification Using Deep Fully Convolutional Residual Networks	May 8, 2016	Depth Estimation, General Classification	Unverified	0
Case-based Reasoning Augmented Large Language Model Framework for Decision Making in Realistic Safety-Critical Driving Scenarios	Jun 25, 2025	Autonomous Driving, Decision Making	Unverified	0
Event fields: Capturing light fields at high speed, resolution, and dynamic range	Dec 9, 2024	Depth Estimation, Scene Understanding	Unverified	0
Every SAM Drop Counts: Embracing Semantic Priors for Multi-Modality Image Fusion and Beyond	Mar 3, 2025	Infrared And Visible Image Fusion, Scene Understanding	Unverified	0
ESGNN: Towards Equivariant Scene Graph Neural Network for 3D Scene Understanding	Jun 30, 2024	Graph Generation, Graph Neural Network	Unverified	0


Datasets: Semantic Scene Understanding Challenge (passive actuation & ground-truth localisation); ADE20K val; Semantic Scene Understanding Challenge (active actuation & ground-truth localisation)

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.44	—	Unverified
2	Team VGAI (TCS Research)	OMQ	0.37	—	Unverified
3	Demo_semantic_SLAM	OMQ	0.11	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	CPN(ResNet-101)	Mean IoU	46.3	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.35	—	Unverified