Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.
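The following is a rough illustration of the "spatial relationships" part of this definition. It turns labeled 2D bounding boxes into (subject, relation, object) triples, the representation that scene graph generation works with. It is a toy sketch under stated assumptions. The `Box` class, the `spatial_relation` rule (comparing box centers in image coordinates, with y pointing down) and the four-word relation vocabulary are all hypothetical. None of them comes from any paper listed here, and practical systems learn relations from data rather than hand-coding them.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Box:
    """Hypothetical labeled 2D box in pixel coordinates (x right, y down)."""
    label: str
    x0: float
    y0: float
    x1: float
    y1: float

    def center(self):
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)


def spatial_relation(a, b):
    """Coarse relation of box a to box b, decided by the dominant axis between centers.

    This hand-written rule is an illustrative assumption, not a learned model.
    """
    (ax, ay), (bx, by) = a.center(), b.center()
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return "same center"
    if abs(dx) >= abs(dy):
        return "left of" if dx > 0 else "right of"
    # Image y grows downward, so b lower in the image (dy > 0) means a is above b.
    return "above" if dy > 0 else "below"


def scene_graph(boxes):
    """Return (subject, relation, object) triples for every ordered pair of boxes."""
    return [(a.label, spatial_relation(a, b), b.label) for a, b in permutations(boxes, 2)]


lamp = Box("lamp", 10, 0, 30, 20)
table = Box("table", 0, 40, 100, 80)
print(scene_graph([lamp, table]))
# [('lamp', 'above', 'table'), ('table', 'below', 'lamp')]
```

Each ordered pair is kept separately because spatial relations are asymmetric: "lamp above table" and "table below lamp" are distinct edges of the graph.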

Papers

Showing 1451–1500 of 1723 papers

Title	Date	Tasks	Status
Robust Visual Localization via Semantic-Guided Multi-Scale Transformer	Jun 10, 2025	regression, Scene Understanding	Unverified
Roominoes: Generating Novel 3D Floor Plans From Existing 3D Rooms	Dec 10, 2021	3D Reconstruction, Autonomous Navigation	Unverified
Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness	Apr 2, 2025	Scene Understanding	Unverified
RS-RAG: Bridging Remote Sensing Imagery and Comprehensive Knowledge with a Multi-Modal Dataset and Retrieval-Augmented Generation Model	Apr 7, 2025	Image Captioning, image-classification	Unverified
S^3M-Net: Joint Learning of Semantic Segmentation and Stereo Matching for Autonomous Driving	Jan 21, 2024	Autonomous Driving, Scene Understanding	Unverified
S3-Net: A Fast and Lightweight Video Scene Understanding Network by Single-shot Segmentation	Nov 4, 2020	Autonomous Driving, Edge-computing	Unverified
S4C: Self-Supervised Semantic Scene Completion with Neural Fields	Oct 11, 2023	Image Segmentation, Navigate	Unverified
Safety Assessment for Autonomous Systems' Perception Capabilities	Aug 17, 2022	Decision Making, Scene Understanding	Unverified
SAIL-VOS 3D: A Synthetic Dataset and Baselines for Object Detection and 3D Mesh Reconstruction from Video Data	May 18, 2021	object-detection, Object Detection	Unverified
SAM2-LOVE: Segment Anything Model 2 in Language-aided Audio-Visual Scenes	Jun 2, 2025	Scene Understanding	Unverified
SAM-E: Leveraging Visual Foundation Model with Sequence Imitation for Embodied Manipulation	May 30, 2024	Instruction Following, parameter-efficient fine-tuning	Unverified
SAM-Guided Masked Token Prediction for 3D Scene Understanding	Oct 16, 2024	3D Object Detection, Knowledge Distillation	Unverified
SAMPLE-HD: Simultaneous Action and Motion Planning Learning Environment	Jun 1, 2022	Motion Planning, Question Answering	Unverified
Scale-aware Neural Network for Semantic Segmentation of Multi-resolution Remote Sensing Images	Mar 14, 2021	Scene Understanding, Segmentation	Unverified
SANPO: A Scene Understanding, Accessibility and Human Navigation Dataset	Sep 21, 2023	Autonomous Vehicles, Depth Estimation	Unverified
Scan2Part: Fine-grained and Hierarchical Part-level Understanding of Real-World 3D Scans	Jun 6, 2022	Scene Understanding	Unverified
Sce2DriveX: A Generalized MLLM Framework for Scene-to-Drive Learning	Feb 19, 2025	Autonomous Driving, Bench2Drive	Unverified
Scenarios: A New Representation for Complex Scene Understanding	Feb 16, 2018	Image Retrieval, Object Recognition	Unverified
Scene-aware Human Pose Generation using Transformer	Aug 4, 2023	Knowledge Distillation, Scene Understanding	Unverified
Scene-Aware Prompt for Multi-modal Dialogue Understanding and Generation	Jul 5, 2022	Dialogue Generation, Dialogue Understanding	Unverified
SceneCompleter: Dense 3D Scene Completion for Generative Novel View Synthesis	Jun 12, 2025	Novel View Synthesis, Scene Understanding	Unverified
Counterfactual Critic Multi-Agent Training for Scene Graph Generation	Dec 6, 2018	counterfactual, Graph Generation	Unverified
Planning Safety Trajectories with Dual-Phase, Physics-Informed, and Transportation Knowledge-Driven Large Language Models	Apr 6, 2025	Computational Efficiency, General Knowledge	Code Available
Physics-as-Inverse-Graphics: Unsupervised Physical Parameter Estimation from Video	May 27, 2019	Inductive Bias, Model Predictive Control	Code Available
PENet: A Joint Panoptic Edge Detection Network	Mar 15, 2023	Edge Detection, Multi-Task Learning	Code Available
Part-Whole Relational Fusion Towards Multi-Modal Scene Understanding	Oct 19, 2024	Autonomous Driving, object-detection	Code Available
Parsing Natural Scenes and Natural Language with Recursive Neural Networks	Jun 1, 2011	General Classification, Scene Classification	Code Available
Parsing Geometry Using Structure-Aware Shape Templates	Aug 3, 2018	Object, Object Recognition	Code Available
Parallel Neural Computing for Scene Understanding from LiDAR Perception in Autonomous Racing	Dec 24, 2024	Autonomous Driving, Autonomous Racing	Code Available
Sequential Cross Attention Based Multi-task Learning	Sep 6, 2022	Multi-Task Learning, Scene Understanding	Code Available
PanoRecon: Real-Time Panoptic 3D Reconstruction from Monocular Video	Jan 1, 2024	3D Panoptic Segmentation, 3D Reconstruction	Code Available
Panoramic Depth Estimation via Supervised and Unsupervised Learning in Indoor Scenes	Aug 18, 2021	Camera Calibration, Depth Estimation	Code Available
P2AT: Pyramid Pooling Axial Transformer for Real-time Semantic Segmentation	Oct 23, 2023	Autonomous Driving, Decoder	Code Available
SGDraw: Scene Graph Drawing Interface Using Object-Oriented Representation	Nov 30, 2022	Graph Generation, Image Generation	Code Available
Pose-aware Multi-level Feature Network for Human Object Interaction Detection	Sep 18, 2019	Human-Object Interaction Detection, Object	Code Available
OVGaussian: Generalizable 3D Gaussian Segmentation with Open Vocabularies	Dec 31, 2024	3DGS, 3D Semantic Segmentation	Code Available
Dilated Residual Networks	May 28, 2017	Classification, General Classification	Code Available
Incorporating Luminance, Depth and Color Information by a Fusion-based Network for Semantic Segmentation	Sep 24, 2018	Autonomous Driving, Real-Time Semantic Segmentation	Code Available
OVeNet: Offset Vector Network for Semantic Segmentation	Mar 25, 2023	Optical Character Recognition (OCR), Scene Understanding	Code Available
Unsupervised Domain Adaptation using Generative Adversarial Networks for Semantic Segmentation of Aerial Images	May 8, 2019	Domain Adaptation, Management	Code Available
Predicting Deeper into the Future of Semantic Segmentation	Mar 22, 2017	Attribute, Autonomous Driving	Code Available
Shape2Scene: 3D Scene Representation Learning Through Pre-training on Shape Data	Jul 14, 2024	3D Object Detection, 3D Semantic Segmentation	Code Available
Shape Anchor Guided Holistic Indoor Scene Understanding	Sep 20, 2023	3D Object Detection, object-detection	Code Available
Unsupervised Foggy Scene Understanding via Self Spatial-Temporal Label Diffusion	Jun 10, 2022	Autonomous Driving, Domain Adaptation	Code Available
Improving Object Detection for Time-Lapse Imagery Using Temporal Features in Wildlife Monitoring	Dec 20, 2024	Object, object-detection	Code Available
OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding	Jul 10, 2025	Scene Understanding, Spatial Reasoning	Code Available
OpenOcc: Open Vocabulary 3D Scene Reconstruction via Occupancy Representation	Mar 18, 2024	3D Reconstruction, 3D Scene Reconstruction	Code Available
Impact of Ground Truth Annotation Quality on Performance of Semantic Image Segmentation of Traffic Conditions	Dec 30, 2018	Autonomous Driving, Image Segmentation	Code Available
On the Structures of Representation for the Robustness of Semantic Segmentation to Input Corruption	Sep 2, 2020	Scene Understanding, Segmentation	Code Available
Instance-Warp: Saliency Guided Image Warping for Unsupervised Domain Adaptation	Mar 19, 2024	Domain Adaptation, Object	Code Available

Benchmark Results

Semantic Scene Understanding Challenge (passive actuation & ground-truth localisation)

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.44	—	Unverified
2	Team VGAI (TCS Research)	OMQ	0.37	—	Unverified
3	Demo_semantic_SLAM	OMQ	0.11	—	Unverified

ADE20K val

#	Model	Metric	Claimed	Verified	Status
1	CPN(ResNet-101)	Mean IoU	46.3	—	Unverified

Semantic Scene Understanding Challenge (active actuation & ground-truth localisation)

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.35	—	Unverified
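Mean IoU is the metric reported for CPN(ResNet-101) on ADE20K val. It averages the per-class intersection over union, IoU_c = TP_c / (TP_c + FP_c + FN_c), with pixel counts accumulated over the whole evaluation set. The following is a minimal pure-Python sketch under stated assumptions, not the benchmark's official scorer:

- labels arrive as flat sequences of integer class ids;
- ground-truth pixels equal to a hypothetical `ignore_index` of 255 are skipped;
- classes absent from both ground truth and prediction are left out of the mean. This is one common convention, and official evaluation code may differ.

The OMQ scores in the challenge tables use a different metric and are not reproduced here.

```python
import math
from collections import Counter


def confusion_counts(gt, pred, ignore_index=255):
    """Count (ground-truth class, predicted class) pairs over flat label sequences.

    Pixels whose ground truth equals ignore_index (an assumed convention) are skipped.
    """
    counts = Counter()
    for g, p in zip(gt, pred):
        if g != ignore_index:
            counts[(g, p)] += 1
    return counts


def mean_iou(counts, num_classes):
    """Return (mIoU, per-class IoU list), where IoU_c = TP / (TP + FP + FN).

    Classes with TP + FP + FN == 0 get None and are excluded from the mean.
    """
    per_class = []
    for c in range(num_classes):
        tp = counts[(c, c)]
        fp = sum(n for (g, p), n in counts.items() if p == c and g != c)
        fn = sum(n for (g, p), n in counts.items() if g == c and p != c)
        denom = tp + fp + fn
        per_class.append(tp / denom if denom else None)
    present = [v for v in per_class if v is not None]
    miou = sum(present) / len(present) if present else math.nan
    return miou, per_class


# Toy example: 3 classes, 6 pixels; the last ground-truth pixel is ignored.
gt = [0, 0, 1, 1, 2, 255]
pred = [0, 1, 1, 1, 2, 0]
miou, per_class = mean_iou(confusion_counts(gt, pred), num_classes=3)
print(per_class)  # [0.5, 0.6666666666666666, 1.0]
print(round(miou, 4))  # 0.7222
```

For a full split, compute `confusion_counts` per image and merge the results with `Counter.update` before calling `mean_iou`. This weights every pixel equally instead of averaging per-image scores. Leaderboards typically report the value multiplied by 100, which is presumably the scale of the 46.3 listed above.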