Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.
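As a rough illustration of what "objects and their spatial relationships" can mean in practice, the sketch below represents a scene as a list of labelled bounding boxes and derives coarse pairwise relations from box centres. The names `SceneObject`, `spatial_relation`, and `describe_scene` are hypothetical, and the centre-based rule is a simplifying assumption for illustration only, not the method of any paper listed here.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import List, Tuple


@dataclass(frozen=True)
class SceneObject:
    """Hypothetical container: a label plus an axis-aligned box.

    Box is (x_min, y_min, x_max, y_max) in image pixels, with y growing downward.
    """
    name: str
    box: Tuple[float, float, float, float]


def center(box: Tuple[float, float, float, float]) -> Tuple[float, float]:
    """Return the (x, y) centre of a box."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2, (y0 + y1) / 2)


def spatial_relation(a: SceneObject, b: SceneObject) -> str:
    """Coarse relation of `a` with respect to `b`, using box centres only."""
    ax, ay = center(a.box)
    bx, by = center(b.box)
    dx, dy = ax - bx, ay - by
    # Pick the dominant axis of displacement (assumed heuristic).
    if abs(dx) >= abs(dy):
        return "left of" if dx < 0 else "right of"
    return "above" if dy < 0 else "below"


def describe_scene(objects: List[SceneObject]) -> List[Tuple[str, str, str]]:
    """List (subject, relation, object) triples for every ordered pair."""
    return [(a.name, spatial_relation(a, b), b.name)
            for a, b in permutations(objects, 2)]


cup = SceneObject("cup", (100, 50, 140, 90))
table = SceneObject("table", (50, 100, 300, 200))
triples = describe_scene([cup, table])
```

Real systems typically learn such relations from data and handle depth, occlusion, and containment; the triples here only show the kind of structure that goes beyond per-object labels.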

Papers

Showing 1201–1250 of 1723 papers

Title	Date	Tasks	Status
Reconstructing Vehicles from a Single Image: Shape Priors for Road Scene Understanding	Sep 29, 2016	Autonomous Driving, road scene understanding	Unverified
Recyclable Semi-supervised Method Based on Multi-model Ensemble for Video Scene Parsing	Jun 5, 2023	Scene Parsing, Scene Understanding	Unverified
Cataract-1K: Cataract Surgery Dataset for Scene Segmentation, Phase Recognition, and Irregularity Detection	Dec 11, 2023	Benchmarking, Domain Adaptation	Unverified
Reducing Label Dependency for Underwater Scene Understanding: A Survey of Datasets, Techniques and Applications	Nov 18, 2024	Scene Segmentation, Scene Understanding	Unverified
Referring Self-supervised Learning on 3D Point Cloud	Sep 29, 2021	Scene Understanding, Self-Supervised Learning	Unverified
RefineCap: Concept-Aware Refinement for Image Captioning	Sep 8, 2021	Decoder, Descriptive	Unverified
CASPNet++: Joint Multi-Agent Motion Prediction	Aug 15, 2023	Autonomous Driving, motion prediction	Unverified
Case-based Reasoning Augmented Large Language Model Framework for Decision Making in Realistic Safety-Critical Driving Scenarios	Jun 25, 2025	Autonomous Driving, Decision Making	Unverified
Cascaded Classification Models: Combining Models for Holistic Scene Understanding	Dec 1, 2008	3D Reconstruction, 3D Scene Reconstruction	Unverified
Relationship Proposal Networks	Jul 1, 2017	All, Scene Understanding	Unverified
Relevance-driven Decision Making for Safer and More Efficient Human Robot Collaboration	Sep 21, 2024	Collision Avoidance, Decision Making	Unverified
Relevance for Human Robot Collaboration	Sep 12, 2024	Dimensionality Reduction, Scene Understanding	Unverified
Car Segmentation and Pose Estimation using 3D Object Models	Dec 21, 2015	3D Pose Estimation, Image Segmentation	Unverified
Can you text what is happening? Integrating pre-trained language encoders into trajectory prediction models for autonomous driving	Sep 11, 2023	Autonomous Driving, Descriptive	Unverified
Can MLLMs Guide Me Home? A Benchmark Study on Fine-Grained Visual Reasoning from Transit Maps	May 24, 2025	Scene Understanding, Spatial Reasoning	Unverified
REMIPS: Physically Consistent 3D Reconstruction of Multiple Interacting People under Weak Supervision	Dec 1, 2021	3D Human Reconstruction, 3D Reconstruction	Unverified
Can LVLMs Obtain a Driver's License? A Benchmark Towards Reliable AGI for Autonomous Driving	Sep 4, 2024	Autonomous Driving, Decision Making	Unverified
Can Large Multimodal Models Understand Agricultural Scenes? Benchmarking with AgroMind	May 18, 2025	Benchmarking, Scene Understanding	Unverified
Can Foundation Models Perform Zero-Shot Task Specification For Robot Manipulation?	Apr 23, 2022	Robot Manipulation, Scene Understanding	Unverified
Residual 3D Scene Flow Learning with Context-Aware Feature Extraction	Sep 10, 2021	Autonomous Driving, Scene Flow Estimation	Unverified
Resource-Efficient Multiview Perception: Integrating Semantic Masking with Masked Autoencoders	Oct 7, 2024	Multiview Detection, Scene Understanding	Unverified
Vision-Language Pre-training with Object Contrastive Learning for 3D Scene Understanding	May 18, 2023	Contrastive Learning, Object	Unverified
3D Shape Augmentation with Content-Aware Shape Resizing	May 15, 2024	3D Generation, Scene Understanding	Unverified
BridgeNet: Comprehensive and Effective Feature Interactions via Bridge Feature for Multi-task Dense Predictions	Dec 21, 2023	Decoder, Multi-Task Learning	Unverified
Rethinking RGB-D Fusion for Semantic Segmentation in Surgical Datasets	Jul 29, 2024	Decoder, Scene Understanding	Unverified
Rethinking Semantic Segmentation Evaluation for Explainability and Model Selection	Jan 21, 2021	Autonomous Navigation, Model Selection	Unverified
VrR-VG: Refocusing Visually-Relevant Relationships	Feb 1, 2019	Image Captioning, Question Answering	Unverified
Review on 6D Object Pose Estimation with the focus on Indoor Scene Understanding	Dec 4, 2022	6D Pose Estimation using RGB, Object	Unverified
Review on Panoramic Imaging and Its Applications in Scene Understanding	May 11, 2022	Autonomous Driving, Depth Estimation	Unverified
3D Scene Understanding at Urban Intersection using Stereo Vision and Digital Map	Dec 10, 2021	Autonomous Vehicles, Navigate	Unverified
Can DeepSeek Reason Like a Surgeon? An Empirical Evaluation for Vision-Language Understanding in Robotic-Assisted Surgery	Mar 29, 2025	Action Understanding, Instrument Recognition	Unverified
Camera-Radar Perception for Autonomous Vehicles and ADAS: Concepts, Datasets and Metrics	Mar 8, 2023	Autonomous Vehicles, Scene Understanding	Unverified
Camera-Only Bird's Eye View Perception: A Neural Approach to LiDAR-Free Environmental Mapping for Autonomous Vehicles	May 9, 2025	Autonomous Navigation, Autonomous Vehicles	Unverified
Camera Control at the Edge with Language Models for Scene Understanding	May 9, 2025	Language Modeling, Language Modelling	Unverified
Right Side Up? Disentangling Orientation Understanding in MLLMs with Fine-grained Multi-axis Perception Tasks	May 27, 2025	3D Scene Reconstruction, Diagnostic	Unverified
Visual Affordance and Function Understanding: A Survey	Jul 18, 2018	Affordance Detection, Scene Understanding	Unverified
Road Rage Reasoning with Vision-language Models (VLMs): Task Definition and Evaluation Dataset	Mar 14, 2025	Scene Understanding	Unverified
Robo2VLM: Visual Question Answering from Large-Scale In-the-Wild Robot Manipulation Datasets	May 21, 2025	Dataset Generation, Descriptive	Unverified
Calibrated and Efficient Sampling-Free Confidence Estimation for LiDAR Scene Semantic Segmentation	Nov 18, 2024	Autonomous Driving, LIDAR Semantic Segmentation	Unverified
RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics	Nov 25, 2024	Robot Manipulation, Scene Understanding	Unverified
Robust 3D Scene Segmentation through Hierarchical and Learnable Part-Fusion	Nov 16, 2021	3D Semantic Segmentation, Autonomous Driving	Unverified
Robust Category-Level 3D Pose Estimation from Synthetic Data	May 25, 2023	3D Pose Estimation, 3D Reconstruction	Unverified
Robust deep learning-based semantic organ segmentation in hyperspectral images	Nov 9, 2021	Deep Learning, Image Segmentation	Unverified
Robust Multi-Modal Image Stitching for Improved Scene Understanding	Dec 28, 2023	Image Stitching, Scene Understanding	Unverified
CAGS: Open-Vocabulary 3D Scene Understanding with Context-Aware Gaussian Splatting	Apr 16, 2025	3DGS, 3D Instance Segmentation	Unverified
Robust Visual Localization via Semantic-Guided Multi-Scale Transformer	Jun 10, 2025	regression, Scene Understanding	Unverified
Roominoes: Generating Novel 3D Floor Plans From Existing 3D Rooms	Dec 10, 2021	3D Reconstruction, Autonomous Navigation	Unverified
CaDIS: Cataract Dataset for Image Segmentation	Jun 27, 2019	2D Semantic Segmentation task 1 (8 classes), 2D Semantic Segmentation task 2 (17 classes)	Unverified
Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness	Apr 2, 2025	Scene Understanding	Unverified
3D-RCNN: Instance-Level 3D Object Reconstruction via Render-and-Compare	Jun 1, 2018	3D Object Reconstruction, Autonomous Driving	Unverified

Datasets: Semantic Scene Understanding Challenge (passive actuation & ground-truth localisation); ADE20K val; Semantic Scene Understanding Challenge (active actuation & ground-truth localisation)

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.44	—	Unverified
2	Team VGAI (TCS Research)	OMQ	0.37	—	Unverified
3	Demo_semantic_SLAM	OMQ	0.11	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	CPN(ResNet-101)	Mean IoU	46.3	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.35	—	Unverified
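
For readers unfamiliar with the Mean IoU metric reported above, the following is a minimal sketch of how mean intersection-over-union is commonly computed for semantic segmentation. It assumes flat lists of integer class labels per pixel and averages only over classes that appear in the prediction or the ground truth; the function name `mean_iou` is illustrative, and individual benchmarks may differ in details such as ignored labels or scaling the score to a percentage.

```python
from typing import List, Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean intersection-over-union over classes present in pred or target."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection: List[int] = [0] * num_classes
    union: List[int] = [0] * num_classes

    for p, t in zip(pred, target):
        if p == t:
            # Pixel is in both the predicted and ground-truth region of class p.
            intersection[p] += 1
            union[p] += 1
        else:
            # Pixel belongs to the union of class p (predicted) and class t (true).
            union[p] += 1
            union[t] += 1

    # Skip classes that occur in neither prediction nor ground truth.
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    if not ious:
        raise ValueError("no labelled pixels to evaluate")
    return sum(ious) / len(ious)


# Toy 2x2 image flattened row by row, with two classes.
score = mean_iou(pred=[0, 0, 1, 1], target=[0, 1, 1, 1], num_classes=2)
```

In the toy example, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.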