Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.

Papers


Showing 751–800 of 1723 papers

Title	Date	Tasks	Status
3D Vision-Language Gaussian Splatting	Oct 10, 2024	3D Reconstruction, Autonomous Driving	Unverified
Evaluating the Impact of Point Cloud Colorization on Semantic Segmentation Accuracy	Oct 9, 2024	Colorization, Point Cloud Segmentation	Unverified
Evaluating Multimodal Language Models as Visual Assistants for Visually Impaired Users	Mar 28, 2025	Object Recognition, Reading Comprehension	Unverified
Cataract-1K: Cataract Surgery Dataset for Scene Segmentation, Phase Recognition, and Irregularity Detection	Dec 11, 2023	Benchmarking, Domain Adaptation	Unverified
CASPNet++: Joint Multi-Agent Motion Prediction	Aug 15, 2023	Autonomous Driving, motion prediction	Unverified
Estimating Depth from Monocular Images as Classification Using Deep Fully Convolutional Residual Networks	May 8, 2016	Depth Estimation, General Classification	Unverified
ESGNN: Towards Equivariant Scene Graph Neural Network for 3D Scene Understanding	Jun 30, 2024	Graph Generation, Graph Neural Network	Unverified
Case-based Reasoning Augmented Large Language Model Framework for Decision Making in Realistic Safety-Critical Driving Scenarios	Jun 25, 2025	Autonomous Driving, Decision Making	Unverified
Cascaded Classification Models: Combining Models for Holistic Scene Understanding	Dec 1, 2008	3D Reconstruction, 3D Scene Reconstruction	Unverified
ePointDA: An End-to-End Simulation-to-Real Domain Adaptation Framework for LiDAR Point Cloud Segmentation	Sep 7, 2020	Autonomous Driving, Domain Adaptation	Unverified
Car Segmentation and Pose Estimation using 3D Object Models	Dec 21, 2015	3D Pose Estimation, Image Segmentation	Unverified
Enhancing Single Image to 3D Generation using Gaussian Splatting and Hybrid Diffusion Priors	Oct 12, 2024	3D Generation, 3D geometry	Unverified
A Review and A Robust Framework of Data-Efficient 3D Scene Parsing with Traditional/Learned 3D Descriptors	Dec 3, 2023	Active Learning, Instance Segmentation	Unverified
Enhancing image captioning with depth information using a Transformer-based framework	Jul 24, 2023	Image Captioning, Image Paragraph Captioning	Unverified
Enhancing Human-Centered Dynamic Scene Understanding via Multiple LLMs Collaborated Reasoning	Mar 15, 2024	Autonomous Driving, Human-Object Interaction Detection	Unverified
Can you text what is happening? Integrating pre-trained language encoders into trajectory prediction models for autonomous driving	Sep 11, 2023	Autonomous Driving, Descriptive	Unverified
Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding	Jun 17, 2024	3D Object Detection, 3D Semantic Segmentation	Unverified
Multilateral Cascading Network for Semantic Segmentation of Large-Scale Outdoor Point Clouds	Sep 21, 2024	Scene Understanding, Semantic Segmentation	Unverified
Can MLLMs Guide Me Home? A Benchmark Study on Fine-Grained Visual Reasoning from Transit Maps	May 24, 2025	Scene Understanding, Spatial Reasoning	Unverified
A Reinforcement Learning Framework for Natural Question Generation using Bi-discriminators	Aug 1, 2018	Attribute, Natural Questions	Unverified
Advancing Complex Wide-Area Scene Understanding with Hierarchical Coresets Selection	Jul 17, 2025	Scene Understanding	Unverified
3D-VirtFusion: Synthetic 3D Data Augmentation through Generative Diffusion Models and Controllable Editing	Aug 25, 2024	Data Augmentation, Diversity	Unverified
End-to-End Race Driving with Deep Reinforcement Learning	Jul 6, 2018	Deep Reinforcement Learning, Domain Adaptation	Unverified
End-to-end Autonomous Driving using Deep Learning: A Systematic Review	Aug 27, 2023	Autonomous Driving, object-detection	Unverified
Can LVLMs Obtain a Driver's License? A Benchmark Towards Reliable AGI for Autonomous Driving	Sep 4, 2024	Autonomous Driving, Decision Making	Unverified
Endo-TTAP: Robust Endoscopic Tissue Tracking via Multi-Facet Guided Attention and Hybrid Flow-point Supervision	Mar 28, 2025	Optical Flow Estimation, Point Tracking	Unverified
Can Large Multimodal Models Understand Agricultural Scenes? Benchmarking with AgroMind	May 18, 2025	Benchmarking, Scene Understanding	Unverified
A Reinforcement Learning Approach to Target Tracking in a Camera Network	Jul 26, 2018	Q-Learning, reinforcement-learning	Unverified
Empowering Large Language Models with 3D Situation Awareness	Mar 29, 2025	Scene Understanding	Unverified
Empowering cyberphysical systems of systems with intelligence	Jul 5, 2021	Decision Making, Management	Unverified
Can Foundation Models Perform Zero-Shot Task Specification For Robot Manipulation?	Apr 23, 2022	Robot Manipulation, Scene Understanding	Unverified
EML-NET:An Expandable Multi-Layer NETwork for Saliency Prediction	May 2, 2018	Saliency Prediction, Scene Understanding	Unverified
Can DeepSeek Reason Like a Surgeon? An Empirical Evaluation for Vision-Language Understanding in Robotic-Assisted Surgery	Mar 29, 2025	Action Understanding, Instrument Recognition	Unverified
A Reflectance Based Method For Shadow Detection and Removal	Jul 11, 2018	Detecting Shadows, Scene Understanding	Unverified
A diffusion and clustering-based approach for finding coherent motions and understanding crowd scenes	Feb 16, 2016	Clustering, Optical Flow Estimation	Unverified
Embracing Diffraction: A Paradigm Shift in Wireless Sensing and Communication	May 2, 2025	Scene Understanding	Unverified
EmbRACE-3K: Embodied Reasoning and Action in Complex Environments	Jul 14, 2025	Scene Understanding, Spatial Reasoning	Unverified
Embodied Visual Active Learning for Semantic Segmentation	Dec 17, 2020	Active Learning, Deep Reinforcement Learning	Unverified
Embodied VideoAgent: Persistent Memory from Egocentric Videos and Embodied Sensors Enables Dynamic Scene Understanding	Dec 31, 2024	Robot Manipulation, Scene Understanding	Unverified
Camera-Radar Perception for Autonomous Vehicles and ADAS: Concepts, Datasets and Metrics	Mar 8, 2023	Autonomous Vehicles, Scene Understanding	Unverified
Are Cars Just 3D Boxes? - Jointly Estimating the 3D Shape of Multiple Objects	Jun 1, 2014	3D geometry, 3D Shape Modeling	Unverified
Embodied Scene Understanding for Vision Language Models via MetaVQA	Jan 15, 2025	Decision Making, Question Answering	Unverified
Camera-Only Bird's Eye View Perception: A Neural Approach to LiDAR-Free Environmental Mapping for Autonomous Vehicles	May 9, 2025	Autonomous Navigation, Autonomous Vehicles	Unverified
Camera Control at the Edge with Language Models for Scene Understanding	May 9, 2025	Language Modeling, Language Modelling	Unverified
Addressing the Sim2Real Gap in Robotic 3D Object Classification	Oct 28, 2019	3D Object Classification, Classification	Unverified
3D Shape Augmentation with Content-Aware Shape Resizing	May 15, 2024	3D Generation, Scene Understanding	Unverified
Elastic Interaction Energy-Informed Real-Time Traffic Scene Perception	Oct 2, 2023	Autonomous Driving, Image Segmentation	Unverified
EgoSplat: Open-Vocabulary Egocentric Scene Understanding with Language Embedded 3D Gaussian Splatting	Mar 14, 2025	Scene Understanding, Segmentation	Unverified
EgoGaussian: Dynamic Scene Understanding from Egocentric Video with 3D Gaussian Splatting	Jun 28, 2024	Human-Object Interaction Detection, Object	Unverified
Calibrated and Efficient Sampling-Free Confidence Estimation for LiDAR Scene Semantic Segmentation	Nov 18, 2024	Autonomous Driving, LIDAR Semantic Segmentation	Unverified


Benchmark Results

Semantic Scene Understanding Challenge (passive actuation & ground-truth localisation)

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.44	—	Unverified
2	Team VGAI (TCS Research)	OMQ	0.37	—	Unverified
3	Demo_semantic_SLAM	OMQ	0.11	—	Unverified

ADE20K val

#	Model	Metric	Claimed	Verified	Status
1	CPN(ResNet-101)	Mean IoU	46.3	—	Unverified

Semantic Scene Understanding Challenge (active actuation & ground-truth localisation)

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.35	—	Unverified