Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.
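Scene graphs, which several papers below generate, make the "objects and their spatial relationships" from this definition explicit. Objects become nodes and relationships become (subject, predicate, object) edges. The sketch below is a minimal illustration under stated assumptions. The class names `SceneObject` and `SceneGraph`, the bounding-box format, and the center-based "left of"/"right of" rule are all hypothetical. None of them comes from any paper listed here.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class SceneObject:
    """A detected object: a class label plus a 2D bounding box (hypothetical format)."""
    label: str
    box: tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), image pixels

    def center_x(self) -> float:
        return (self.box[0] + self.box[2]) / 2


def horizontal_relation(a: SceneObject, b: SceneObject) -> str:
    """Hypothetical toy rule: compare box centers along the x-axis."""
    return "left of" if a.center_x() < b.center_x() else "right of"


@dataclass
class SceneGraph:
    """Objects are nodes; relations are (subject index, predicate, object index) edges."""
    objects: list[SceneObject] = field(default_factory=list)
    relations: list[tuple[int, str, int]] = field(default_factory=list)

    def add(self, obj: SceneObject) -> int:
        self.objects.append(obj)
        return len(self.objects) - 1

    def relate(self, subj: int, obj: int) -> None:
        # Derive the predicate from box geometry instead of labeling it by hand.
        pred = horizontal_relation(self.objects[subj], self.objects[obj])
        self.relations.append((subj, pred, obj))

    def triples(self) -> list[tuple[str, str, str]]:
        """Human-readable (subject, predicate, object) triples."""
        return [(self.objects[s].label, p, self.objects[o].label) for s, p, o in self.relations]


graph = SceneGraph()
cup = graph.add(SceneObject("cup", (10, 50, 30, 80)))
laptop = graph.add(SceneObject("laptop", (100, 40, 300, 200)))
graph.relate(cup, laptop)
print(graph.triples())  # [('cup', 'left of', 'laptop')]
```

Real systems predict predicates with learned models and use far richer relation vocabularies. The hand-written geometric rule only shows where such a predictor would plug in.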

Papers

Showing 601–650 of 1723 papers

Title	Date	Tasks	Status
DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models	Feb 19, 2024	Autonomous Driving, Scene Understanding	Unverified
DriveGuard: Robustification of Automated Driving Systems with Deep Spatio-Temporal Convolutional Autoencoder	Nov 5, 2021	Autonomous Vehicles, Image Segmentation	Unverified
Boundary Seeking GANs	Jan 1, 2018	Scene Understanding, Text Generation	Unverified
Joint Optical Flow and Temporally Consistent Semantic Segmentation	Jul 26, 2016	Motion Estimation, Optical Flow Estimation	Unverified
DriveGenVLM: Real-world Video Generation for Vision Language Model based Autonomous Driving	Aug 29, 2024	Autonomous Driving, Denoising	Unverified
DreamAnywhere: Object-Centric Panoramic 3D Scene Generation	Jun 25, 2025	Novel View Synthesis, Object	Unverified
Bottom-up Instance Segmentation using Deep Higher-Order CRFs	Sep 8, 2016	Instance Segmentation, Object	Unverified
3D Scene Understanding at Urban Intersection using Stereo Vision and Digital Map	Dec 10, 2021	Autonomous Vehicles, Navigate	Unverified
Joint prototype and coefficient prediction for 3D instance segmentation	Jul 9, 2024	3D Instance Segmentation, Instance Segmentation	Unverified
DORSal: Diffusion for Object-centric Representations of Scenes et al.	Jun 13, 2023	Neural Rendering, Object	Unverified
DORAEMON: Decentralized Ontology-aware Reliable Agent with Enhanced Memory Oriented Navigation	May 28, 2025	Autonomous Navigation, RAG	Unverified
Both Style and Fog Matter: Cumulative Domain Adaptation for Semantic Foggy Scene Understanding	Dec 1, 2021	Disentanglement, Domain Adaptation	Unverified
Does Your 3D Encoder Really Work? When Pretrain-SFT from 2D VLMs Meets 3D VLMs	Jun 5, 2025	cross-modal alignment, Dense Captioning	Unverified
Does CLIP perceive art the same way we do?	May 8, 2025	Image Generation, Scene Understanding	Unverified
Both Style and Distortion Matter: Dual-Path Unsupervised Domain Adaptation for Panoramic Semantic Segmentation	Mar 25, 2023	Domain Adaptation, ERP	Unverified
Answering Visual What-If Questions: From Actions to Predicted Scene Descriptions	Sep 11, 2018	Question Answering, Scene Understanding	Unverified
Do Deep Neural Networks Model Nonlinear Compositionality in the Neural Representation of Human-Object Interactions?	Mar 31, 2019	Human-Object Interaction Detection, Object	Unverified
Answerability Fields: Answerable Location Estimation via Diffusion Models	Jul 26, 2024	Question Answering, Scene Understanding	Unverified
Jointly Learning to Parse and Perceive: Connecting Natural Language to the Physical World	Jan 1, 2013	Language Acquisition, Question Answering	Unverified
Joint Modeling of Visual Objects and Relations for Scene Graph Generation	Dec 1, 2021	Graph Embedding, Graph Generation	Unverified
Joint Semantic and Motion Segmentation for dynamic scenes using Deep Convolutional Networks	Apr 18, 2017	Motion Segmentation, Optical Flow Estimation	Unverified
DIV-FF: Dynamic Image-Video Feature Fields For Environment Understanding in Egocentric Videos	Mar 11, 2025	Scene Understanding	Unverified
Boosting Cross-spectral Unsupervised Domain Adaptation for Thermal Semantic Segmentation	May 11, 2025	Autonomous Driving, Domain Adaptation	Unverified
Distraction-Aware Shadow Detection	Jun 1, 2019	Scene Understanding, Shadow Detection	Unverified
DistillNeRF: Perceiving 3D Scenes from Single-Glance Images by Distilling Neural Fields and Foundation Model Features	Jun 17, 2024	3D geometry, 3D Semantic Occupancy Prediction	Unverified
An Intelligent Safety System for Human-Centered Semi-Autonomous Vehicles	Dec 10, 2018	Autonomous Driving, Autonomous Vehicles	Unverified
Distillation of Human-Object Interaction Contexts for Action Recognition	Dec 17, 2021	Action Recognition, Graph Attention	Unverified
3D-RCNN: Instance-Level 3D Object Reconstruction via Render-and-Compare	Jun 1, 2018	3D Object Reconstruction, Autonomous Driving	Unverified
BLOS-BEV: Navigation Map Enhanced Lane Segmentation Network, Beyond Line of Sight	Jul 11, 2024	Autonomous Driving, BEV Segmentation	Unverified
3D-Grounded Vision-Language Framework for Robotic Task Planning: Automated Prompt Synthesis and Supervised Reasoning	Feb 13, 2025	Code Generation, Scene Understanding	Unverified
Iwin: Human-Object Interaction Detection via Transformer with Irregular Windows	Mar 20, 2022	Human-Object Interaction Detection, Object	Unverified
Discriminative Multi-Modal Feature Fusion for RGBD Indoor Scene Recognition	Jun 1, 2016	Image Segmentation, Object Recognition	Unverified
Discovery of Shared Semantic Spaces for Multi-Scene Video Query and Summarization	Jul 27, 2015	Scene Understanding, Semantic Similarity	Unverified
An Exemplar-based CRF for Multi-instance Object Segmentation	Jun 1, 2014	Instance Segmentation, Object	Unverified
Disaster Anomaly Detector via Deeper FCDDs for Explainable Initial Responses	Jun 5, 2023	Anomaly Detection, Disaster Response	Unverified
BlindSpotNet: Seeing Where We Cannot See	Jul 8, 2022	Depth Estimation, Monocular Depth Estimation	Unverified
Adapting to Length Shift: FlexiLength Network for Trajectory Prediction	Mar 31, 2024	Autonomous Driving, Prediction	Unverified
iReason: Multimodal Commonsense Reasoning using Videos and Natural Language with Interpretability	Jun 25, 2021	Bias Detection, Question Answering	Unverified
DirectShape: Direct Photometric Alignment of Shape Priors for Visual Vehicle Pose and Shape Estimation	Apr 22, 2019	3D Object Detection, Autonomous Driving	Unverified
Direction-Aware Semi-Dense SLAM	Sep 18, 2017	Scene Understanding, Segmentation	Unverified
Blending Learning and Inference in Structured Prediction	Oct 8, 2012	Prediction, Scene Understanding	Unverified
DINeMo: Learning Neural Mesh Models with no 3D Annotations	Mar 26, 2025	3D Pose Estimation, 6D Pose Estimation	Unverified
A New Ratio Image Based CNN Algorithm For SAR Despeckling	Jun 10, 2019	General Classification, Scene Understanding	Unverified
J-MOD^2: Joint Monocular Obstacle Detection and Depth Estimation	Sep 25, 2017	Depth Estimation, Scene Understanding	Unverified
Audio-Visual Collaborative Representation Learning for Dynamic Saliency Prediction	Sep 17, 2021	Representation Learning, Saliency Prediction	Unverified
Digital Divides in Scene Recognition: Uncovering Socioeconomic Biases in Deep Learning Systems	Jan 23, 2024	Scene Classification, Scene Recognition	Unverified
Active Scene Understanding via Online Semantic Reconstruction	Jun 18, 2019	Scene Parsing, Scene Understanding	Unverified
3D Question Answering for City Scene Understanding	Jul 24, 2024	Autonomous Driving, Question Answering	Unverified
DiffusionMTL: Learning Multi-Task Denoising Diffusion Model from Partially Annotated Data	Mar 22, 2024	Denoising, Scene Understanding	Unverified
Diffusion Models in 3D Vision: A Survey	Oct 7, 2024	Autonomous Driving, Computational Efficiency	Unverified

Benchmark Results

Semantic Scene Understanding Challenge (passive actuation & ground-truth localisation)

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.44	—	Unverified
2	Team VGAI (TCS Research)	OMQ	0.37	—	Unverified
3	Demo_semantic_SLAM	OMQ	0.11	—	Unverified

ADE20K val

#	Model	Metric	Claimed	Verified	Status
1	CPN (ResNet-101)	Mean IoU (%)	46.3	—	Unverified

Semantic Scene Understanding Challenge (active actuation & ground-truth localisation)

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.35	—	Unverified
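The ADE20K val result is reported as Mean IoU, the usual semantic-segmentation score. For each class c, IoU_c = TP_c / (TP_c + FP_c + FN_c). These values are averaged over classes and usually reported as a percentage. The pure-Python sketch below shows the arithmetic on flattened integer label maps. The function name `mean_iou` and the ignore label 255 are assumptions made for illustration. This is not the benchmark's official evaluation code, which typically accumulates the confusion matrix over the whole split before dividing rather than averaging per-image scores.

```python
from __future__ import annotations

from collections.abc import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int,
             ignore_index: int = 255) -> float:
    """Mean intersection-over-union over classes, from flattened label maps.

    Assumes predictions lie in [0, num_classes); target pixels equal to
    ignore_index (an assumed convention) are skipped.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    # Confusion matrix: rows = ground-truth class, columns = predicted class.
    conf = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        conf[t][p] += 1

    ious = []
    for c in range(num_classes):
        tp = conf[c][c]
        fn = sum(conf[c]) - tp                  # class-c pixels predicted as another class
        fp = sum(row[c] for row in conf) - tp   # other pixels predicted as class c
        union = tp + fp + fn
        if union > 0:                           # skip classes absent from both maps
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


score = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
print(f"{100 * score:.1f}")  # 58.3
```

In the toy example, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12, or about 58.3%. That is the same scale as the 46.3 reported in the table.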