Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.
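As a rough illustration of relating objects to each other rather than only recognizing them, the sketch below turns labeled bounding boxes into (subject, relation, object) triples. It is a minimal sketch, not a method from any listed paper: the `Box` class, the `relation` rule (comparing box centers), and the sample detections are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned detection in image coordinates (y grows downward)."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    @property
    def center(self):
        return ((self.x_min + self.x_max) / 2, (self.y_min + self.y_max) / 2)


def relation(a: Box, b: Box) -> str:
    """Coarse spatial relation of a with respect to b (assumed rule: compare centers)."""
    ax, ay = a.center
    bx, by = b.center
    dx, dy = ax - bx, ay - by
    # Use whichever axis separates the two centers more.
    if abs(dx) >= abs(dy):
        return "left of" if dx < 0 else "right of"
    return "above" if dy < 0 else "below"


def scene_graph(boxes):
    """Return (subject, relation, object) triples for every ordered pair of boxes."""
    return [(a.label, relation(a, b), b.label)
            for a in boxes for b in boxes if a is not b]


# Made-up detections, for illustration only.
detections = [
    Box("cup", 40, 50, 60, 70),
    Box("table", 0, 80, 200, 160),
    Box("chair", 220, 60, 300, 180),
]

for triple in scene_graph(detections):
    print(triple)
```

Real systems typically learn such relations from data and work in 3D; the center-comparison rule here only shows the kind of output a relational description of a scene contains.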

Papers

Showing 901–950 of 1723 papers

Title	Date	Tasks	Status
Zero-Shot Scene Understanding for Automatic Target Recognition Using Large Vision-Language Models	Jan 13, 2025	Scene Understanding	Unverified
Do Deep Neural Networks Model Nonlinear Compositionality in the Neural Representation of Human-Object Interactions?	Mar 31, 2019	Human-Object Interaction Detection, Object	Unverified
DIV-FF: Dynamic Image-Video Feature Fields For Environment Understanding in Egocentric Videos	Mar 11, 2025	Scene Understanding	Unverified
Mosaic3D: Foundation Dataset and Model for Open-Vocabulary 3D Segmentation	Feb 4, 2025	Contrastive Learning, Decoder	Unverified
MOSE: Monocular Semantic Reconstruction Using NeRF-Lifted Noisy Priors	Sep 21, 2024	2D Semantic Segmentation, 3D Semantic Segmentation	Unverified
Movies2Scenes: Using Movie Metadata to Learn Scene Representation	Feb 22, 2022	Contrastive Learning, Scene Understanding	Unverified
Moving Beyond Navigation with Active Neural SLAM	Jan 17, 2022	Domain Generalization, motion prediction	Unverified
Moving Object Proposals with Deep Learned Optical Flow for Video Object Segmentation	Feb 14, 2024	Decoder, Object	Unverified
CorNav: Autonomous Agent with Self-Corrected Planning for Zero-Shot Vision-and-Language Navigation	Jun 17, 2023	Decision Making, Instruction Following	Unverified
UniPLV: Towards Label-Efficient Open-World 3D Scene Understanding by Regional Visual Language Supervision	Dec 24, 2024	Scene Understanding, Semantic Segmentation	Unverified
Distraction-Aware Shadow Detection	Jun 1, 2019	Scene Understanding, Shadow Detection	Unverified
MTANet: Multitask-Aware Network With Hierarchical Multimodal Fusion for RGB-T Urban Scene Understanding	Apr 5, 2022	Autonomous Vehicles, Scene Understanding	Unverified
DistillNeRF: Perceiving 3D Scenes from Single-Glance Images by Distilling Neural Fields and Foundation Model Features	Jun 17, 2024	3D geometry, 3D Semantic Occupancy Prediction	Unverified
Distillation of Human-Object Interaction Contexts for Action Recognition	Dec 17, 2021	Action Recognition, Graph Attention	Unverified
Discriminative Multi-Modal Feature Fusion for RGBD Indoor Scene Recognition	Jun 1, 2016	Image Segmentation, Object Recognition	Unverified
Discovery of Shared Semantic Spaces for Multi-Scene Video Query and Summarization	Jul 27, 2015	Scene Understanding, Semantic Similarity	Unverified
Disaster Anomaly Detector via Deeper FCDDs for Explainable Initial Responses	Jun 5, 2023	Anomaly Detection, Disaster Response	Unverified
DirectShape: Direct Photometric Alignment of Shape Priors for Visual Vehicle Pose and Shape Estimation	Apr 22, 2019	3D Object Detection, Autonomous Driving	Unverified
Multi-CLIP: Contrastive Vision-Language Pre-training for Question Answering tasks in 3D Scenes	Jun 4, 2023	Common Sense Reasoning, Question Answering	Unverified
UniQ: Unified Decoder with Task-specific Queries for Efficient Scene Graph Generation	Jan 10, 2025	Decoder, Graph Generation	Unverified
Multimodal 3D Object Detection on Unseen Domains	Apr 17, 2024	3D Object Detection, Autonomous Driving	Unverified
Multimodal 3D Reasoning Segmentation with Complex Scenes	Nov 21, 2024	Reasoning Segmentation, Scene Understanding	Unverified
Multi-Modal Attention-based Fusion Model for Semantic Segmentation of RGB-Depth Images	Dec 25, 2019	Scene Understanding, Segmentation	Unverified
Direction-Aware Semi-Dense SLAM	Sep 18, 2017	Scene Understanding, Segmentation	Unverified
DINeMo: Learning Neural Mesh Models with no 3D Annotations	Mar 26, 2025	3D Pose Estimation, 6D Pose Estimation	Unverified
Digital Divides in Scene Recognition: Uncovering Socioeconomic Biases in Deep Learning Systems	Jan 23, 2024	Scene Classification, Scene Recognition	Unverified
Multimodal Large Language Models for Enhanced Traffic Safety: A Comprehensive Review and Future Trends	Apr 21, 2025	Adversarial Robustness, Decision Making	Unverified
UniRiT: Towards Few-Shot Non-Rigid Point Cloud Registration	Oct 30, 2024	Point Cloud Registration, Representation Learning	Unverified
Multi-Model Learning for Real-Time Automotive Semantic Foggy Scene Understanding via Domain Adaptation	Dec 9, 2020	Decoder, Domain Adaptation	Unverified
Multi-Object Classification and Unsupervised Scene Understanding Using Deep Learning Features and Latent Tree Probabilistic Models	May 2, 2015	Classification, Clustering	Unverified
DiffusionMTL: Learning Multi-Task Denoising Diffusion Model from Partially Annotated Data	Mar 22, 2024	Denoising, Scene Understanding	Unverified
A Comparative Evaluation of Approximate Probabilistic Simulation and Deep Neural Networks as Accounts of Human Physical Scene Understanding	May 4, 2016	Scene Understanding	Unverified
Multiple-gaze geometry: Inferring novel 3D locations from gazes observed in monocular video	Sep 1, 2018	Scene Understanding, Small Data Image Classification	Unverified
ACDC: The Adverse Conditions Dataset with Correspondences for Robust Semantic Driving Scene Perception	Apr 27, 2021	Instance Segmentation, object-detection	Unverified
Diffusion Models in 3D Vision: A Survey	Oct 7, 2024	Autonomous Driving, Computational Efficiency	Unverified
Diffusion-based RGB-D Semantic Segmentation with Deformable Attention Transformer	Sep 23, 2024	Scene Understanding, Semantic Segmentation	Unverified
Multi-task GANs for Semantic Segmentation and Depth Completion with Cycle Consistency	Nov 29, 2020	Autonomous Driving, Depth Completion	Unverified
Unsupervised 3D Structure Inference from Category-Specific Image Collections	Jan 1, 2024	Graph Matching, Object	Unverified
DiffSDFSim: Differentiable Rigid-Body Dynamics With Implicit Shapes	Nov 30, 2021	Friction, Object	Unverified
Multi-Task Learning for Automotive Foggy Scene Understanding via Domain Adaptation to an Illumination-Invariant Representation	Sep 17, 2019	Decoder, Domain Adaptation	Unverified
Multi-Task Learning for Visual Scene Understanding	Mar 28, 2022	Multi-Task Learning, Scene Understanding	Unverified
Multi-task learning from fixed-wing UAV images for 2D/3D city modeling	Aug 25, 2021	Change Detection, Depth Estimation	Unverified
Multi-Task Learning with Multi-Task Optimization	Mar 24, 2024	Automated Theorem Proving, image-classification	Unverified
Unsupervised Discovery and Composition of Object Light Fields	May 8, 2022	Novel View Synthesis, Object	Unverified
Dietary Assessment with Multimodal ChatGPT: A Systematic Analysis	Dec 14, 2023	Image Captioning, Scene Understanding	Unverified
Multiview Based 3D Scene Understanding On Partial Point Sets	Nov 30, 2018	3D Part Segmentation, 3D Shape Recognition	Unverified
Multi-View Deep Learning for Consistent Semantic Mapping with RGB-D Cameras	Mar 26, 2017	Scene Understanding, Segmentation	Unverified
Multi-View Pedestrian Occupancy Prediction with a Novel Synthetic Dataset	Dec 18, 2024	Pedestrian Detection, Scene Understanding	Unverified
Multi-view PointNet for 3D Scene Understanding	Sep 30, 2019	3D Instance Segmentation, 3D Semantic Segmentation	Unverified
Diagnostics in Semantic Segmentation	Sep 27, 2018	Image Segmentation, Scene Understanding	Unverified

Benchmark Results

Semantic Scene Understanding Challenge (passive actuation & ground-truth localisation)

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.44	—	Unverified
2	Team VGAI (TCS Research)	OMQ	0.37	—	Unverified
3	Demo_semantic_SLAM	OMQ	0.11	—	Unverified

ADE20K val

#	Model	Metric	Claimed	Verified	Status
1	CPN(ResNet-101)	Mean IoU	46.3	—	Unverified

Semantic Scene Understanding Challenge (active actuation & ground-truth localisation)

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.35	—	Unverified