Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.
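The sketch below makes "objects and their spatial relationships" concrete with a tiny scene graph: a list of objects with 2D bounding boxes and (subject, predicate, object) triples derived purely from box geometry. It is an assumption-laden illustration, not the method of any paper listed here. The names `SceneObject`, `spatial_relation` and `build_scene_graph`, the box convention, and the four predicates are all hypothetical. Learned systems predict far richer relations than these hand-coded rules.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class SceneObject:
    """Hypothetical object record: a label plus an axis-aligned 2D box."""
    name: str
    # (x_min, y_min, x_max, y_max) in image coordinates; y grows downward.
    box: tuple[float, float, float, float]


def spatial_relation(a: SceneObject, b: SceneObject) -> str | None:
    """Return a coarse predicate describing a relative to b, or None.

    Horizontal predicates are checked first, so an object that is both
    left of and above another is reported as "left of".
    """
    ax0, ay0, ax1, ay1 = a.box
    bx0, by0, bx1, by1 = b.box
    if ax1 <= bx0:
        return "left of"
    if ax0 >= bx1:
        return "right of"
    if ay1 <= by0:
        return "above"
    if ay0 >= by1:
        return "below"
    return None  # Boxes overlap: no simple geometric predicate applies.


def build_scene_graph(objects: list[SceneObject]) -> list[tuple[str, str, str]]:
    """Collect (subject, predicate, object) triples for every ordered pair."""
    triples = []
    for a, b in permutations(objects, 2):
        rel = spatial_relation(a, b)
        if rel is not None:
            triples.append((a.name, rel, b.name))
    return triples


# Usage: a toy scene with made-up boxes.
scene = [
    SceneObject("lamp", (10, 0, 30, 20)),
    SceneObject("table", (0, 40, 100, 80)),
    SceneObject("chair", (120, 40, 160, 90)),
]
for triple in build_scene_graph(scene):
    print(triple)
```

For this toy scene the graph contains, among others, ("lamp", "above", "table") and ("chair", "right of", "table"). Relations are directional, which is why each ordered pair is visited.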

Papers

Showing 251–300 of 1723 papers

Title	Date	Tasks	Status	Hype	Score
ALFWorld: Aligning Text and Embodied Environments for Interactive Learning	Oct 8, 2020	Natural Language Visual Grounding, Scene Understanding	Code Available	1	5
Cerberus Transformer: Joint Semantic, Affordance and Attribute Parsing	Nov 24, 2021	Attribute, Scene Understanding	Code Available	1	5
Mitigating Trade-off: Stream and Query-guided Aggregation for Efficient and Effective 3D Occupancy Prediction	Mar 28, 2025	Autonomous Driving, Scene Understanding	Code Available	1	5
MLRSNet: A Multi-label High Spatial Resolution Remote Sensing Dataset for Semantic Scene Understanding	Oct 1, 2020	Deep Learning, image-classification	Code Available	1	5
MonteBoxFinder: Detecting and Filtering Primitives to Fit a Noisy Point Cloud	Jul 28, 2022	Scene Understanding	Code Available	1	5
Dual-Hybrid Attention Network for Specular Highlight Removal	Jul 17, 2024	highlight removal, Object Recognition	Code Available	1	5
Masked Scene Modeling: Narrowing the Gap Between Supervised and Self-Supervised Learning in 3D Scene Understanding	Apr 9, 2025	Scene Understanding, Self-Supervised Learning	Code Available	1	5
DTCLMapper: Dual Temporal Consistent Learning for Vectorized HD Map Construction	May 9, 2024	Contrastive Learning, Scene Understanding	Code Available	1	5
DPF: Learning Dense Prediction Fields with Weak Supervision	Mar 29, 2023	Intrinsic Image Decomposition, Prediction	Code Available	1	5
Mask4D: End-to-End Mask-Based 4D Panoptic Segmentation for LiDAR Sequences	Sep 18, 2023	3D Panoptic Segmentation, 4D Panoptic Segmentation	Code Available	1	5
MassMIND: Massachusetts Maritime INfrared Dataset	Sep 9, 2022	Instance Segmentation, Scene Understanding	Code Available	1	5
A Two-Stage Masked Autoencoder Based Network for Indoor Depth Completion	Jun 14, 2024	3D Reconstruction, Autonomous Driving	Code Available	1	5
AirObject: A Temporally Evolving Graph Embedding for Object Identification	Nov 30, 2021	Graph Attention, Graph Embedding	Code Available	1	5
Dynamic Graph Message Passing Networks	Aug 19, 2019	Image Classification, object-detection	Code Available	1	5
A Hybrid Sparse-Dense Monocular SLAM System for Autonomous Driving	Aug 17, 2021	Autonomous Driving, Depth Estimation	Code Available	1	5
M3D-RPN: Monocular 3D Region Proposal Network for Object Detection	Jul 13, 2019	3D Object Detection, 3D Object Detection From Monocular Images	Code Available	1	5
MCTS with Refinement for Proposals Selection Games in Scene Understanding	Jul 7, 2022	Scene Understanding	Code Available	1	5
LoLI-Street: Benchmarking Low-Light Image Enhancement and Beyond	Oct 13, 2024	Autonomous Driving, Autonomous Vehicles	Code Available	1	5
Constructing Metric-Semantic Maps using Floor Plan Priors for Long-Term Indoor Localization	Mar 20, 2023	3D Object Detection, Indoor Localization	Code Available	1	5
LLM-Empowered Embodied Agent for Memory-Augmented Task Planning in Household Robotics	Apr 30, 2025	In-Context Learning, Object	Code Available	1	5
3D Neural Embedding Likelihood: Probabilistic Inverse Graphics for Robust 6D Pose Estimation	Feb 7, 2023	6D Pose Estimation, 6D Pose Estimation using RGB	Code Available	1	5
Digging Into Self-Supervised Monocular Depth Estimation	Jun 4, 2018	Camera Pose Estimation, Depth Estimation	Code Available	1	5
Logic-RAG: Augmenting Large Multimodal Models with Visual-Spatial Knowledge for Road Scene Understanding	Mar 16, 2025	Autonomous Driving, RAG	Code Available	1	5
LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation	Jun 14, 2017	GPU, Scene Understanding	Code Available	1	5
Affordance Transfer Learning for Human-Object Interaction Detection	Apr 7, 2021	Affordance Detection, Affordance Recognition	Code Available	1	5
Living Scenes: Multi-object Relocalization and Reconstruction in Changing 3D Environments	Dec 14, 2023	3D Reconstruction, Decoder	Code Available	1	5
Divide and Conquer: 3D Point Cloud Instance Segmentation With Point-Wise Binarization	Jul 22, 2022	3D Instance Segmentation, 3D Object Detection	Code Available	1	5
Dynamic Graph Message Passing Networks for Visual Recognition	Sep 20, 2022	image-classification, Image Classification	Code Available	1	5
LLaVA-SpaceSGG: Visual Instruct Tuning for Open-vocabulary Scene Graph Generation with Enhanced Spatial Relations	Dec 9, 2024	Language Modeling, Language Modelling	Code Available	1	5
LWSIS: LiDAR-guided Weakly Supervised Instance Segmentation for Autonomous Driving	Dec 7, 2022	Autonomous Driving, Instance Segmentation	Code Available	1	5
MGNet: Monocular Geometric Scene Understanding for Autonomous Driving	Jun 27, 2022	Autonomous Driving, Depth Estimation	Code Available	1	5
Deep learning for radar data exploitation of autonomous vehicle	Mar 15, 2022	Autonomous Driving, Deep Learning	Code Available	1	5
A Survey on Deep Learning Technique for Video Segmentation	Jul 2, 2021	Autonomous Driving, Deep Learning	Code Available	1	5
LED: Light Enhanced Depth Estimation at Night	Sep 12, 2024	Autonomous Driving, Decoder	Code Available	1	5
4D Panoptic LiDAR Segmentation	Feb 24, 2021	4D Panoptic Segmentation, Benchmarking	Code Available	1	5
DeepPanoContext: Panoramic 3D Scene Understanding with Holistic Scene Context Graph and Relation-based Optimization	Aug 24, 2021	Diversity, Graph Neural Network	Code Available	1	5
Leveraging Large (Visual) Language Models for Robot 3D Scene Understanding	Sep 12, 2022	Common Sense Reasoning, Scene Classification	Code Available	1	5
CAT-ViL: Co-Attention Gated Vision-Language Embedding for Visual Question Localized-Answering in Robotic Surgery	Jul 11, 2023	Question Answering, Scene Understanding	Code Available	1	5
A Survey on Deep Learning for Localization and Mapping: Towards the Age of Spatial Machine Intelligence	Jun 22, 2020	Deep Learning, Scene Understanding	Code Available	1	5
Collaborative Transformers for Grounded Situation Recognition	Mar 30, 2022	Grounded Situation Recognition, Image Classification	Code Available	1	5
Deep Learning for Event-based Vision: A Comprehensive Survey and Benchmarks	Feb 17, 2023	Deblurring, Deep Learning	Code Available	1	5
Dense Audio-Visual Event Localization under Cross-Modal Consistency and Multi-Temporal Granularity Collaboration	Dec 17, 2024	audio-visual event localization, audio-visual learning	Code Available	1	5
Complementary Random Masking for RGB-Thermal Semantic Segmentation	Mar 30, 2023	Scene Understanding, Semantic Segmentation	Code Available	1	5
Detecting Human-Object Interaction via Fabricated Compositional Learning	Mar 15, 2021	Affordance Recognition, Human-Object Interaction Detection	Code Available	1	5
A Survey of World Models for Autonomous Driving	Jan 20, 2025	Anomaly Detection, Autonomous Driving	Code Available	1	5
Diffusion-SS3D: Diffusion Model for Semi-supervised 3D Object Detection	Dec 5, 2023	3D Object Detection, Denoising	Code Available	1	5
DIP: Unsupervised Dense In-Context Post-training of Visual Representations	Jun 23, 2025	GPU, Meta-Learning	Code Available	1	5
DI-V2X: Learning Domain-Invariant Representation for Vehicle-Infrastructure Collaborative 3D Object Detection	Dec 25, 2023	3D Object Detection, object-detection	Code Available	1	5
Distilled Semantics for Comprehensive Scene Understanding from Videos	Mar 31, 2020	Depth Estimation, Knowledge Distillation	Code Available	1	5
Affect2MM: Affective Analysis of Multimedia Content Using Emotion Causality	Mar 11, 2021	Scene Understanding, Time Series	Code Available	1	5

Benchmark Results

Semantic Scene Understanding Challenge (passive actuation & ground-truth localisation)

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.44	—	Unverified
2	Team VGAI (TCS Research)	OMQ	0.37	—	Unverified
3	Demo_semantic_SLAM	OMQ	0.11	—	Unverified

ADE20K val

#	Model	Metric	Claimed	Verified	Status
1	CPN (ResNet-101)	Mean IoU	46.3	—	Unverified

Semantic Scene Understanding Challenge (active actuation & ground-truth localisation)

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.35	—	Unverified
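The ADE20K val result is reported as Mean IoU. For each class c, IoU_c = TP_c / (TP_c + FP_c + FN_c), and Mean IoU averages IoU_c over the classes. The following pure-Python sketch computes it from flat label sequences. The function name `mean_iou` and the `ignore_index` parameter are hypothetical, not any benchmark toolkit's API. Skipping classes absent from both prediction and ground truth, and accumulating counts over all pixels in one call rather than averaging per-image scores, are assumptions about the evaluation protocol. Leaderboard scores such as 46.3 are presumably percentages, so they correspond to 0.463 on the scale returned here.

```python
def mean_iou(pred, target, num_classes, ignore_index=None):
    """Mean intersection-over-union for flat sequences of integer class labels.

    Hypothetical helper: pred and target hold one label per pixel, and pixels
    whose target equals ignore_index are excluded from every count.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    tp = [0] * num_classes  # pixels correctly labelled as class c
    fp = [0] * num_classes  # pixels predicted as c but belonging to another class
    fn = [0] * num_classes  # pixels of class c predicted as another class
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        if p == t:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1
    ious = []
    for c in range(num_classes):
        denom = tp[c] + fp[c] + fn[c]
        if denom > 0:  # skip classes absent from both prediction and ground truth
            ious.append(tp[c] / denom)
    if not ious:
        raise ValueError("no valid pixels to evaluate")
    return sum(ious) / len(ious)


# Usage: six pixels, three classes.
target = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
print(mean_iou(pred, target, num_classes=3))  # IoUs 1/3, 2/3, 1/2 -> 0.5
```

Because each class contributes equally to the average, rare classes weigh as much as frequent ones. This is one reason Mean IoU is preferred over plain pixel accuracy on long-tailed datasets such as ADE20K.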