Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.

Papers


Showing 1551–1575 of 1723 papers

Title	Date	Tasks	Status
Neighbor-Vote: Improving Monocular 3D Object Detection through Neighbor Distance Voting	Jul 6, 2021	3D Object Detection, Autonomous Driving	Code Available
Multi-task Planar Reconstruction with Feature Warping Guidance	Nov 25, 2023	3D Reconstruction, Instance Segmentation	Code Available
Multi-task Geometric Estimation of Depth and Surface Normal from Monocular 360° Images	Nov 4, 2024	Multi-Task Learning, Scene Understanding	Code Available
Holistic 3D Scene Parsing and Reconstruction from a Single RGB Image	Aug 7, 2018	3D Object Detection, Monocular 3D Object Detection	Code Available
Multi-Resolution Multi-Modal Sensor Fusion For Remote Sensing Data With Label Uncertainty	May 2, 2018	Scene Understanding, Sensor Fusion	Code Available
ShelfNet for Fast Semantic Segmentation	Nov 27, 2018	Autonomous Driving, Decoder	Code Available
Multimodal Scale Consistency and Awareness for Monocular Self-Supervised Depth Estimation	Mar 3, 2021	Autonomous Driving, Depth Estimation	Code Available
Deep Depth from Defocus: how can defocus blur improve 3D estimation using dense neural networks?	Sep 5, 2018	3D Reconstruction, Depth Estimation	Code Available
BACS: Background Aware Continual Semantic Segmentation	Apr 19, 2024	Autonomous Driving, Continual Learning	Code Available
RESSCAL3D++: Joint Acquisition and Semantic Segmentation of 3D Point Clouds	Oct 3, 2024	Scene Understanding, Semantic Segmentation	Code Available
ResUNet-a: a deep learning framework for semantic segmentation of remotely sensed data	Apr 1, 2019	Scene Parsing, Scene Understanding	Code Available
Hierarchical Superpixel Segmentation via Structural Information Theory	Jan 13, 2025	graph construction, graph partitioning	Code Available
Hierarchical Spatial Proximity Reasoning for Vision-and-Language Navigation	Mar 18, 2024	Common Sense Reasoning, Efficient Exploration	Code Available
Veritatem Dies Aperit- Temporally Consistent Depth Prediction Enabled by a Multi-Task Geometric and Semantic Scene Understanding Approach	Mar 26, 2019	Autonomous Driving, Depth Completion	Code Available
Veritatem Dies Aperit - Temporally Consistent Depth Prediction Enabled by a Multi-Task Geometric and Semantic Scene Understanding Approach	Jun 1, 2019	Autonomous Driving, Depth Completion	Code Available
Hierarchical Context Transformer for Multi-level Semantic Scene Understanding	Feb 21, 2025	Contrastive Learning, Representation Learning	Code Available
Grid-augmented vision: A simple yet effective approach for enhanced spatial understanding in multi-modal agents	Nov 27, 2024	Autonomous Navigation, Object Recognition	Code Available
Revisiting Distillation for Continual Learning on Visual Question Localized-Answering in Robotic Surgery	Jul 22, 2023	Continual Learning, Scene Understanding	Code Available
MultiDepth: Single-Image Depth Estimation via Multi-Task Regression and Classification	Jul 25, 2019	Autonomous Vehicles, Classification	Code Available
MovSAM: A Single-image Moving Object Segmentation Framework Based on Deep Thinking	Apr 9, 2025	Autonomous Driving, Language Modeling	Code Available
MonoGRNet: A Geometric Reasoning Network for Monocular 3D Object Localization	Nov 26, 2018	2D Object Detection, 3D Object Detection	Code Available
Monocular 3D Object Detection with Pseudo-LiDAR Point Cloud	Mar 23, 2019	3D Object Detection, Depth Estimation	Code Available
DC-Scene: Data-Centric Learning for 3D Scene Understanding	May 21, 2025	Autonomous Driving, Scene Understanding	Code Available
RIO: 3D Object Instance Re-Localization in Changing Indoor Environments	Aug 16, 2019	Object, Scene Understanding	Code Available
Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding	Nov 30, 2024	3D Question Answering (3D-QA), Position	Code Available


Benchmark Results

Semantic Scene Understanding Challenge (passive actuation & ground-truth localisation)

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.44	—	Unverified
2	Team VGAI (TCS Research)	OMQ	0.37	—	Unverified
3	Demo_semantic_SLAM	OMQ	0.11	—	Unverified

ADE20K val

#	Model	Metric	Claimed	Verified	Status
1	CPN(ResNet-101)	Mean IoU	46.3	—	Unverified

Semantic Scene Understanding Challenge (active actuation & ground-truth localisation)

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.35	—	Unverified