Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.

Papers


Showing 1551–1600 of 1723 papers

Title	Date	Tasks	Status
Neighbor-Vote: Improving Monocular 3D Object Detection through Neighbor Distance Voting	Jul 6, 2021	3D Object Detection, Autonomous Driving	Code Available
Multi-task Planar Reconstruction with Feature Warping Guidance	Nov 25, 2023	3D Reconstruction, Instance Segmentation	Code Available
Multi-task Geometric Estimation of Depth and Surface Normal from Monocular 360° Images	Nov 4, 2024	Multi-Task Learning, Scene Understanding	Code Available
Holistic 3D Scene Parsing and Reconstruction from a Single RGB Image	Aug 7, 2018	3D Object Detection, Monocular 3D Object Detection	Code Available
Multi-Resolution Multi-Modal Sensor Fusion For Remote Sensing Data With Label Uncertainty	May 2, 2018	Scene Understanding, Sensor Fusion	Code Available
ShelfNet for Fast Semantic Segmentation	Nov 27, 2018	Autonomous Driving, Decoder	Code Available
Multimodal Scale Consistency and Awareness for Monocular Self-Supervised Depth Estimation	Mar 3, 2021	Autonomous Driving, Depth Estimation	Code Available
Deep Depth from Defocus: how can defocus blur improve 3D estimation using dense neural networks?	Sep 5, 2018	3D Reconstruction, Depth Estimation	Code Available
BACS: Background Aware Continual Semantic Segmentation	Apr 19, 2024	Autonomous Driving, Continual Learning	Code Available
RESSCAL3D++: Joint Acquisition and Semantic Segmentation of 3D Point Clouds	Oct 3, 2024	Scene Understanding, Semantic Segmentation	Code Available
ResUNet-a: a deep learning framework for semantic segmentation of remotely sensed data	Apr 1, 2019	Scene Parsing, Scene Understanding	Code Available
Hierarchical Superpixel Segmentation via Structural Information Theory	Jan 13, 2025	graph construction, graph partitioning	Code Available
Hierarchical Spatial Proximity Reasoning for Vision-and-Language Navigation	Mar 18, 2024	Common Sense Reasoning, Efficient Exploration	Code Available
Veritatem Dies Aperit- Temporally Consistent Depth Prediction Enabled by a Multi-Task Geometric and Semantic Scene Understanding Approach	Mar 26, 2019	Autonomous Driving, Depth Completion	Code Available
Veritatem Dies Aperit - Temporally Consistent Depth Prediction Enabled by a Multi-Task Geometric and Semantic Scene Understanding Approach	Jun 1, 2019	Autonomous Driving, Depth Completion	Code Available
Hierarchical Context Transformer for Multi-level Semantic Scene Understanding	Feb 21, 2025	Contrastive Learning, Representation Learning	Code Available
Grid-augmented vision: A simple yet effective approach for enhanced spatial understanding in multi-modal agents	Nov 27, 2024	Autonomous Navigation, Object Recognition	Code Available
Revisiting Distillation for Continual Learning on Visual Question Localized-Answering in Robotic Surgery	Jul 22, 2023	Continual Learning, Scene Understanding	Code Available
MultiDepth: Single-Image Depth Estimation via Multi-Task Regression and Classification	Jul 25, 2019	Autonomous Vehicles, Classification	Code Available
MovSAM: A Single-image Moving Object Segmentation Framework Based on Deep Thinking	Apr 9, 2025	Autonomous Driving, Language Modeling	Code Available
MonoGRNet: A Geometric Reasoning Network for Monocular 3D Object Localization	Nov 26, 2018	2D Object Detection, 3D Object Detection	Code Available
Monocular 3D Object Detection with Pseudo-LiDAR Point Cloud	Mar 23, 2019	3D Object Detection, Depth Estimation	Code Available
DC-Scene: Data-Centric Learning for 3D Scene Understanding	May 21, 2025	Autonomous Driving, Scene Understanding	Code Available
RIO: 3D Object Instance Re-Localization in Changing Indoor Environments	Aug 16, 2019	Object, Scene Understanding	Code Available
Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding	Nov 30, 2024	3D Question Answering (3D-QA), Position	Code Available
Good at captioning, bad at counting: Benchmarking GPT-4V on Earth observation data	Jan 31, 2024	Benchmarking, Change Detection	Code Available
Modeling Expectation Violation in Intuitive Physics with Coarse Probabilistic Object Representations	Dec 1, 2019	Scene Understanding	Code Available
General-Purpose Deep Point Cloud Feature Extractor	Mar 12, 2018	3D Object Classification, 3D Point Cloud Classification	Code Available
Generalizing Surgical Instruments Segmentation to Unseen Domains with One-to-Many Synthesis	Jun 28, 2023	Scene Understanding	Code Available
APCoTTA: Continual Test-Time Adaptation for Semantic Segmentation of Airborne LiDAR Point Clouds	May 15, 2025	Point Cloud Segmentation, Scene Understanding	Code Available
Gated Driver Attention Predictor	Aug 1, 2023	Driver Attention Monitoring, Prediction	Code Available
A Critical Assessment of Visual Sound Source Localization Models Including Negative Audio	Oct 1, 2024	Scene Understanding, Sound Source Localization	Code Available
Model-based inexact graph matching on top of CNNs for semantic scene understanding	Jan 18, 2023	Brain Segmentation, Deep Learning	Code Available
Gated2Depth: Real-time Dense Lidar from Gated Images	Feb 13, 2019	Scene Understanding	Code Available
GaIA: Graphical Information Gain based Attention Network for Weakly Supervised Point Cloud Semantic Segmentation	Oct 2, 2022	Scene Understanding, Segmentation	Code Available
MLM: A Benchmark Dataset for Multitask Learning with Multiple Languages and Modalities	Aug 14, 2020	Representation Learning, Scene Understanding	Code Available
FunnyNet-W: Multimodal Learning of Funny Moments in Videos in the Wild	Jan 8, 2024	Language Modelling, Large Language Model	Code Available
Rotation Invariant Convolutions for 3D Point Clouds Deep Learning	Aug 17, 2019	Deep Learning, Scene Understanding	Code Available
MLLM-SUL: Multimodal Large Language Model for Semantic Scene Understanding and Localization in Traffic Scenarios	Dec 27, 2024	Autonomous Driving, Language Modeling	Code Available
Mitigating Object Dependencies: Improving Point Cloud Self-Supervised Learning through Object Exchange	Apr 11, 2024	Object, Scene Understanding	Code Available
DA-RNN: Semantic Mapping with Data Associated Recurrent Neural Networks	Mar 9, 2017	Scene Understanding	Code Available
MGNiceNet: Unified Monocular Geometric Scene Understanding	Nov 18, 2024	Autonomous Driving, Autonomous Vehicles	Code Available
MetricGold: Leveraging Text-To-Image Latent Diffusion Models for Metric Depth Estimation	Nov 16, 2024	Depth Estimation, Monocular Depth Estimation	Code Available
Collaborative Propagation on Multiple Instance Graphs for 3D Instance Segmentation with Single-point Supervision	Aug 10, 2022	3D Instance Segmentation, Instance Segmentation	Code Available
Improving Social Awareness Through DANTE: A Deep Affinity Network for Clustering Conversational Interactants	Jul 24, 2019	Clustering, Graph Clustering	Code Available
DADA: Driver Attention Prediction in Driving Accident Scenarios	Dec 18, 2019	Driver Attention Monitoring, Prediction	Code Available
Structure-Aware Residual Pyramid Network for Monocular Depth Estimation	Jul 13, 2019	Decoder, Depth Estimation	Code Available
METEOR Guided Divergence for Video Captioning	Dec 20, 2022	Hierarchical Reinforcement Learning, Scene Understanding	Code Available
MC-PanDA: Mask Confidence for Panoptic Domain Adaptation	Jul 19, 2024	Domain Adaptation, Panoptic Segmentation	Code Available
Cross-Modality Time-Variant Relation Learning for Generating Dynamic Scene Graphs	May 15, 2023	Relation, Scene Graph Generation	Code Available



Datasets: Semantic Scene Understanding Challenge (passive actuation & ground-truth localisation); ADE20K val; Semantic Scene Understanding Challenge (active actuation & ground-truth localisation)

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.44	—	Unverified
2	Team VGAI (TCS Research)	OMQ	0.37	—	Unverified
3	Demo_semantic_SLAM	OMQ	0.11	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	CPN(ResNet-101)	Mean IoU	46.3	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.35	—	Unverified