Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.

Papers

Showing 601–625 of 1723 papers

Title	Date	Tasks	Status	Score
Object Attribute Matters in Visual Question Answering	Dec 20, 2023	Attribute, Graph Neural Network	Code Available	5
One model to use them all: Training a segmentation model with complementary datasets	Feb 29, 2024	All, Anatomy	Code Available	5
Neighbor-Vote: Improving Monocular 3D Object Detection through Neighbor Distance Voting	Jul 6, 2021	3D Object Detection, Autonomous Driving	Code Available	5
DADA: Driver Attention Prediction in Driving Accident Scenarios	Dec 18, 2019	Driver Attention Monitoring, Prediction	Code Available	5
Multi-task Planar Reconstruction with Feature Warping Guidance	Nov 25, 2023	3D Reconstruction, Instance Segmentation	Code Available	5
Neural Radiance Field Codebooks	Jan 10, 2023	Object, Representation Learning	Code Available	5
Cross-Modality Time-Variant Relation Learning for Generating Dynamic Scene Graphs	May 15, 2023	Relation, Scene Graph Generation	Code Available	5
CrossModalityDiffusion: Multi-Modal Novel View Synthesis with Unified Intermediate Representation	Jan 16, 2025	Novel View Synthesis, Scene Understanding	Code Available	5
Multi-Resolution Multi-Modal Sensor Fusion For Remote Sensing Data With Label Uncertainty	May 2, 2018	Scene Understanding, Sensor Fusion	Code Available	5
Auto-Embedding Generative Adversarial Networks for High Resolution Image Synthesis	Mar 27, 2019	Generative Adversarial Network, Image Generation	Code Available	5
ShelfNet for Fast Semantic Segmentation	Nov 27, 2018	Autonomous Driving, Decoder	Code Available	5
Multi-task Geometric Estimation of Depth and Surface Normal from Monocular 360° Images	Nov 4, 2024	Multi-Task Learning, Scene Understanding	Code Available	5
Neural RGB->D Sensing: Depth and Uncertainty from a Video Camera	Jan 9, 2019	3D Reconstruction, 3D Scene Reconstruction	Code Available	5
Panoramic Depth Estimation via Supervised and Unsupervised Learning in Indoor Scenes	Aug 18, 2021	Camera Calibration, Depth Estimation	Code Available	5
MultiDepth: Single-Image Depth Estimation via Multi-Task Regression and Classification	Jul 25, 2019	Autonomous Vehicles, Classification	Code Available	5
Cooperative Holistic Scene Understanding: Unifying 3D Object, Layout, and Camera Pose Estimation	Oct 31, 2018	3D Object Detection, Camera Pose Estimation	Code Available	5
MovSAM: A Single-image Moving Object Segmentation Framework Based on Deep Thinking	Apr 9, 2025	Autonomous Driving, Language Modeling	Code Available	5
Contrastive Instance Association for 4D Panoptic Segmentation using Sequences of 3D LiDAR Scans	Dec 1, 2021	4D Panoptic Segmentation, Autonomous Navigation	Code Available	5
Grid-augmented vision: A simple yet effective approach for enhanced spatial understanding in multi-modal agents	Nov 27, 2024	Autonomous Navigation, Object Recognition	Code Available	5
Continual Learning of Unsupervised Monocular Depth from Videos	Nov 4, 2023	Autonomous Driving, Continual Learning	Code Available	5
Monocular 3D Object Detection with Pseudo-LiDAR Point Cloud	Mar 23, 2019	3D Object Detection, Depth Estimation	Code Available	5
Model-based inexact graph matching on top of CNNs for semantic scene understanding	Jan 18, 2023	Brain Segmentation, Deep Learning	Code Available	5
Modeling Expectation Violation in Intuitive Physics with Coarse Probabilistic Object Representations	Dec 1, 2019	Scene Understanding	Code Available	5
MonoGRNet: A Geometric Reasoning Network for Monocular 3D Object Localization	Nov 26, 2018	2D Object Detection, 3D Object Detection	Code Available	5
MLM: A Benchmark Dataset for Multitask Learning with Multiple Languages and Modalities	Aug 14, 2020	Representation Learning, Scene Understanding	Code Available	5

Benchmark Results

Semantic Scene Understanding Challenge (passive actuation & ground-truth localisation)

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.44	—	Unverified
2	Team VGAI (TCS Research)	OMQ	0.37	—	Unverified
3	Demo_semantic_SLAM	OMQ	0.11	—	Unverified

ADE20K val

#	Model	Metric	Claimed	Verified	Status
1	CPN(ResNet-101)	Mean IoU	46.3	—	Unverified

Semantic Scene Understanding Challenge (active actuation & ground-truth localisation)

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.35	—	Unverified