Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.

Papers


Showing 451–475 of 1723 papers

Title	Date	Tasks	Status	Hype	Score
DynaVol: Unsupervised Learning for Dynamic Scenes through Object-Centric Voxelization	Apr 30, 2023	Decoder, NeRF	Code Available	1	5
ODAM: Object Detection, Association, and Mapping using Posed RGB Video	Aug 23, 2021	3D Object Detection, Graph Neural Network	Code Available	1	5
Object Pose Estimation via the Aggregation of Diffusion Features	Mar 27, 2024	Pose Estimation, Scene Understanding	Code Available	1	5
Occlusion-Aware Depth Estimation with Adaptive Normal Constraints	Apr 2, 2020	3D Reconstruction, Depth Estimation	Code Available	1	5
PlaneRecNet: Multi-Task Learning with Cross-Task Consistency for Piece-Wise Plane Detection and Reconstruction from a Single RGB Image	Oct 21, 2021	Decoder, Depth Estimation	Code Available	1	5
PointGroup: Dual-Set Point Grouping for 3D Instance Segmentation	Apr 3, 2020	3D Instance Segmentation, Clustering	Code Available	1	5
Estimating and Exploiting the Aleatoric Uncertainty in Surface Normal Estimation	Sep 20, 2021	Decoder, Prediction	Code Available	1	5
Cityscapes-Panoptic-Parts and PASCAL-Panoptic-Parts datasets for Scene Understanding	Apr 16, 2020	Human Part Segmentation, Panoptic Segmentation	Code Available	1	5
Egocentric Scene Understanding via Multimodal Spatial Rectifier	Jul 14, 2022	Scene Understanding, Surface Normal Estimation	Code Available	1	5
One-Shot Object Affordance Detection in the Wild	Aug 8, 2021	Action Recognition, Affordance Detection	Code Available	1	5
CamContextI2V: Context-aware Controllable Video Generation	Apr 8, 2025	Diversity, Scene Understanding	Code Available	1	5
Online 3D reconstruction and dense tracking in endoscopic videos	Sep 9, 2024	3D Reconstruction, 3D Scene Reconstruction	Code Available	1	5
Human-centric Scene Understanding for 3D Large-scale Scenarios	Jul 26, 2023	Action Recognition, Scene Understanding	Code Available	1	5
Enhancing Scene Graph Generation with Hierarchical Relationships and Commonsense Knowledge	Nov 21, 2023	Large Language Model, Multimodal Deep Learning	Code Available	1	5
Estimating Generic 3D Room Structures from 2D Annotations	Jun 15, 2023	Scene Understanding	Code Available	1	5
Living Scenes: Multi-object Relocalization and Reconstruction in Changing 3D Environments	Dec 14, 2023	3D Reconstruction, Decoder	Code Available	1	5
ARKitScenes: A Diverse Real-World Dataset For 3D Indoor Scene Understanding Using Mobile RGB-D Data	Nov 17, 2021	3D Object Detection, object-detection	Code Available	1	5
3UR-LLM: An End-to-End Multimodal Large Language Model for 3D Scene Understanding	Jan 14, 2025	Language Modeling, Language Modelling	Code Available	1	5
Channel-Wise Attention-Based Network for Self-Supervised Monocular Depth Estimation	Dec 24, 2021	Depth Estimation, Depth Prediction	Code Available	1	5
OpenScan: A Benchmark for Generalized Open-Vocabulary 3D Scene Understanding	Aug 20, 2024	Object, Scene Understanding	Code Available	1	5
EndoChat: Grounded Multimodal Large Language Model for Endoscopic Surgery	Jan 20, 2025	Language Modeling, Language Modelling	Code Available	1	5
PhysGaia: A Physics-Aware Dataset of Multi-Body Interactions for Dynamic Novel View Synthesis	Jun 3, 2025	Novel View Synthesis, Scene Understanding	Code Available	1	5
Point Scene Understanding via Disentangled Instance Mesh Reconstruction	Mar 31, 2022	Retrieval, Scene Understanding	Code Available	1	5
OvarNet: Towards Open-vocabulary Object Attribute Recognition	Jan 23, 2023	Attribute, Knowledge Distillation	Code Available	1	5
ROOT: VLM based System for Indoor Scene Understanding and Beyond	Nov 24, 2024	Scene Generation, Scene Understanding	Code Available	1	5


Benchmark Results

Dataset: Semantic Scene Understanding Challenge (passive actuation & ground-truth localisation)

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.44	—	Unverified
2	Team VGAI (TCS Research)	OMQ	0.37	—	Unverified
3	Demo_semantic_SLAM	OMQ	0.11	—	Unverified

Dataset: ADE20K val

#	Model	Metric	Claimed	Verified	Status
1	CPN(ResNet-101)	Mean IoU	46.3	—	Unverified

Dataset: Semantic Scene Understanding Challenge (active actuation & ground-truth localisation)

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.35	—	Unverified