Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 101–110 of 1723 papers

Title	Date	Tasks	Status	Hype
FusionVision: A comprehensive approach of 3D object reconstruction and segmentation from RGB-D cameras using YOLO and fast segment anything	Feb 29, 2024	3D Object ReconstructionInstance Segmentation	CodeCode Available	2
RelationField: Relate Anything in Radiance Fields	Dec 18, 2024	3d scene graph generationGraph Generation	CodeCode Available	2
GaussianPretrain: A Simple Unified 3D Gaussian Representation for Visual Pre-training in Autonomous Driving	Nov 19, 2024	3D Object DetectionAutonomous Driving	CodeCode Available	2
Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning	Dec 16, 2024	HallucinationRobot Manipulation	CodeCode Available	2
A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future	Jul 18, 2023	Knowledge Distillationobject-detection	CodeCode Available	2
Scene-Centric Unsupervised Panoptic Segmentation	Apr 2, 2025	Instance SegmentationPanoptic Segmentation	CodeCode Available	2
EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI	Dec 26, 2023	Scene Understanding	CodeCode Available	2
EmbodiedOcc: Embodied 3D Occupancy Prediction for Vision-based Online Scene Understanding	Dec 5, 2024	PredictionScene Understanding	CodeCode Available	2
Feed-Forward SceneDINO for Unsupervised Semantic Scene Completion	Jul 8, 2025	3D geometryDomain Generalization	CodeCode Available	2
IRef-VLA: A Benchmark for Interactive Referential Grounding with Imperfect Language in 3D Scenes	Mar 20, 2025	Scene UnderstandingSpatial Reasoning	CodeCode Available	2

Show:10 25 50

← PrevPage 11 of 173Next →

All datasets Semantic Scene Understanding Challenge (passive actuation & ground-truth localisation)ADE20K val Semantic Scene Understanding Challenge (active actuation & ground-truth localisation)

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.44	—	Unverified
2	Team VGAI (TCS Research)	OMQ	0.37	—	Unverified
3	Demo_semantic_SLAM	OMQ	0.11	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	CPN(ResNet-101)	Mean IoU	46.3	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.35	—	Unverified