Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.
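The sketch below makes "objects and their spatial relationships" concrete. It turns a list of labeled 3D bounding boxes into (subject, relation, object) triples, the kind of edges a scene graph stores. This is a toy rule-based illustration, not the method of any paper listed on this page. The `Box3D` type, the `spatial_relations` function, the z-up axis convention, and the `contact_tol` and `near_dist` thresholds are all hypothetical choices.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import List, Tuple


@dataclass
class Box3D:
    """Hypothetical axis-aligned 3D box; z is assumed to point up, units in metres."""
    label: str
    center: Tuple[float, float, float]  # (x, y, z)
    size: Tuple[float, float, float]    # (width along x, depth along y, height along z)


def spatial_relations(objects: List[Box3D],
                      contact_tol: float = 0.05,
                      near_dist: float = 1.0) -> List[Tuple[str, str, str]]:
    """Return (subject, relation, object) triples from simple geometric rules."""
    triples = []
    for a, b in permutations(objects, 2):  # every ordered pair of distinct objects
        ax, ay, az = a.center
        bx, by, bz = b.center
        # "on": a's bottom face touches b's top face and their footprints overlap.
        a_bottom = az - a.size[2] / 2
        b_top = bz + b.size[2] / 2
        overlap_x = abs(ax - bx) < (a.size[0] + b.size[0]) / 2
        overlap_y = abs(ay - by) < (a.size[1] + b.size[1]) / 2
        if abs(a_bottom - b_top) < contact_tol and overlap_x and overlap_y:
            triples.append((a.label, "on", b.label))
        # Otherwise "near": horizontal centre distance below near_dist.
        elif ((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5 < near_dist:
            triples.append((a.label, "near", b.label))
    return triples
```

For a 0.75 m tall table and a mug resting on it, the rules yield `("table", "near", "mug")` and `("mug", "on", "table")`. Relations are directional, so each ordered pair is checked separately; learned scene-graph models replace these hand-written rules with predicted relation labels.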

Papers


Showing 111–120 of 1723 papers

Title	Date	Tasks	Status	Hype	Score
EmbodiedOcc: Embodied 3D Occupancy Prediction for Vision-based Online Scene Understanding	Dec 5, 2024	Prediction, Scene Understanding	Code Available	2	5
Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning	Dec 16, 2024	Hallucination, Robot Manipulation	Code Available	2	5
TaskPrompter: Spatial-Channel Multi-Task Prompting for Dense Scene Understanding	May 1, 2023	3D Object Detection, Monocular Depth Estimation	Code Available	2	5
TextSLAM: Visual SLAM with Semantic Planar Text Features	May 17, 2023	Mixed Reality, Object SLAM	Code Available	2	5
LSceneLLM: Enhancing Large 3D Scene Understanding Using Adaptive Visual Preferences	Dec 2, 2024	Embodied Question Answering, Question Answering	Code Available	2	5
CLIP goes 3D: Leveraging Prompt Tuning for Language Grounded 3D Recognition	Mar 20, 2023	Retrieval, Scene Understanding	Code Available	2	5
CoNav: Collaborative Cross-Modal Reasoning for Embodied Navigation	May 22, 2025	Scene Understanding, Spatial Reasoning	Code Available	1	5
EndoChat: Grounded Multimodal Large Language Model for Endoscopic Surgery	Jan 20, 2025	Language Modeling, Language Modelling	Code Available	1	5
Enhancing Scene Graph Generation with Hierarchical Relationships and Commonsense Knowledge	Nov 21, 2023	Large Language Model, Multimodal Deep Learning	Code Available	1	5
3DMIT: 3D Multi-modal Instruction Tuning for Scene Understanding	Jan 6, 2024	Scene Understanding, Visual Question Answering (VQA)	Code Available	1	5


Benchmark Results

Semantic Scene Understanding Challenge (passive actuation & ground-truth localisation)

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ (Object Map Quality)	0.44	—	Unverified
2	Team VGAI (TCS Research)	OMQ (Object Map Quality)	0.37	—	Unverified
3	Demo_semantic_SLAM	OMQ (Object Map Quality)	0.11	—	Unverified

ADE20K val

#	Model	Metric	Claimed	Verified	Status
1	CPN (ResNet-101)	Mean IoU (%)	46.3	—	Unverified

Semantic Scene Understanding Challenge (active actuation & ground-truth localisation)

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ (Object Map Quality)	0.35	—	Unverified
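Mean IoU, the metric reported for ADE20K val, computes intersection over union for each class as TP / (TP + FP + FN) and then averages over classes. A minimal pure-Python sketch follows. It assumes labels are integers in [0, num_classes), and it supports an optional ignore label for unlabeled pixels; the value 255 used in the usage note is an illustrative assumption, not taken from this page. Classes absent from both ground truth and prediction are skipped, a common convention. The function names `confusion_matrix` and `mean_iou` are hypothetical and are not a specific library's API.

```python
from typing import List, Optional, Sequence


def confusion_matrix(gt: Sequence[int], pred: Sequence[int], num_classes: int,
                     ignore_index: Optional[int] = None) -> List[List[int]]:
    """Count pixels into a num_classes x num_classes matrix (rows = ground truth, cols = prediction)."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt, pred):
        if ignore_index is not None and g == ignore_index:
            continue  # unlabeled pixels count toward no class
        cm[g][p] += 1
    return cm


def mean_iou(cm: List[List[int]]) -> float:
    """Average per-class IoU = TP / (TP + FP + FN) over classes that appear at all."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # class-c pixels predicted as another class
        fp = sum(cm[r][c] for r in range(n)) - tp  # other pixels predicted as class c
        union = tp + fp + fn
        if union == 0:
            continue  # class absent from both ground truth and prediction
        ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0
```

For example, `mean_iou(confusion_matrix([0, 0, 1, 1], [0, 1, 1, 1], 2))` gives (1/2 + 2/3) / 2 = 7/12 ≈ 0.583. Dataset-level evaluation typically accumulates a single confusion matrix over all validation images before averaging, rather than averaging per-image scores. The result is then usually reported as a percentage, as in the 46.3 above.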