Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.
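One way to picture the step from object recognition to relations between objects is a small scene graph built from detections. The sketch below is only an illustration under stated assumptions: the `Box` class, the `spatial_relations` helper, the three relation names, and the sample detections are all hypothetical and do not come from any paper or benchmark listed on this page.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned 2D detection in image coordinates (y grows downward)."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def overlaps(a, b):
    """True when the two boxes share a region of non-zero area."""
    return (a.x_min < b.x_max and b.x_min < a.x_max
            and a.y_min < b.y_max and b.y_min < a.y_max)


def spatial_relations(boxes):
    """Return (subject, relation, object) triples for every ordered pair of boxes."""
    relations = []
    for a, b in permutations(boxes, 2):
        if a.x_max <= b.x_min:
            relations.append((a.label, "left_of", b.label))
        if a.y_max <= b.y_min:
            relations.append((a.label, "above", b.label))
        if overlaps(a, b):
            relations.append((a.label, "overlaps", b.label))
    return relations


# Made-up detections, standing in for the output of an object detector.
detections = [
    Box("lamp", 10, 5, 30, 40),
    Box("table", 0, 40, 100, 80),
    Box("chair", 110, 30, 150, 90),
]

scene_graph = spatial_relations(detections)
for triple in scene_graph:
    print(triple)
```

Real systems typically reason in 3D and learn relations rather than hard-coding them. The point here is only that the output is a set of relations over objects, not a bare list of labels.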

Papers

Showing 1–10 of 1723 papers

Title	Date	Tasks	Status	Hype
When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models	May 16, 2024	In-Context Learning, Question Answering	Code Available	7
Trajectory Prediction Meets Large Language Models: A Survey	Jun 3, 2025	Language Modeling, Language Modelling	Code Available	5
SplaTAM: Splat, Track & Map 3D Gaussians for Dense RGB-D SLAM	Dec 4, 2023	Camera Pose Estimation, Novel View Synthesis	Code Available	4
Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation	Dec 4, 2023	Depth Estimation, GPU	Code Available	4
Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models	Feb 12, 2024	Hallucination, Object Localization	Code Available	4
Senna: Bridging Large Vision-Language Models and End-to-End Autonomous Driving	Oct 29, 2024	Autonomous Driving, Scene Understanding	Code Available	4
GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models	Jan 2, 2025	Scene Understanding, text annotation	Code Available	4
OpenDriveVLA: Towards End-to-end Autonomous Driving with Large Vision Language Action Model	Mar 30, 2025	Autonomous Driving, Decision Making	Code Available	4
Distill Any Depth: Distillation Creates a Stronger Monocular Depth Estimator	Feb 26, 2025	Depth Estimation, Diversity	Code Available	4
EPRecon: An Efficient Framework for Real-Time Panoptic 3D Reconstruction from Monocular Video	Sep 3, 2024	3D Reconstruction, Scene Understanding	Code Available	3

Benchmark Results

Semantic Scene Understanding Challenge (passive actuation & ground-truth localisation)

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.44	—	Unverified
2	Team VGAI (TCS Research)	OMQ	0.37	—	Unverified
3	Demo_semantic_SLAM	OMQ	0.11	—	Unverified

ADE20K val

#	Model	Metric	Claimed	Verified	Status
1	CPN(ResNet-101)	Mean IoU	46.3	—	Unverified

Semantic Scene Understanding Challenge (active actuation & ground-truth localisation)

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.35	—	Unverified
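
For readers unfamiliar with the Mean IoU metric reported for ADE20K val, the following is a minimal sketch of how it is commonly computed for semantic segmentation: per-class intersection over union, TP / (TP + FP + FN), averaged over classes. The `mean_iou` function, the flat label lists, and the choice to skip classes that appear in neither prediction nor ground truth are assumptions made for illustration. The exact evaluation protocol behind the leaderboard figure is not stated on this page, and the listed value appears to be expressed as a percentage.

```python
def mean_iou(pred, target, num_classes):
    """Average per-class IoU over classes present in pred or target.

    pred and target are equal-length sequences of integer class labels,
    one entry per pixel (illustrative flat layout).
    """
    ious = []
    for c in range(num_classes):
        tp = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        fp = sum(1 for p, t in zip(pred, target) if p == c and t != c)
        fn = sum(1 for p, t in zip(pred, target) if p != c and t == c)
        union = tp + fp + fn
        if union == 0:
            # Class absent from both prediction and ground truth: skip it
            # (an assumed convention; evaluation scripts differ here).
            continue
        ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


# Made-up six-pixel example with three classes.
target = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]

score = mean_iou(pred, target, num_classes=3)
print(f"Mean IoU: {score:.3f}")
```

In this toy example the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is 0.5.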