Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 141–150 of 1723 papers

Title	Date	Tasks	Status	Hype
NuPlanQA: A Large-Scale Dataset and Benchmark for Multi-View Driving Scene Understanding in Multi-Modal Large Language Models	Mar 17, 2025	Question AnsweringScene Understanding	CodeCode Available	1
Logic-RAG: Augmenting Large Multimodal Models with Visual-Spatial Knowledge for Road Scene Understanding	Mar 16, 2025	Autonomous DrivingRAG	CodeCode Available	1
A Data-Centric Revisit of Pre-Trained Vision Models for Robot Learning	Mar 10, 2025	ObjectScene Understanding	CodeCode Available	1
VLScene: Vision-Language Guidance Distillation for Camera-Based 3D Semantic Scene Completion	Mar 8, 2025	3D Semantic Scene CompletionAutonomous Driving	CodeCode Available	1
Occlusion-aware Non-Rigid Point Cloud Registration via Unsupervised Neural Deformation Correntropy	Feb 15, 2025	Point Cloud RegistrationScene Understanding	CodeCode Available	1
Event-aided Semantic Scene Completion	Feb 4, 2025	Autonomous DrivingScene Understanding	CodeCode Available	1
EndoChat: Grounded Multimodal Large Language Model for Endoscopic Surgery	Jan 20, 2025	Language ModelingLanguage Modelling	CodeCode Available	1
A Survey of World Models for Autonomous Driving	Jan 20, 2025	Anomaly DetectionAutonomous Driving	CodeCode Available	1
3UR-LLM: An End-to-End Multimodal Large Language Model for 3D Scene Understanding	Jan 14, 2025	Language ModelingLanguage Modelling	CodeCode Available	1
All-Day Multi-Camera Multi-Target Tracking	Jan 1, 2025	AllMamba	CodeCode Available	1

Show:10 25 50

← PrevPage 15 of 173Next →

All datasets Semantic Scene Understanding Challenge (passive actuation & ground-truth localisation)ADE20K val Semantic Scene Understanding Challenge (active actuation & ground-truth localisation)

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.44	—	Unverified
2	Team VGAI (TCS Research)	OMQ	0.37	—	Unverified
3	Demo_semantic_SLAM	OMQ	0.11	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	CPN(ResNet-101)	Mean IoU	46.3	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.35	—	Unverified