Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.
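To make the difference between recognizing objects and relating them concrete, the following minimal sketch derives pairwise spatial relations from 2D bounding boxes. The `SceneObject` class, the `spatial_relations` function, and the box coordinates are all hypothetical and introduced only for illustration; this is a toy heuristic, not the method of any paper listed on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneObject:
    """A detected object with an axis-aligned box in image coordinates.

    Hypothetical structure for illustration; y grows downward, as in images.
    """
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def spatial_relations(objects):
    """Return (subject, relation, object) triples for non-overlapping extents."""
    relations = []
    for a in objects:
        for b in objects:
            if a is b:
                continue
            # a lies entirely to the left of b along the x axis
            if a.x_max <= b.x_min:
                relations.append((a.label, "left_of", b.label))
            # a lies entirely above b along the y axis (smaller y is higher)
            if a.y_max <= b.y_min:
                relations.append((a.label, "above", b.label))
    return relations


# Made-up detections: recognition gives the labels, relations give the context.
scene = [
    SceneObject("lamp", 10, 0, 30, 40),
    SceneObject("table", 0, 50, 100, 90),
    SceneObject("chair", 110, 45, 150, 100),
]
relations = spatial_relations(scene)
```

The resulting triples form a small scene graph: object recognition supplies the nodes, while the relations supply the context that scene understanding adds on top.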

Papers


Showing 1–25 of 1723 papers

Title	Date	Tasks	Status	Hype
When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models	May 16, 2024	In-Context Learning, Question Answering	Code Available	7
Trajectory Prediction Meets Large Language Models: A Survey	Jun 3, 2025	Language Modeling, Language Modelling	Code Available	5
Distill Any Depth: Distillation Creates a Stronger Monocular Depth Estimator	Feb 26, 2025	Depth Estimation, Diversity	Code Available	4
Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation	Dec 4, 2023	Depth Estimation, GPU	Code Available	4
SplaTAM: Splat, Track & Map 3D Gaussians for Dense RGB-D SLAM	Dec 4, 2023	Camera Pose Estimation, Novel View Synthesis	Code Available	4
Senna: Bridging Large Vision-Language Models and End-to-End Autonomous Driving	Oct 29, 2024	Autonomous Driving, Scene Understanding	Code Available	4
OpenDriveVLA: Towards End-to-end Autonomous Driving with Large Vision Language Action Model	Mar 30, 2025	Autonomous Driving, Decision Making	Code Available	4
Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models	Feb 12, 2024	Hallucination, Object Localization	Code Available	4
GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models	Jan 2, 2025	Scene Understanding, text annotation	Code Available	4
Sigma: Siamese Mamba Network for Multi-Modal Semantic Segmentation	Apr 5, 2024	Decoder, Mamba	Code Available	3
SceneSplat: Gaussian Splatting-based Scene Understanding with Vision-Language Pretraining	Mar 23, 2025	3DGS, Benchmarking	Code Available	3
DeepInteraction++: Multi-Modality Interaction for Autonomous Driving	Aug 9, 2024	3D Object Detection, Autonomous Driving	Code Available	3
SGS-SLAM: Semantic Gaussian Splatting For Neural Dense SLAM	Feb 5, 2024	3D Semantic Segmentation, Camera Pose Estimation	Code Available	3
DFormerv2: Geometry Self-Attention for RGBD Semantic Segmentation	Apr 7, 2025	3D geometry, RGBD Semantic Segmentation	Code Available	3
Multi-Modal Data-Efficient 3D Scene Understanding for Autonomous Driving	May 8, 2024	Autonomous Driving, LIDAR Semantic Segmentation	Code Available	3
HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation	Jan 24, 2025	Autonomous Driving, Language Modeling	Code Available	3
iDisc: Internal Discretization for Monocular Depth Estimation	Apr 13, 2023	Autonomous Driving, Depth Estimation	Code Available	3
4D Panoptic Scene Graph Generation	May 16, 2024	4D Panoptic Segmentation, Graph Generation	Code Available	3
EPRecon: An Efficient Framework for Real-Time Panoptic 3D Reconstruction from Monocular Video	Sep 3, 2024	3D Reconstruction, Scene Understanding	Code Available	3
GARField: Group Anything with Radiance Fields	Jan 17, 2024	Scene Understanding	Code Available	3
MoAI: Mixture of All Intelligence for Large Language and Vision Models	Mar 12, 2024	All, Mixture-of-Experts	Code Available	3
Monkey: Image Resolution and Text Label Are Important Things for Large Multi-modal Models	Nov 11, 2023	Image Captioning, MMR total	Code Available	3
AudioBench: A Universal Benchmark for Audio Large Language Models	Jun 23, 2024	Audio Scene Understanding, Instruction Following	Code Available	3
CrossOver: 3D Scene Cross-Modal Alignment	Feb 20, 2025	cross-modal alignment, Object	Code Available	3
Embodied Understanding of Driving Scenarios	Mar 7, 2024	Autonomous Driving, Language Modeling	Code Available	3


Benchmark Results

Semantic Scene Understanding Challenge (passive actuation & ground-truth localisation)

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.44	—	Unverified
2	Team VGAI (TCS Research)	OMQ	0.37	—	Unverified
3	Demo_semantic_SLAM	OMQ	0.11	—	Unverified

ADE20K val

#	Model	Metric	Claimed	Verified	Status
1	CPN(ResNet-101)	Mean IoU	46.3	—	Unverified

Semantic Scene Understanding Challenge (active actuation & ground-truth localisation)

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.35	—	Unverified
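
For readers unfamiliar with the Mean IoU metric reported for the ADE20K val entry, here is a minimal sketch of the usual definition: for each class, the intersection of predicted and ground-truth pixels divided by their union, averaged over the classes that occur. The function name `mean_iou` and the toy label lists are assumptions for illustration. Whether the leaderboard value was computed exactly this way (for example, per image or accumulated over the whole split) is not stated on this page; leaderboards typically report the result as a percentage.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes present in pred or target.

    pred and target are flat sequences of integer class labels, one per pixel.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if p == t:
            # Pixel counts once toward both intersection and union of its class.
            intersection[p] += 1
            union[p] += 1
        else:
            # A mismatch enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1
    # Skip classes that appear in neither map to avoid dividing by zero.
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


# Toy 2x3 label maps, flattened row by row (made-up values).
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
```

In this toy case the per-class IoUs are 1/3, 2/3, and 1/2, so `score` comes out to about 0.5, which would be shown as 50.0 on a percentage scale.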