SOTAVerified

Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.

Papers

Showing 101–125 of 1723 papers

| Title | Status | Hype |
|---|---|---|
| CLIP goes 3D: Leveraging Prompt Tuning for Language Grounded 3D Recognition | Code | 2 |
| GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis | Code | 2 |
| Diffusion-based Generation, Optimization, and Planning in 3D Scenes | Code | 2 |
| Panoptic Lifting for 3D Scene Understanding with Neural Fields | Code | 2 |
| PLA: Language-Driven Open-Vocabulary 3D Scene Understanding | Code | 2 |
| OpenScene: 3D Scene Understanding with Open Vocabularies | Code | 2 |
| Safety-Enhanced Autonomous Driving Using Interpretable Sensor Fusion Transformer | Code | 2 |
| Panoptic Scene Graph Generation | Code | 2 |
| BinsFormer: Revisiting Adaptive Bins for Monocular Depth Estimation | Code | 2 |
| InvPT: Inverted Pyramid Multi-task Transformer for Dense Scene Understanding | Code | 2 |
| CMX: Cross-Modal Fusion for RGB-X Semantic Segmentation with Transformers | Code | 2 |
| GroupViT: Semantic Segmentation Emerges from Text Supervision | Code | 2 |
| HAKE: A Knowledge Engine Foundation for Human Activity Understanding | Code | 2 |
| Panoptic nuScenes: A Large-Scale Benchmark for LiDAR Panoptic Segmentation and Tracking | Code | 2 |
| Hypersim: A Photorealistic Synthetic Dataset for Holistic Indoor Scene Understanding | Code | 2 |
| Multi-Task Learning as Multi-Objective Optimization | Code | 2 |
| Learning to Tune Like an Expert: Interpretable and Scene-Aware Navigation via MLLM Reasoning and CVAE-Based Adaptation | Code | 1 |
| SurgTPGS: Semantic 3D Surgical Scene Understanding with Text Promptable Gaussian Splatting | Code | 1 |
| ReME: A Data-Centric Framework for Training-Free Open-Vocabulary Segmentation | Code | 1 |
| DIP: Unsupervised Dense In-Context Post-training of Visual Representations | Code | 1 |
| STSBench: A Spatio-temporal Scenario Benchmark for Multi-modal Large Language Models in Autonomous Driving | Code | 1 |
| OWMM-Agent: Open World Mobile Manipulation With Multi-modal Agentic Data Synthesis | Code | 1 |
| PhysGaia: A Physics-Aware Dataset of Multi-Body Interactions for Dynamic Novel View Synthesis | Code | 1 |
| CoNav: Collaborative Cross-Modal Reasoning for Embodied Navigation | Code | 1 |
| StoryReasoning Dataset: Using Chain-of-Thought for Scene Understanding and Grounded Story Generation | Code | 1 |
Page 5 of 69

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | ACRV Baseline | OMQ | 0.44 | | Unverified |
| 2 | Team VGAI (TCS Research) | OMQ | 0.37 | | Unverified |
| 3 | Demo_semantic_SLAM | OMQ | 0.11 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | CPN (ResNet-101) | Mean IoU | 46.3 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | ACRV Baseline | OMQ | 0.35 | | Unverified |