Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.
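As a rough illustration of what "how objects relate to each other" can mean in practice, the sketch below derives coarse pairwise spatial relations from 2D bounding boxes. It is a toy example under stated assumptions, not a method from any paper listed here: the `Box` type, the `relation` rules, and the sample scene are all hypothetical, and coordinates are assumed to be image coordinates with y growing downward.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned 2D box in image coordinates (y grows downward)."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def relation(a: Box, b: Box) -> str:
    """Return a coarse relation of `a` with respect to `b`.

    Horizontal separation is checked first, then vertical separation;
    boxes that are separated on neither axis are reported as overlapping.
    """
    if a.x_max <= b.x_min:
        return "left of"
    if a.x_min >= b.x_max:
        return "right of"
    if a.y_max <= b.y_min:
        return "above"
    if a.y_min >= b.y_max:
        return "below"
    return "overlapping"


def scene_relations(boxes: List[Box]) -> List[Tuple[str, str, str]]:
    """List (subject, relation, object) triples for every unordered pair of boxes."""
    return [
        (a.label, relation(a, b), b.label)
        for i, a in enumerate(boxes)
        for b in boxes[i + 1:]
    ]


# Hypothetical detections for a small indoor scene.
scene = [
    Box("lamp", 10, 0, 20, 10),
    Box("table", 5, 12, 40, 30),
    Box("chair", 45, 10, 60, 30),
]
triples = scene_relations(scene)
```

Object recognition alone would stop at the labels; the triples add the relational layer that the description above refers to. Real systems typically work in 3D and learn these relations rather than hard-coding them.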

Papers

Showing 26–50 of 1723 papers

Title	Date	Tasks	Status	Hype
Unified Representation Space for 3D Visual Grounding	Jun 17, 2025	3D visual grounding, Contrastive Learning	Unverified	0
FreeQ-Graph: Free-form Querying with Semantic Consistent Scene Graph for 3D Scene Understanding	Jun 16, 2025	Form, Graph Generation	Unverified	0
SceneCompleter: Dense 3D Scene Completion for Generative Novel View Synthesis	Jun 12, 2025	Novel View Synthesis, Scene Understanding	Unverified	0
SemanticSplat: Feed-Forward 3D Scene Understanding with Language-Aware Gaussian Fields	Jun 11, 2025	3D Reconstruction, Scene Understanding	Unverified	0
PhyBlock: A Progressive Benchmark for Physical Understanding and Planning via 3D Block Assembly	Jun 10, 2025	Question Answering, Scene Understanding	Unverified	0
Robust Visual Localization via Semantic-Guided Multi-Scale Transformer	Jun 10, 2025	regression, Scene Understanding	Unverified	0
SceneSplat++: A Large Dataset and Comprehensive Benchmark for Language Gaussian Splatting	Jun 10, 2025	3DGS, Scene Understanding	Unverified	0
OpenSplat3D: Open-Vocabulary 3D Instance Segmentation using Gaussian Splatting	Jun 9, 2025	3DGS, 3D Instance Segmentation	Unverified	0
Design and Evaluation of Deep Learning-Based Dual-Spectrum Image Fusion Methods	Jun 9, 2025	Fairness, Scene Understanding	Unverified	0
SpatialLM: Training Large Language Models for Structured Indoor Modeling	Jun 9, 2025	3D Object Detection, Language Modeling	Unverified	0
STSBench: A Spatio-temporal Scenario Benchmark for Multi-modal Large Language Models in Autonomous Driving	Jun 6, 2025	Autonomous Driving, Autonomous Vehicles	Code Available	1
Does Your 3D Encoder Really Work? When Pretrain-SFT from 2D VLMs Meets 3D VLMs	Jun 5, 2025	cross-modal alignment, Dense Captioning	Unverified	0
ProJo4D: Progressive Joint Optimization for Sparse-View Inverse Physics Estimation	Jun 5, 2025	3D Reconstruction, NeRF	Unverified	0
OWMM-Agent: Open World Mobile Manipulation With Multi-modal Agentic Data Synthesis	Jun 4, 2025	Action Generation, Decision Making	Code Available	1
Tactile MNIST: Benchmarking Active Tactile Perception	Jun 3, 2025	Benchmarking, Scene Understanding	Unverified	0
Attention-based transformer models for image captioning across languages: An in-depth survey and evaluation	Jun 3, 2025	Caption Generation, Image Captioning	Unverified	0
Trajectory Prediction Meets Large Language Models: A Survey	Jun 3, 2025	Language Modeling, Language Modelling	Code Available	5
PhysGaia: A Physics-Aware Dataset of Multi-Body Interactions for Dynamic Novel View Synthesis	Jun 3, 2025	Novel View Synthesis, Scene Understanding	Code Available	1
SAM2-LOVE: Segment Anything Model 2 in Language-aided Audio-Visual Scenes	Jun 2, 2025	Scene Understanding	Unverified	0
Learning Sparsity for Effective and Efficient Music Performance Question Answering	Jun 2, 2025	Audio-visual Question Answering, Question Answering	Unverified	0
Tackling View-Dependent Semantics in 3D Language Gaussian Splatting	May 30, 2025	3D Scene Reconstruction, Scene Understanding	Code Available	2
Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors	May 30, 2025	3D geometry, Large Language Model	Code Available	0
SeG-SR: Integrating Semantic Knowledge into Remote Sensing Image Super-Resolution via Vision-Language Model	May 29, 2025	Image Super-Resolution, Language Modeling	Code Available	0
DORAEMON: Decentralized Ontology-aware Reliable Agent with Enhanced Memory Oriented Navigation	May 28, 2025	Autonomous Navigation, RAG	Unverified	0
LiDAR Based Semantic Perception for Forklifts in Outdoor Environments	May 28, 2025	Scene Understanding, Segmentation	Unverified	0

Benchmark Results

Semantic Scene Understanding Challenge (passive actuation & ground-truth localisation)

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.44	—	Unverified
2	Team VGAI (TCS Research)	OMQ	0.37	—	Unverified
3	Demo_semantic_SLAM	OMQ	0.11	—	Unverified

ADE20K val

#	Model	Metric	Claimed	Verified	Status
1	CPN(ResNet-101)	Mean IoU	46.3	—	Unverified

Semantic Scene Understanding Challenge (active actuation & ground-truth localisation)

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.35	—	Unverified
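
For readers unfamiliar with the Mean IoU metric in the benchmark results above, here is a minimal sketch of how mean intersection-over-union is commonly computed for semantic segmentation. It assumes flat lists of per-pixel class ids and averages only over classes that appear in the prediction or the ground truth; the function name `mean_iou` and the toy labels are illustrative, and official benchmark scripts may differ in details such as ignored labels or dataset-level accumulation.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean intersection-over-union across classes.

    `pred` and `target` are equal-length sequences of class ids, one per pixel.
    Classes absent from both sequences are skipped (an assumption of this
    sketch) so they do not distort the average.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth are class c.
        intersection = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(intersection / union)

    if not ious:
        raise ValueError("no class from range(num_classes) appears in the inputs")
    return sum(ious) / len(ious)


# Toy example: four pixels, two classes present out of three possible.
score = mean_iou(pred=[0, 0, 1, 1], target=[0, 1, 1, 1], num_classes=3)
```

In the toy example, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12. Leaderboards often report this value multiplied by 100, which is likely, though not confirmed here, the scale of the 46.3 figure.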