Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.

Papers

Showing 501–525 of 1723 papers

Title	Date	Tasks	Status
SceneCompleter: Dense 3D Scene Completion for Generative Novel View Synthesis	Jun 12, 2025	Novel View Synthesis, Scene Understanding	Unverified
SemanticSplat: Feed-Forward 3D Scene Understanding with Language-Aware Gaussian Fields	Jun 11, 2025	3D Reconstruction, Scene Understanding	Unverified
SceneSplat++: A Large Dataset and Comprehensive Benchmark for Language Gaussian Splatting	Jun 10, 2025	3DGS, Scene Understanding	Unverified
Robust Visual Localization via Semantic-Guided Multi-Scale Transformer	Jun 10, 2025	regression, Scene Understanding	Unverified
PhyBlock: A Progressive Benchmark for Physical Understanding and Planning via 3D Block Assembly	Jun 10, 2025	Question Answering, Scene Understanding	Unverified
SpatialLM: Training Large Language Models for Structured Indoor Modeling	Jun 9, 2025	3D Object Detection, Language Modeling	Unverified
Design and Evaluation of Deep Learning-Based Dual-Spectrum Image Fusion Methods	Jun 9, 2025	Fairness, Scene Understanding	Unverified
OpenSplat3D: Open-Vocabulary 3D Instance Segmentation using Gaussian Splatting	Jun 9, 2025	3DGS, 3D Instance Segmentation	Unverified
Does Your 3D Encoder Really Work? When Pretrain-SFT from 2D VLMs Meets 3D VLMs	Jun 5, 2025	cross-modal alignment, Dense Captioning	Unverified
ProJo4D: Progressive Joint Optimization for Sparse-View Inverse Physics Estimation	Jun 5, 2025	3D Reconstruction, NeRF	Unverified
Tactile MNIST: Benchmarking Active Tactile Perception	Jun 3, 2025	Benchmarking, Scene Understanding	Unverified
Attention-based transformer models for image captioning across languages: An in-depth survey and evaluation	Jun 3, 2025	Caption Generation, Image Captioning	Unverified
SAM2-LOVE: Segment Anything Model 2 in Language-aided Audio-Visual Scenes	Jun 2, 2025	Scene Understanding	Unverified
Learning Sparsity for Effective and Efficient Music Performance Question Answering	Jun 2, 2025	Audio-visual Question Answering, Question Answering	Unverified
Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors	May 30, 2025	3D geometry, Large Language Model	Code Available
SeG-SR: Integrating Semantic Knowledge into Remote Sensing Image Super-Resolution via Vision-Language Model	May 29, 2025	Image Super-Resolution, Language Modeling	Code Available
LiDAR Based Semantic Perception for Forklifts in Outdoor Environments	May 28, 2025	Scene Understanding, Segmentation	Unverified
DORAEMON: Decentralized Ontology-aware Reliable Agent with Enhanced Memory Oriented Navigation	May 28, 2025	Autonomous Navigation, RAG	Unverified
OmniIndoor3D: Comprehensive Indoor 3D Reconstruction	May 27, 2025	3DGS, 3D Reconstruction	Unverified
OccLE: Label-Efficient 3D Semantic Occupancy Prediction	May 27, 2025	3D Semantic Occupancy Prediction, Autonomous Driving	Unverified
A Graph Completion Method that Jointly Predicts Geometry and Topology Enables Effective Molecule Assembly	May 27, 2025	Denoising, Drug Design	Unverified
Compositional Scene Understanding through Inverse Generative Modeling	May 27, 2025	Scene Understanding	Unverified
Right Side Up? Disentangling Orientation Understanding in MLLMs with Fine-grained Multi-axis Perception Tasks	May 27, 2025	3D Scene Reconstruction, Diagnostic	Unverified
Underwater Diffusion Attention Network with Contrastive Language-Image Joint Learning for Underwater Image Enhancement	May 26, 2025	Image Enhancement, object-detection	Unverified
FHGS: Feature-Homogenized Gaussian Splatting	May 25, 2025	3DGS, Scene Understanding	Unverified

Datasets: Semantic Scene Understanding Challenge (passive actuation & ground-truth localisation); ADE20K val; Semantic Scene Understanding Challenge (active actuation & ground-truth localisation)

Benchmark Results

Semantic Scene Understanding Challenge (passive actuation & ground-truth localisation)
#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.44	—	Unverified
2	Team VGAI (TCS Research)	OMQ	0.37	—	Unverified
3	Demo_semantic_SLAM	OMQ	0.11	—	Unverified

ADE20K val
#	Model	Metric	Claimed	Verified	Status
1	CPN(ResNet-101)	Mean IoU	46.3	—	Unverified
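
Mean IoU, the metric in the table above, is conventionally the intersection over union computed per class and then averaged over classes. The sketch below illustrates that general definition only. The function name `mean_iou`, the flat integer label lists, and the choice to skip classes that appear in neither the prediction nor the ground truth are assumptions made for illustration; the benchmark's exact evaluation protocol is not stated on this page.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Average per-class intersection over union for flat label sequences.

    Hypothetical helper: classes absent from both pred and target are skipped,
    which is one common convention, not necessarily the benchmark's.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = [0] * num_classes  # pixels where both agree on class c
    pred_count = [0] * num_classes    # pixels predicted as class c
    target_count = [0] * num_classes  # pixels labelled as class c

    for p, t in zip(pred, target):
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:  # skip classes that never occur
            ious.append(intersection[c] / union)

    if not ious:
        raise ValueError("no class occurs in pred or target")
    return sum(ious) / len(ious)


# Toy example with two classes and four pixels.
score = mean_iou(pred=[0, 0, 1, 1], target=[0, 1, 1, 1], num_classes=2)
print(f"{score:.4f}")
```

In the toy example, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12. Leaderboards often report this value multiplied by 100.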

Semantic Scene Understanding Challenge (active actuation & ground-truth localisation)
#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.35	—	Unverified