Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 41–50 of 1723 papers

Title	Date	Tasks	Status	Hype
Attention-based transformer models for image captioning across languages: An in-depth survey and evaluation	Jun 3, 2025	Caption GenerationImage Captioning	—Unverified	0
Trajectory Prediction Meets Large Language Models: A Survey	Jun 3, 2025	Language ModelingLanguage Modelling	CodeCode Available	5
PhysGaia: A Physics-Aware Dataset of Multi-Body Interactions for Dynamic Novel View Synthesis	Jun 3, 2025	Novel View SynthesisScene Understanding	CodeCode Available	1
SAM2-LOVE: Segment Anything Model 2 in Language-aided Audio-Visual Scenes	Jun 2, 2025	Scene Understanding	—Unverified	0
Learning Sparsity for Effective and Efficient Music Performance Question Answering	Jun 2, 2025	Audio-visual Question AnsweringQuestion Answering	—Unverified	0
Tackling View-Dependent Semantics in 3D Language Gaussian Splatting	May 30, 2025	3D Scene ReconstructionScene Understanding	CodeCode Available	2
Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors	May 30, 2025	3D geometryLarge Language Model	CodeCode Available	0
SeG-SR: Integrating Semantic Knowledge into Remote Sensing Image Super-Resolution via Vision-Language Model	May 29, 2025	Image Super-ResolutionLanguage Modeling	CodeCode Available	0
DORAEMON: Decentralized Ontology-aware Reliable Agent with Enhanced Memory Oriented Navigation	May 28, 2025	Autonomous NavigationRAG	—Unverified	0
LiDAR Based Semantic Perception for Forklifts in Outdoor Environments	May 28, 2025	Scene UnderstandingSegmentation	—Unverified	0

Show:10 25 50

← PrevPage 5 of 173Next →

All datasets Semantic Scene Understanding Challenge (passive actuation & ground-truth localisation)ADE20K val Semantic Scene Understanding Challenge (active actuation & ground-truth localisation)

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.44	—	Unverified
2	Team VGAI (TCS Research)	OMQ	0.37	—	Unverified
3	Demo_semantic_SLAM	OMQ	0.11	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	CPN(ResNet-101)	Mean IoU	46.3	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.35	—	Unverified