Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.

Papers

Showing 101–125 of 1723 papers

Title	Date	Tasks	Status	Hype
Temporal Propagation of Asymmetric Feature Pyramid for Surgical Scene Segmentation	Apr 18, 2025	Scene Segmentation, Scene Understanding	Unverified	0
HAECcity: Open-Vocabulary Scene Understanding of City-Scale Point Clouds with Superpoint Graph Clustering	Apr 18, 2025	Clustering, Graph Clustering	Unverified	0
Training-Free Hierarchical Scene Understanding for Gaussian Splatting with Superpoint Graphs	Apr 17, 2025	3D geometry, 3DGS	Code Available	1
Explainable Scene Understanding with Qualitative Representations and Graph Neural Networks	Apr 17, 2025	Autonomous Driving, Scene Understanding	Unverified	0
DC-SAM: In-Context Segment Anything in Images and Videos via Dual Consistency	Apr 16, 2025	Few-Shot Learning, Interactive Segmentation	Code Available	1
CAGS: Open-Vocabulary 3D Scene Understanding with Context-Aware Gaussian Splatting	Apr 16, 2025	3DGS, 3D Instance Segmentation	Unverified	0
Single-Input Multi-Output Model Merging: Leveraging Foundation Models for Dense Multi-Task Learning	Apr 15, 2025	Multi-Task Learning, Scene Understanding	Unverified	0
Foundation Models for Remote Sensing: An Analysis of MLLMs for Object Localization	Apr 14, 2025	Benchmarking, Earth Observation	Unverified	0
SoccerNet-v3D: Leveraging Sports Broadcast Replays for 3D Scene Understanding	Apr 14, 2025	Camera Calibration, Object Localization	Code Available	1
FMLGS: Fast Multilevel Language Embedded Gaussians for Part-level Interactive Agents	Apr 11, 2025	3DGS, Navigate	Unverified	0
FindAnything: Open-Vocabulary and Object-Centric Mapping for Robot Exploration in Any Environment	Apr 11, 2025	3D geometry, Natural Language Queries	Unverified	0
DSM: Building A Diverse Semantic Map for 3D Visual Grounding	Apr 11, 2025	3D visual grounding, Scene Understanding	Unverified	0
DGOcc: Depth-aware Global Query-based Network for Monocular 3D Occupancy Prediction	Apr 10, 2025	GPU, Prediction	Unverified	0
Masked Scene Modeling: Narrowing the Gap Between Supervised and Self-Supervised Learning in 3D Scene Understanding	Apr 9, 2025	Scene Understanding, Self-Supervised Learning	Code Available	1
MovSAM: A Single-image Moving Object Segmentation Framework Based on Deep Thinking	Apr 9, 2025	Autonomous Driving, Language Modeling	Code Available	0
RayFronts: Open-Set Semantic Ray Frontiers for Online Scene Understanding and Exploration	Apr 9, 2025	3D Semantic Segmentation, Benchmarking	Unverified	0
Audio-visual Event Localization on Portrait Mode Short Videos	Apr 9, 2025	audio-visual event localization, Scene Understanding	Unverified	0
Attributes-aware Visual Emotion Representation Learning	Apr 9, 2025	Attribute, Emotion Recognition	Unverified	0
PRIMEDrive-CoT: A Precognitive Chain-of-Thought Framework for Uncertainty-Aware Object Interaction in Driving Scene Scenario	Apr 8, 2025	3D Object Detection, Autonomous Driving	Unverified	0
CamContextI2V: Context-aware Controllable Video Generation	Apr 8, 2025	Diversity, Scene Understanding	Code Available	1
RS-RAG: Bridging Remote Sensing Imagery and Comprehensive Knowledge with a Multi-Modal Dataset and Retrieval-Augmented Generation Model	Apr 7, 2025	Image Captioning, image-classification	Unverified	0
DFormerv2: Geometry Self-Attention for RGBD Semantic Segmentation	Apr 7, 2025	3D geometry, RGBD Semantic Segmentation	Code Available	3
Planning Safety Trajectories with Dual-Phase, Physics-Informed, and Transportation Knowledge-Driven Large Language Models	Apr 6, 2025	Computational Efficiency, General Knowledge	Code Available	0
Multimodal Fusion and Vision-Language Models: A Survey for Robot Vision	Apr 3, 2025	3D Object Detection, cross-modal alignment	Code Available	1
F-ViTA: Foundation Model Guided Visible to Thermal Translation	Apr 3, 2025	Scene Understanding, Style Transfer	Code Available	1


Benchmark Results

Semantic Scene Understanding Challenge (passive actuation & ground-truth localisation)

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.44	—	Unverified
2	Team VGAI (TCS Research)	OMQ	0.37	—	Unverified
3	Demo_semantic_SLAM	OMQ	0.11	—	Unverified

ADE20K val

#	Model	Metric	Claimed	Verified	Status
1	CPN(ResNet-101)	Mean IoU	46.3	—	Unverified

Semantic Scene Understanding Challenge (active actuation & ground-truth localisation)

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.35	—	Unverified