Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.

Papers

Showing 551–575 of 1723 papers

Title	Date	Tasks	Status
RAFT: Robust Augmentation of FeaTures for Image Segmentation	May 7, 2025	Active Learning, Domain Adaptation	Unverified
Segment Any RGB-Thermal Model with Language-aided Distillation	May 4, 2025	Instance Segmentation, Knowledge Distillation	Unverified
Benchmarking Feature Upsampling Methods for Vision Foundation Models using Interactive Segmentation	May 4, 2025	Benchmarking, Feature Upsampling	Code Available
Beyond Recognition: Evaluating Visual Perspective Taking in Vision Language Models	May 3, 2025	Diagnostic, Object Recognition	Unverified
Embracing Diffraction: A Paradigm Shift in Wireless Sensing and Communication	May 2, 2025	Scene Understanding	Unverified
V3LMA: Visual 3D-enhanced Language Model for Autonomous Driving	Apr 30, 2025	Autonomous Driving, Decision Making	Unverified
Category-Level and Open-Set Object Pose Estimation for Robotics	Apr 28, 2025	6D Pose Estimation, 6D Pose Estimation using RGB	Unverified
Masked Point-Entity Contrast for Open-Vocabulary 3D Scene Understanding	Apr 28, 2025	3D Semantic Segmentation, Contrastive Learning	Unverified
TraveLLaMA: Facilitating Multi-modal Large Language Models to Understand Urban Scenes and Provide Travel Assistance	Apr 23, 2025	Question Answering, Scene Understanding	Unverified
Multimodal Large Language Models for Enhanced Traffic Safety: A Comprehensive Review and Future Trends	Apr 21, 2025	Adversarial Robustness, Decision Making	Unverified
Are Vision LLMs Road-Ready? A Comprehensive Benchmark for Safety-Critical Driving Video Understanding	Apr 20, 2025	Autonomous Driving, Image Captioning	Code Available
Vision-Centric Representation-Efficient Fine-Tuning for Robust Universal Foreground Segmentation	Apr 20, 2025	Attribute, Foreground Segmentation	Unverified
Leveraging Automatic CAD Annotations for Supervised Learning in 3D Scene Understanding	Apr 18, 2025	Deep Learning, Point Cloud Completion	Code Available
Temporal Propagation of Asymmetric Feature Pyramid for Surgical Scene Segmentation	Apr 18, 2025	Scene Segmentation, Scene Understanding	Unverified
HAECcity: Open-Vocabulary Scene Understanding of City-Scale Point Clouds with Superpoint Graph Clustering	Apr 18, 2025	Clustering, Graph Clustering	Unverified
Explainable Scene Understanding with Qualitative Representations and Graph Neural Networks	Apr 17, 2025	Autonomous Driving, Scene Understanding	Unverified
CAGS: Open-Vocabulary 3D Scene Understanding with Context-Aware Gaussian Splatting	Apr 16, 2025	3DGS, 3D Instance Segmentation	Unverified
Single-Input Multi-Output Model Merging: Leveraging Foundation Models for Dense Multi-Task Learning	Apr 15, 2025	Multi-Task Learning, Scene Understanding	Unverified
Foundation Models for Remote Sensing: An Analysis of MLLMs for Object Localization	Apr 14, 2025	Benchmarking, Earth Observation	Unverified
DSM: Building A Diverse Semantic Map for 3D Visual Grounding	Apr 11, 2025	3D visual grounding, Scene Understanding	Unverified
FMLGS: Fast Multilevel Language Embedded Gaussians for Part-level Interactive Agents	Apr 11, 2025	3DGS, Navigate	Unverified
FindAnything: Open-Vocabulary and Object-Centric Mapping for Robot Exploration in Any Environment	Apr 11, 2025	3D geometry, Natural Language Queries	Unverified
DGOcc: Depth-aware Global Query-based Network for Monocular 3D Occupancy Prediction	Apr 10, 2025	GPU, Prediction	Unverified
MovSAM: A Single-image Moving Object Segmentation Framework Based on Deep Thinking	Apr 9, 2025	Autonomous Driving, Language Modeling	Code Available
RayFronts: Open-Set Semantic Ray Frontiers for Online Scene Understanding and Exploration	Apr 9, 2025	3D Semantic Segmentation, Benchmarking	Unverified

Benchmark Results

Semantic Scene Understanding Challenge (passive actuation & ground-truth localisation)

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.44	—	Unverified
2	Team VGAI (TCS Research)	OMQ	0.37	—	Unverified
3	Demo_semantic_SLAM	OMQ	0.11	—	Unverified

ADE20K val

#	Model	Metric	Claimed	Verified	Status
1	CPN(ResNet-101)	Mean IoU	46.3	—	Unverified

Semantic Scene Understanding Challenge (active actuation & ground-truth localisation)

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.35	—	Unverified