Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.

Papers

Showing 126–150 of 1723 papers

Title	Date	Tasks	Status	Hype
Overlap-Aware Feature Learning for Robust Unsupervised Domain Adaptation for 3D Semantic Segmentation	Apr 2, 2025	3D Semantic Segmentation, Adversarial Attack	Unverified	0
CoMatcher: Multi-View Collaborative Feature Matching	Apr 2, 2025	Scene Understanding, set matching	Unverified	0
TransforMerger: Transformer-based Voice-Gesture Fusion for Robust Human-Robot Communication	Apr 2, 2025	Language Modeling, Language Modelling	Unverified	0
Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness	Apr 2, 2025	Scene Understanding	Unverified	0
Scene-Centric Unsupervised Panoptic Segmentation	Apr 2, 2025	Instance Segmentation, Panoptic Segmentation	Code Available	2
WikiVideo: Article Generation from Multiple Videos	Apr 1, 2025	Articles, RAG	Code Available	1
Zero-Shot 4D Lidar Panoptic Segmentation	Apr 1, 2025	Diversity, Panoptic Segmentation	Unverified	0
Context-Aware Human Behavior Prediction Using Multimodal Large Language Models: Challenges and Insights	Apr 1, 2025	Activity Prediction, Domain Generalization	Unverified	0
PhysPose: Refining 6D Object Poses with Physical Constraints	Mar 30, 2025	6D Pose Estimation using RGB, Pose Estimation	Unverified	0
Boosting Omnidirectional Stereo Matching with a Pre-trained Depth Foundation Model	Mar 30, 2025	Depth Estimation, Monocular Depth Estimation	Code Available	1
OpenDriveVLA: Towards End-to-end Autonomous Driving with Large Vision Language Action Model	Mar 30, 2025	Autonomous Driving, Decision Making	Code Available	4
Empowering Large Language Models with 3D Situation Awareness	Mar 29, 2025	Scene Understanding	Unverified	0
Evaluating Compositional Scene Understanding in Multimodal Generative Models	Mar 29, 2025	Scene Understanding	Code Available	0
Can DeepSeek Reason Like a Surgeon? An Empirical Evaluation for Vision-Language Understanding in Robotic-Assisted Surgery	Mar 29, 2025	Action Understanding, Instrument Recognition	Unverified	0
Open-Vocabulary Semantic Segmentation with Uncertainty Alignment for Robotic Scene Understanding in Indoor Building Environments	Mar 29, 2025	Navigate, Open Vocabulary Semantic Segmentation	Unverified	0
Mitigating Trade-off: Stream and Query-guided Aggregation for Efficient and Effective 3D Occupancy Prediction	Mar 28, 2025	Autonomous Driving, Scene Understanding	Code Available	1
Evaluating Multimodal Language Models as Visual Assistants for Visually Impaired Users	Mar 28, 2025	Object Recognition, Reading Comprehension	Unverified	0
A Dataset for Semantic Segmentation in the Presence of Unknowns	Mar 28, 2025	Anomaly Detection, Anomaly Segmentation	Unverified	0
Endo-TTAP: Robust Endoscopic Tissue Tracking via Multi-Facet Guided Attention and Hybrid Flow-point Supervision	Mar 28, 2025	Optical Flow Estimation, Point Tracking	Unverified	0
Next-Best-Trajectory Planning of Robot Manipulators for Effective Observation and Exploration	Mar 28, 2025	Computational Efficiency, Object Reconstruction	Unverified	0
NuGrounding: A Multi-View 3D Visual Grounding Framework in Autonomous Driving	Mar 28, 2025	3D visual grounding, Autonomous Driving	Unverified	0
Visual Jenga: Discovering Object Dependencies via Counterfactual Inpainting	Mar 27, 2025	counterfactual, Object	Unverified	0
Towards Generating Realistic 3D Semantic Training Data for Autonomous Driving	Mar 27, 2025	3D Semantic Segmentation, Autonomous Driving	Code Available	2
DINeMo: Learning Neural Mesh Models with no 3D Annotations	Mar 26, 2025	3D Pose Estimation, 6D Pose Estimation	Unverified	0
COB-GS: Clear Object Boundaries in 3DGS Segmentation Based on Boundary-Adaptive Gaussian Splitting	Mar 25, 2025	3DGS, Object	Code Available	2

Benchmark Results

Semantic Scene Understanding Challenge (passive actuation & ground-truth localisation)

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.44	—	Unverified
2	Team VGAI (TCS Research)	OMQ	0.37	—	Unverified
3	Demo_semantic_SLAM	OMQ	0.11	—	Unverified

ADE20K val

#	Model	Metric	Claimed	Verified	Status
1	CPN(ResNet-101)	Mean IoU	46.3	—	Unverified

Semantic Scene Understanding Challenge (active actuation & ground-truth localisation)

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.35	—	Unverified