Scene Understanding

Scene understanding is the task of interpreting the visual content of a scene: the objects present, their spatial relationships, and the overall layout. It goes beyond object recognition by accounting for context, that is, how objects relate to one another and to their environment.

Papers

Showing 1201–1225 of 1723 papers

Title	Date	Tasks	Status
Reconstructing Vehicles from a Single Image: Shape Priors for Road Scene Understanding	Sep 29, 2016	Autonomous Driving, road scene understanding	Unverified
Recyclable Semi-supervised Method Based on Multi-model Ensemble for Video Scene Parsing	Jun 5, 2023	Scene Parsing, Scene Understanding	Unverified
Cataract-1K: Cataract Surgery Dataset for Scene Segmentation, Phase Recognition, and Irregularity Detection	Dec 11, 2023	Benchmarking, Domain Adaptation	Unverified
Reducing Label Dependency for Underwater Scene Understanding: A Survey of Datasets, Techniques and Applications	Nov 18, 2024	Scene Segmentation, Scene Understanding	Unverified
Referring Self-supervised Learning on 3D Point Cloud	Sep 29, 2021	Scene Understanding, Self-Supervised Learning	Unverified
RefineCap: Concept-Aware Refinement for Image Captioning	Sep 8, 2021	Decoder, Descriptive	Unverified
CASPNet++: Joint Multi-Agent Motion Prediction	Aug 15, 2023	Autonomous Driving, motion prediction	Unverified
Case-based Reasoning Augmented Large Language Model Framework for Decision Making in Realistic Safety-Critical Driving Scenarios	Jun 25, 2025	Autonomous Driving, Decision Making	Unverified
Cascaded Classification Models: Combining Models for Holistic Scene Understanding	Dec 1, 2008	3D Reconstruction, 3D Scene Reconstruction	Unverified
Relationship Proposal Networks	Jul 1, 2017	All, Scene Understanding	Unverified
Relevance-driven Decision Making for Safer and More Efficient Human Robot Collaboration	Sep 21, 2024	Collision Avoidance, Decision Making	Unverified
Relevance for Human Robot Collaboration	Sep 12, 2024	Dimensionality Reduction, Scene Understanding	Unverified
Car Segmentation and Pose Estimation using 3D Object Models	Dec 21, 2015	3D Pose Estimation, Image Segmentation	Unverified
Can you text what is happening? Integrating pre-trained language encoders into trajectory prediction models for autonomous driving	Sep 11, 2023	Autonomous Driving, Descriptive	Unverified
Can MLLMs Guide Me Home? A Benchmark Study on Fine-Grained Visual Reasoning from Transit Maps	May 24, 2025	Scene Understanding, Spatial Reasoning	Unverified
REMIPS: Physically Consistent 3D Reconstruction of Multiple Interacting People under Weak Supervision	Dec 1, 2021	3D Human Reconstruction, 3D Reconstruction	Unverified
Can LVLMs Obtain a Driver's License? A Benchmark Towards Reliable AGI for Autonomous Driving	Sep 4, 2024	Autonomous Driving, Decision Making	Unverified
Can Large Multimodal Models Understand Agricultural Scenes? Benchmarking with AgroMind	May 18, 2025	Benchmarking, Scene Understanding	Unverified
Can Foundation Models Perform Zero-Shot Task Specification For Robot Manipulation?	Apr 23, 2022	Robot Manipulation, Scene Understanding	Unverified
Residual 3D Scene Flow Learning with Context-Aware Feature Extraction	Sep 10, 2021	Autonomous Driving, Scene Flow Estimation	Unverified
Resource-Efficient Multiview Perception: Integrating Semantic Masking with Masked Autoencoders	Oct 7, 2024	Multiview Detection, Scene Understanding	Unverified
Vision-Language Pre-training with Object Contrastive Learning for 3D Scene Understanding	May 18, 2023	Contrastive Learning, Object	Unverified
3D Shape Augmentation with Content-Aware Shape Resizing	May 15, 2024	3D Generation, Scene Understanding	Unverified
BridgeNet: Comprehensive and Effective Feature Interactions via Bridge Feature for Multi-task Dense Predictions	Dec 21, 2023	Decoder, Multi-Task Learning	Unverified
Rethinking RGB-D Fusion for Semantic Segmentation in Surgical Datasets	Jul 29, 2024	Decoder, Scene Understanding	Unverified

Benchmark Results

Semantic Scene Understanding Challenge (passive actuation & ground-truth localisation)

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.44	—	Unverified
2	Team VGAI (TCS Research)	OMQ	0.37	—	Unverified
3	Demo_semantic_SLAM	OMQ	0.11	—	Unverified

ADE20K val

#	Model	Metric	Claimed	Verified	Status
1	CPN(ResNet-101)	Mean IoU	46.3	—	Unverified

Semantic Scene Understanding Challenge (active actuation & ground-truth localisation)

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.35	—	Unverified