Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.
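One common way to make "objects and their spatial relationships" concrete is a scene graph: detected objects become nodes, and pairwise relations become labelled edges. The following is a minimal sketch under stated assumptions, not a method taken from any paper listed here. The `Box` type, the `spatial_relation` rule (comparing box centres in image coordinates, with y growing downward), and the example boxes are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical detection: a label plus an axis-aligned box in pixels."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    @property
    def center(self):
        return ((self.x_min + self.x_max) / 2, (self.y_min + self.y_max) / 2)


def spatial_relation(a, b):
    """Coarse relation of a relative to b, based on the dominant centre offset.

    Assumes image coordinates, where y increases downward.
    """
    ax, ay = a.center
    bx, by = b.center
    dx, dy = ax - bx, ay - by
    if abs(dx) >= abs(dy):
        return "left of" if dx < 0 else "right of"
    return "above" if dy < 0 else "below"


def build_scene_graph(boxes):
    """Return (subject, relation, object) triples for every ordered pair."""
    return [
        (a.label, spatial_relation(a, b), b.label)
        for a in boxes
        for b in boxes
        if a is not b
    ]


# Illustrative input only: three made-up detections.
detections = [
    Box("lamp", 40, 10, 60, 40),
    Box("table", 20, 50, 100, 90),
    Box("chair", 110, 55, 150, 100),
]
for triple in build_scene_graph(detections):
    print(triple)
```

Real systems typically learn relations from image features rather than from box geometry alone; the geometric rule here only shows the data structure that such systems produce.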

Papers


Showing 201–225 of 1723 papers

Title	Date	Tasks	Status	Hype
Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts	Dec 16, 2020	3D Semantic Segmentation, Instance Segmentation	Code Available	1
Human-centric Scene Understanding for 3D Large-scale Scenarios	Jul 26, 2023	Action Recognition, Scene Understanding	Code Available	1
IDA-3D: Instance-Depth-Aware 3D Object Detection From Stereo Vision for Autonomous Driving	Jun 1, 2020	3D Object Detection, Autonomous Driving	Code Available	1
CamContextI2V: Context-aware Controllable Video Generation	Apr 8, 2025	Diversity, Scene Understanding	Code Available	1
BoMuDANet: Unsupervised Adaptation for Visual Scene Understanding in Unstructured Driving Environments	Sep 22, 2020	Domain Adaptation, Scene Understanding	Code Available	1
Image Segmentation Using Deep Learning: A Survey	Jan 15, 2020	Decoder, Deep Learning	Code Available	1
Expressive Scene Graph Generation Using Commonsense Knowledge Infusion for Visual Understanding and Reasoning	May 31, 2022	Common Sense Reasoning, Graph Generation	Code Available	1
Bootstraping Clustering of Gaussians for View-consistent 3D Scene Understanding	Nov 29, 2024	3D geometry, 3DGS	Code Available	1
AVSegFormer: Audio-Visual Segmentation with Transformer	Jul 3, 2023	Decoder, Scene Understanding	Code Available	1
Instance-wise Occlusion and Depth Orders in Natural Scenes	Nov 29, 2021	Depth Estimation, Depth Prediction	Code Available	1
Explainable Object-induced Action Decision for Autonomous Vehicles	Mar 20, 2020	Autonomous Driving, Autonomous Vehicles	Code Available	1
Boundary-induced and scene-aggregated network for monocular depth prediction	Feb 26, 2021	Depth Estimation, Depth Prediction	Code Available	1
IRS: A Large Naturalistic Indoor Robotics Stereo Dataset to Train Deep Models for Disparity and Surface Normal Estimation	Dec 20, 2019	Disparity Estimation, Scene Understanding	Code Available	1
Joint 2D-3D-Semantic Data for Indoor Scene Understanding	Feb 3, 2017	Scene Understanding	Code Available	1
Exploiting Edge-Oriented Reasoning for 3D Point-based Scene Graph Analysis	Mar 9, 2021	3d scene graph generation, graph construction	Code Available	1
Extending Large Vision-Language Model for Diverse Interactive Tasks in Autonomous Driving	May 13, 2025	3D visual grounding, Autonomous Driving	Code Available	1
Estimating Generic 3D Room Structures from 2D Annotations	Jun 15, 2023	Scene Understanding	Code Available	1
Language-Assisted 3D Feature Learning for Semantic Scene Understanding	Nov 25, 2022	Descriptive, Instance Segmentation	Code Available	1
Cerberus Transformer: Joint Semantic, Affordance and Attribute Parsing	Nov 24, 2021	Attribute, Scene Understanding	Code Available	1
Bridging the Domain Gap: Self-Supervised 3D Scene Understanding with Foundation Models	May 15, 2023	3D Object Detection, Image Captioning	Code Available	1
Learning and Reasoning with the Graph Structure Representation in Robotic Surgery	Jul 7, 2020	Edge Classification, Graph Generation	Code Available	1
Learning How To Robustly Estimate Camera Pose in Endoscopic Videos	Apr 17, 2023	3D Reconstruction, Camera Pose Estimation	Code Available	1
Estimating and Exploiting the Aleatoric Uncertainty in Surface Normal Estimation	Sep 20, 2021	Decoder, Prediction	Code Available	1
Learning to Answer Questions in Dynamic Audio-Visual Scenarios	Mar 26, 2022	audio-visual learning, Audio-visual Question Answering	Code Available	1
Event-aided Semantic Scene Completion	Feb 4, 2025	Autonomous Driving, Scene Understanding	Code Available	1


Datasets: Semantic Scene Understanding Challenge (passive actuation & ground-truth localisation); ADE20K val; Semantic Scene Understanding Challenge (active actuation & ground-truth localisation)

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.44	—	Unverified
2	Team VGAI (TCS Research)	OMQ	0.37	—	Unverified
3	Demo_semantic_SLAM	OMQ	0.11	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	CPN(ResNet-101)	Mean IoU	46.3	—	Unverified
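
For readers unfamiliar with the Mean IoU metric in the table above: it is usually computed as the per-class intersection-over-union between predicted and ground-truth labels, averaged over classes. The sketch below is a minimal pure-Python illustration. The function name `mean_iou`, the flat label lists, and the choice to skip classes that appear in neither prediction nor ground truth are assumptions, not the evaluation protocol of any listed benchmark. Benchmarks differ in how they treat ignore labels and absent classes, and many report the score as a percentage.

```python
def mean_iou(pred, target, num_classes):
    """Average per-class IoU over flat sequences of integer class labels.

    Classes absent from both pred and target are skipped (an assumption;
    benchmark protocols vary on this point).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    ious = []
    for c in range(num_classes):
        # Pixels where both agree on class c.
        intersection = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either one says class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(intersection / union)

    return sum(ious) / len(ious) if ious else float("nan")


# Illustrative labels only: six "pixels", three classes.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(f"mean IoU = {score:.3f}")  # per-class IoUs are 1/3, 2/3 and 1/2
```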

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.35	—	Unverified