Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.
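As a rough illustration of "how objects relate to each other", the sketch below derives coarse pairwise spatial relations from 2D bounding boxes and collects them as (subject, relation, object) triples. The `Box` class, the rule set in `spatial_relation`, and the example coordinates are all hypothetical and chosen only for illustration; the systems listed on this page typically learn such relations from data and often work in 3D.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Box:
    """Axis-aligned 2D bounding box in image coordinates (y grows downward)."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    @property
    def center(self) -> tuple[float, float]:
        return ((self.x_min + self.x_max) / 2, (self.y_min + self.y_max) / 2)


def spatial_relation(a: Box, b: Box) -> str:
    """Return a coarse relation of `a` with respect to `b` (hypothetical rules)."""
    # Containment takes priority over directional relations.
    if (a.x_min >= b.x_min and a.y_min >= b.y_min
            and a.x_max <= b.x_max and a.y_max <= b.y_max):
        return "inside"
    (ax, ay), (bx, by) = a.center, b.center
    dx, dy = ax - bx, ay - by
    # Use the dominant axis of displacement between the two centers.
    if abs(dx) >= abs(dy):
        return "left of" if dx < 0 else "right of"
    return "above" if dy < 0 else "below"


def scene_graph(boxes: list[Box]) -> list[tuple[str, str, str]]:
    """Build (subject, relation, object) triples for every ordered pair."""
    return [(a.label, spatial_relation(a, b), b.label)
            for a, b in permutations(boxes, 2)]


# Made-up detections for a small indoor scene.
boxes = [
    Box("table", 100, 200, 400, 350),
    Box("cup", 220, 150, 260, 200),
    Box("chair", 450, 180, 560, 380),
]
triples = scene_graph(boxes)
for subject, relation, obj in triples:
    print(f"{subject} is {relation} {obj}")
```

Object recognition alone would stop at the three labels; the triples add the relational context that the description above refers to.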

Papers

Showing 451–500 of 1723 papers

Title	Date	Tasks	Status	Hype	Score
One-Shot Object Affordance Detection in the Wild	Aug 8, 2021	Action Recognition, Affordance Detection	Code Available	1	5
Real-Time Semantic Segmentation using Hyperspectral Images for Mapping Unstructured and Unknown Environments	Mar 27, 2023	Autonomous Navigation, Real-Time Semantic Segmentation	Code Available	1	5
You Only Need One Thing One Click: Self-Training for Weakly Supervised 3D Scene Understanding	Mar 26, 2023	3D Instance Segmentation, Instance Segmentation	Code Available	1	5
Online 3D reconstruction and dense tracking in endoscopic videos	Sep 9, 2024	3D Reconstruction, 3D Scene Reconstruction	Code Available	1	5
ReorientBot: Learning Object Reorientation for Specific-Posed Placement	Feb 22, 2022	Motion Generation, Motion Planning	Code Available	1	5
CAKES: Channel-wise Automatic KErnel Shrinking for Efficient 3D Networks	Mar 28, 2020	3D Medical Imaging Segmentation, Action Recognition	Code Available	1	5
Expressive Scene Graph Generation Using Commonsense Knowledge Infusion for Visual Understanding and Reasoning	May 31, 2022	Common Sense Reasoning, Graph Generation	Code Available	1	5
Open3DIS: Open-Vocabulary 3D Instance Segmentation with 2D Mask Guidance	Dec 17, 2023	3D Instance Segmentation, 3D Open-Vocabulary Instance Segmentation	Code Available	1	5
Egocentric Scene Understanding via Multimodal Spatial Rectifier	Jul 14, 2022	Scene Understanding, Surface Normal Estimation	Code Available	1	5
Cityscapes-Panoptic-Parts and PASCAL-Panoptic-Parts datasets for Scene Understanding	Apr 16, 2020	Human Part Segmentation, Panoptic Segmentation	Code Available	1	5
FocusFlow: Boosting Key-Points Optical Flow Estimation for Autonomous Driving	Aug 14, 2023	Autonomous Driving, Optical Flow Estimation	Code Available	1	5
Exploiting Edge-Oriented Reasoning for 3D Point-based Scene Graph Analysis	Mar 9, 2021	3d scene graph generation, graph construction	Code Available	1	5
Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts	Dec 16, 2020	3D Semantic Segmentation, Instance Segmentation	Code Available	1	5
Extending Large Vision-Language Model for Diverse Interactive Tasks in Autonomous Driving	May 13, 2025	3D visual grounding, Autonomous Driving	Code Available	1	5
Open Scene Understanding: Grounded Situation Recognition Meets Segment Anything for Helping People with Visual Impairments	Jul 15, 2023	Decoder, Grounded Situation Recognition	Code Available	1	5
ARKitScenes: A Diverse Real-World Dataset For 3D Indoor Scene Understanding Using Mobile RGB-D Data	Nov 17, 2021	3D Object Detection, object-detection	Code Available	1	5
3UR-LLM: An End-to-End Multimodal Large Language Model for 3D Scene Understanding	Jan 14, 2025	Language Modeling, Language Modelling	Code Available	1	5
OvarNet: Towards Open-vocabulary Object Attribute Recognition	Jan 23, 2023	Attribute, Knowledge Distillation	Code Available	1	5
Campus3D: A Photogrammetry Point Cloud Benchmark for Hierarchical Understanding of Outdoor Scene	Aug 11, 2020	Instance Segmentation, Point Cloud Segmentation	Code Available	1	5
Who2com: Collaborative Perception via Learnable Handshake Communication	Mar 21, 2020	Multi-agent Reinforcement Learning, Reinforcement Learning	Code Available	1	5
Channel-Wise Attention-Based Network for Self-Supervised Monocular Depth Estimation	Dec 24, 2021	Depth Estimation, Depth Prediction	Code Available	1	5
Explainable Object-induced Action Decision for Autonomous Vehicles	Mar 20, 2020	Autonomous Driving, Autonomous Vehicles	Code Available	1	5
PyTorchGeoNodes: Enabling Differentiable Shape Programs for 3D Shape Reconstruction	Apr 16, 2024	3D Reconstruction, 3D Shape Reconstruction	Code Available	1	5
Panoptic 3D Scene Reconstruction From a Single RGB Image	Nov 3, 2021	2D Panoptic Segmentation, 3D Instance Segmentation	Code Available	1	5
RSUD20K: A Dataset for Road Scene Understanding In Autonomous Driving	Jan 14, 2024	Autonomous Driving, Benchmarking	Code Available	1	5
Semantic Segmentation-Assisted Instance Feature Fusion for Multi-Level 3D Part Instance Segmentation	Aug 9, 2022	3D Instance Segmentation, 3D Part Segmentation	Code Available	1	5
Panoptic Video Scene Graph Generation	Nov 28, 2023	Graph Generation, Panoptic Scene Graph Generation	Code Available	1	5
Panoramic Panoptic Segmentation: Insights Into Surrounding Parsing for Mobile Agents via Unsupervised Contrastive Learning	Jun 21, 2022	Contrastive Learning, Domain Generalization	Code Available	1	5
Towards Efficient Scene Understanding via Squeeze Reasoning	Nov 6, 2020	Instance Segmentation, object-detection	Code Available	1	5
Predicting Deeper into the Future of Semantic Segmentation	Mar 22, 2017	Attribute, Autonomous Driving	Code Available	0	5
Category-level Neural Field for Reconstruction of Partially Observed Objects in Indoor Environment	Jun 12, 2024	3D Reconstruction, Scene Understanding	Code Available	0	5
Are Vision LLMs Road-Ready? A Comprehensive Benchmark for Safety-Critical Driving Video Understanding	Apr 20, 2025	Autonomous Driving, Image Captioning	Code Available	0	5
Pose-aware Multi-level Feature Network for Human Object Interaction Detection	Sep 18, 2019	Human-Object Interaction Detection, Object	Code Available	0	5
Planning Safety Trajectories with Dual-Phase, Physics-Informed, and Transportation Knowledge-Driven Large Language Models	Apr 6, 2025	Computational Efficiency, General Knowledge	Code Available	0	5
Evaluating Compositional Scene Understanding in Multimodal Generative Models	Mar 29, 2025	Scene Understanding	Code Available	0	5
A Review on Deep Learning Techniques Applied to Semantic Segmentation	Apr 22, 2017	Autonomous Driving, Deep Learning	Code Available	0	5
ERFNet: Efficient Residual Factorized ConvNet for Real-time Semantic Segmentation	Oct 9, 2017	GPU, Real-Time Semantic Segmentation	Code Available	0	5
Physics-as-Inverse-Graphics: Unsupervised Physical Parameter Estimation from Video	May 27, 2019	Inductive Bias, Model Predictive Control	Code Available	0	5
PENet: A Joint Panoptic Edge Detection Network	Mar 15, 2023	Edge Detection, Multi-Task Learning	Code Available	0	5
CARL-D: A vision benchmark suite and large scale dataset for vehicle detection and scene segmentation	Feb 17, 2022	2D Object Detection, Autonomous Driving	Code Available	0	5
Part-Whole Relational Fusion Towards Multi-Modal Scene Understanding	Oct 19, 2024	Autonomous Driving, object-detection	Code Available	0	5
Parsing Geometry Using Structure-Aware Shape Templates	Aug 3, 2018	Object, Object Recognition	Code Available	0	5
Parsing Natural Scenes and Natural Language with Recursive Neural Networks	Jun 1, 2011	General Classification, Scene Classification	Code Available	0	5
Panoramic Depth Estimation via Supervised and Unsupervised Learning in Indoor Scenes	Aug 18, 2021	Camera Calibration, Depth Estimation	Code Available	0	5
PanoRecon: Real-Time Panoptic 3D Reconstruction from Monocular Video	Jan 1, 2024	3D Panoptic Segmentation, 3D Reconstruction	Code Available	0	5
OVeNet: Offset Vector Network for Semantic Segmentation	Mar 25, 2023	Optical Character Recognition (OCR), Scene Understanding	Code Available	0	5
OVGaussian: Generalizable 3D Gaussian Segmentation with Open Vocabularies	Dec 31, 2024	3DGS, 3D Semantic Segmentation	Code Available	0	5
OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding	Jul 10, 2025	Scene Understanding, Spatial Reasoning	Code Available	0	5
P2AT: Pyramid Pooling Axial Transformer for Real-time Semantic Segmentation	Oct 23, 2023	Autonomous Driving, Decoder	Code Available	0	5
Parallel Neural Computing for Scene Understanding from LiDAR Perception in Autonomous Racing	Dec 24, 2024	Autonomous Driving, Autonomous Racing	Code Available	0	5

Benchmark Results

Semantic Scene Understanding Challenge (passive actuation & ground-truth localisation)

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.44	—	Unverified
2	Team VGAI (TCS Research)	OMQ	0.37	—	Unverified
3	Demo_semantic_SLAM	OMQ	0.11	—	Unverified

ADE20K val

#	Model	Metric	Claimed	Verified	Status
1	CPN(ResNet-101)	Mean IoU	46.3	—	Unverified

Semantic Scene Understanding Challenge (active actuation & ground-truth localisation)

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.35	—	Unverified
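
For readers unfamiliar with the Mean IoU metric reported for ADE20K val, the following is a minimal sketch of the standard definition: per-class intersection over union, averaged over the classes that occur in either the prediction or the ground truth. It assumes the leaderboard uses this common definition, which the page does not confirm; the function name `mean_iou` and the toy label lists are made up for illustration. Benchmarks usually accumulate counts over a whole dataset rather than a single image and report the result as a percentage.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes present in pred or target.

    `pred` and `target` are flat sequences of integer class labels, one per pixel.
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if p == t:
            # Pixel agrees: counts once toward both intersection and union.
            intersection[p] += 1
            union[p] += 1
        else:
            # Pixel disagrees: it enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1
    # Skip classes that appear in neither map to avoid dividing by zero.
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


# Toy 2x3 label maps, flattened row by row (made-up values).
target = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(f"Mean IoU: {100 * score:.1f}")
```

In this toy case the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is 0.5.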