Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.
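To make the idea of "how objects relate to each other" concrete, here is a minimal sketch that derives coarse spatial relations from 2D bounding boxes. It is an illustration only, not a method taken from any paper listed here. The `Box` class, the relation names, and the sample detections are all hypothetical, and real systems typically learn relations from data and reason in 3D.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned 2D box in image coordinates (y grows downward)."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    @property
    def center(self):
        return ((self.x_min + self.x_max) / 2, (self.y_min + self.y_max) / 2)


def overlaps(a, b):
    """True if the two boxes share any area."""
    return (a.x_min < b.x_max and b.x_min < a.x_max
            and a.y_min < b.y_max and b.y_min < a.y_max)


def relation(a, b):
    """Return a coarse spatial relation of box a with respect to box b."""
    if overlaps(a, b):
        return "overlaps"
    ax, ay = a.center
    bx, by = b.center
    dx, dy = ax - bx, ay - by
    # Pick the dominant axis of displacement between the two centers.
    if abs(dx) >= abs(dy):
        return "left of" if dx < 0 else "right of"
    return "above" if dy < 0 else "below"


def scene_relations(boxes):
    """List (subject, relation, object) triples for every ordered pair of boxes."""
    return [(a.label, relation(a, b), b.label)
            for a in boxes for b in boxes if a is not b]


# Made-up detections, for illustration only.
detections = [
    Box("lamp", 40, 10, 60, 40),
    Box("table", 20, 60, 120, 100),
    Box("chair", 130, 55, 170, 110),
]
for triple in scene_relations(detections):
    print(triple)
```

An object detector alone would stop at the three labels. The triples (for example, the lamp being above the table) are the kind of contextual output that distinguishes scene understanding from object recognition.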

Papers


Showing 401–450 of 1723 papers

Title	Date	Tasks	Status	Hype	Score
A2-FPN for Semantic Segmentation of Fine-Resolution Remotely Sensed Images	Feb 16, 2021	Decision Making, Scene Understanding	Code Available	1	5
M3D-RPN: Monocular 3D Region Proposal Network for Object Detection	Jul 13, 2019	3D Object Detection, 3D Object Detection From Monocular Images	Code Available	1	5
MassMIND: Massachusetts Maritime INfrared Dataset	Sep 9, 2022	Instance Segmentation, Scene Understanding	Code Available	1	5
Masked Scene Modeling: Narrowing the Gap Between Supervised and Self-Supervised Learning in 3D Scene Understanding	Apr 9, 2025	Scene Understanding, Self-Supervised Learning	Code Available	1	5
Panoramic Panoptic Segmentation: Insights Into Surrounding Parsing for Mobile Agents via Unsupervised Contrastive Learning	Jun 21, 2022	Contrastive Learning, Domain Generalization	Code Available	1	5
PC-BEV: An Efficient Polar-Cartesian BEV Fusion Framework for LiDAR Semantic Segmentation	Dec 19, 2024	LIDAR Semantic Segmentation, Scene Understanding	Code Available	1	5
PiMAE: Point Cloud and Image Interactive Masked Autoencoders for 3D Object Detection	Mar 14, 2023	3D Object Detection, Decoder	Code Available	1	5
Class-Incremental Domain Adaptation with Smoothing and Calibration for Surgical Report Generation	Jul 23, 2021	Domain Adaptation, Few-Shot Learning	Code Available	1	5
STSBench: A Spatio-temporal Scenario Benchmark for Multi-modal Large Language Models in Autonomous Driving	Jun 6, 2025	Autonomous Driving, Autonomous Vehicles	Code Available	1	5
Distilled Semantics for Comprehensive Scene Understanding from Videos	Mar 31, 2020	Depth Estimation, Knowledge Distillation	Code Available	1	5
Event-aided Semantic Scene Completion	Feb 4, 2025	Autonomous Driving, Scene Understanding	Code Available	1	5
Microsoft COCO: Common Objects in Context	May 1, 2014	Instance Segmentation, Object	Code Available	1	5
Estimating and Exploiting the Aleatoric Uncertainty in Surface Normal Estimation	Sep 20, 2021	Decoder, Prediction	Code Available	1	5
SurgTPGS: Semantic 3D Surgical Scene Understanding with Text Promptable Gaussian Splatting	Jun 29, 2025	3D Reconstruction, Scene Understanding	Code Available	1	5
Estimating Generic 3D Room Structures from 2D Annotations	Jun 15, 2023	Scene Understanding	Code Available	1	5
Boosting Omnidirectional Stereo Matching with a Pre-trained Depth Foundation Model	Mar 30, 2025	Depth Estimation, Monocular Depth Estimation	Code Available	1	5
Bootstraping Clustering of Gaussians for View-consistent 3D Scene Understanding	Nov 29, 2024	3D geometry, 3DGS	Code Available	1	5
Event-based Motion Segmentation with Spatio-Temporal Graph Cuts	Dec 16, 2020	Motion Segmentation, Scene Understanding	Code Available	1	5
PanopticNDT: Efficient and Robust Panoptic Mapping	Sep 24, 2023	2D Panoptic Segmentation, 3D Panoptic Segmentation	Code Available	1	5
A Versatile and Efficient Reinforcement Learning Framework for Autonomous Driving	Oct 22, 2021	Autonomous Driving, reinforcement-learning	Code Available	1	5
EndoChat: Grounded Multimodal Large Language Model for Endoscopic Surgery	Jan 20, 2025	Language Modeling, Language Modelling	Code Available	1	5
0-MMS: Zero-Shot Multi-Motion Segmentation With A Monocular Event Camera	Jun 11, 2020	Motion Compensation, Motion Segmentation	Code Available	1	5
A Data-Centric Revisit of Pre-Trained Vision Models for Robot Learning	Mar 10, 2025	Object, Scene Understanding	Code Available	1	5
DPF: Learning Dense Prediction Fields with Weak Supervision	Mar 29, 2023	Intrinsic Image Decomposition, Prediction	Code Available	1	5
Enhancing Scene Graph Generation with Hierarchical Relationships and Commonsense Knowledge	Nov 21, 2023	Large Language Model, Multimodal Deep Learning	Code Available	1	5
MonoDistill: Learning Spatial Features for Monocular 3D Object Detection	Jan 26, 2022	3D Object Detection, Monocular 3D Object Detection	Code Available	1	5
PanoOcc: Unified Occupancy Representation for Camera-based 3D Panoptic Segmentation	Jun 16, 2023	3D Panoptic Segmentation, Autonomous Driving	Code Available	1	5
MSeg: A Composite Dataset for Multi-domain Semantic Segmentation	Dec 27, 2021	Computational Efficiency, Instance Segmentation	Code Available	1	5
Explainable Object-induced Action Decision for Autonomous Vehicles	Mar 20, 2020	Autonomous Driving, Autonomous Vehicles	Code Available	1	5
TextSLAM: Visual SLAM with Planar Text Features	Nov 26, 2019	Object SLAM, Scene Understanding	Code Available	1	5
OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge	May 31, 2019	object-detection, Object Detection	Code Available	1	5
MuKEA: Multimodal Knowledge Extraction and Accumulation for Knowledge-based Visual Question Answering	Mar 17, 2022	Implicit Relations, Question Answering	Code Available	1	5
Multi3DRefer: Grounding Text Description to Multiple 3D Objects	Sep 11, 2023	3D visual grounding, Contrastive Learning	Code Available	1	5
DTCLMapper: Dual Temporal Consistent Learning for Vectorized HD Map Construction	May 9, 2024	Contrastive Learning, Scene Understanding	Code Available	1	5
Panoptic 3D Scene Reconstruction From a Single RGB Image	Nov 3, 2021	2D Panoptic Segmentation, 3D Instance Segmentation	Code Available	1	5
Dual-Hybrid Attention Network for Specular Highlight Removal	Jul 17, 2024	highlight removal, Object Recognition	Code Available	1	5
Multimodal Dataset for Localization, Mapping and Crop Monitoring in Citrus Tree Farms	Sep 27, 2023	object-detection, Object Detection	Code Available	1	5
Egocentric Scene Understanding via Multimodal Spatial Rectifier	Jul 14, 2022	Scene Understanding, Surface Normal Estimation	Code Available	1	5
Cityscapes-Panoptic-Parts and PASCAL-Panoptic-Parts datasets for Scene Understanding	Apr 16, 2020	Human Part Segmentation, Panoptic Segmentation	Code Available	1	5
Efficient Multi-Task RGB-D Scene Analysis for Indoor Environments	Jul 10, 2022	Instance Segmentation, Panoptic Segmentation	Code Available	1	5
Dynamic Graph Message Passing Networks for Visual Recognition	Sep 20, 2022	image-classification, Image Classification	Code Available	1	5
Bridging the Domain Gap: Self-Supervised 3D Scene Understanding with Foundation Models	May 15, 2023	3D Object Detection, Image Captioning	Code Available	1	5
Exploiting Edge-Oriented Reasoning for 3D Point-based Scene Graph Analysis	Mar 9, 2021	3d scene graph generation, graph construction	Code Available	1	5
Dynamic Scene Understanding through Object-Centric Voxelization and Neural Rendering	Jul 30, 2024	Inverse Rendering, NeRF	Code Available	1	5
Multi-Scale Attention for Audio Question Answering	May 29, 2023	Audio Question Answering, Question Answering	Code Available	1	5
Multi-stage Factorized Spatio-Temporal Representation for RGB-D Action and Gesture Recognition	Aug 23, 2023	Gesture Recognition, Scene Understanding	Code Available	1	5
P2T: Pyramid Pooling Transformer for Scene Understanding	Jun 22, 2021	image-classification, Image Classification	Code Available	1	5
ARKitScenes: A Diverse Real-World Dataset For 3D Indoor Scene Understanding Using Mobile RGB-D Data	Nov 17, 2021	3D Object Detection, object-detection	Code Available	1	5
ECLAIR: A High-Fidelity Aerial LiDAR Dataset for Semantic Segmentation	Apr 16, 2024	3D Semantic Segmentation, Management	Code Available	1	5
3UR-LLM: An End-to-End Multimodal Large Language Model for 3D Scene Understanding	Jan 14, 2025	Language Modeling, Language Modelling	Code Available	1	5



Datasets: Semantic Scene Understanding Challenge (passive actuation & ground-truth localisation); ADE20K val; Semantic Scene Understanding Challenge (active actuation & ground-truth localisation)

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.44	—	Unverified
2	Team VGAI (TCS Research)	OMQ	0.37	—	Unverified
3	Demo_semantic_SLAM	OMQ	0.11	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	CPN(ResNet-101)	Mean IoU	46.3	—	Unverified
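
For readers unfamiliar with the Mean IoU metric in the table above, the following is a minimal sketch of how mean intersection-over-union is commonly computed for semantic segmentation. It assumes flat lists of per-pixel class labels and averages only over classes that occur in the prediction or the ground truth. The function name and sample labels are hypothetical, and individual benchmarks may differ in details such as ignored labels or how counts are accumulated across images.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes that appear in either pred or target.

    pred and target are equal-length sequences of integer class labels,
    one per pixel (an assumed, simplified input format).
    """
    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth are class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both sequences
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Made-up labels for a six-pixel "image" with three classes.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
print(mean_iou(pred, target, num_classes=3))
```

In this toy case the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is 0.5. Leaderboards often report the value multiplied by 100, which is presumably the scale of the 46.3 entry above.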

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.35	—	Unverified