Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.

Papers

Showing 401–450 of 1723 papers

Title	Date	Tasks	Status	Hype
Relation-aware Instance Refinement for Weakly Supervised Visual Grounding	Mar 24, 2021	Object, Relation	Code Available	1
Comprehensive Visual Question Answering on Point Clouds through Compositional Scene Manipulation	Dec 22, 2021	Common Sense Reasoning, Question Answering	Code Available	1
GOV-NeSF: Generalizable Open-Vocabulary Neural Semantic Fields	Apr 1, 2024	Open Vocabulary Semantic Segmentation, Open-Vocabulary Semantic Segmentation	Code Available	1
ReorientBot: Learning Object Reorientation for Specific-Posed Placement	Feb 22, 2022	Motion Generation, Motion Planning	Code Available	1
RescueNet: A High Resolution UAV Semantic Segmentation Benchmark Dataset for Natural Disaster Damage Assessment	Feb 24, 2022	Scene Understanding, Segmentation	Code Available	1
RfD-Net: Point Scene Understanding by Semantic Instance Reconstruction	Nov 30, 2020	3D geometry, Object	Code Available	1
Grounded Situation Recognition with Transformers	Nov 19, 2021	Decoder, Grounded Situation Recognition	Code Available	1
Class-Incremental Domain Adaptation with Smoothing and Calibration for Surgical Report Generation	Jul 23, 2021	Domain Adaptation, Few-Shot Learning	Code Available	1
ROOT: VLM based System for Indoor Scene Understanding and Beyond	Nov 24, 2024	Scene Generation, Scene Understanding	Code Available	1
Distilled Semantics for Comprehensive Scene Understanding from Videos	Mar 31, 2020	Depth Estimation, Knowledge Distillation	Code Available	1
Generating Visual Spatial Description via Holistic 3D Scene Understanding	May 19, 2023	Scene Understanding, Text Generation	Code Available	1
RSTeller: Scaling Up Visual Language Modeling in Remote Sensing with Rich Linguistic Semantics from Openly Available Data and Large Language Models	Aug 27, 2024	Descriptive, Language Modeling	Code Available	1
DI-V2X: Learning Domain-Invariant Representation for Vehicle-Infrastructure Collaborative 3D Object Detection	Dec 25, 2023	3D Object Detection, object-detection	Code Available	1
SaccadeNet: A Fast and Accurate Object Detector	Mar 26, 2020	Object, object-detection	Code Available	1
General Geometry-aware Weakly Supervised 3D Object Detection	Jul 18, 2024	3D Object Detection, Object	Code Available	1
Boosting Omnidirectional Stereo Matching with a Pre-trained Depth Foundation Model	Mar 30, 2025	Depth Estimation, Monocular Depth Estimation	Code Available	1
Bootstraping Clustering of Gaussians for View-consistent 3D Scene Understanding	Nov 29, 2024	3D geometry, 3DGS	Code Available	1
Explainable Object-induced Action Decision for Autonomous Vehicles	Mar 20, 2020	Autonomous Driving, Autonomous Vehicles	Code Available	1
GFF: Gated Fully Fusion for Semantic Segmentation	Apr 3, 2019	Scene Parsing, Scene Understanding	Code Available	1
SceneGraphFusion: Incremental 3D Scene Graph Prediction from RGB-D Sequences	Mar 27, 2021	3D Object Classification, 3d scene graph generation	Code Available	1
SeasonDepth: Cross-Season Monocular Depth Prediction Dataset and Benchmark under Multiple Environments	Nov 9, 2020	Autonomous Driving, Depth Estimation	Code Available	1
SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation	Nov 2, 2015	Crowd Counting, Decoder	Code Available	1
F-ViTA: Foundation Model Guided Visible to Thermal Translation	Apr 3, 2025	Scene Understanding, Style Transfer	Code Available	1
DPF: Learning Dense Prediction Fields with Weak Supervision	Mar 29, 2023	Intrinsic Image Decomposition, Prediction	Code Available	1
Boundary-induced and scene-aggregated network for monocular depth prediction	Feb 26, 2021	Depth Estimation, Depth Prediction	Code Available	1
Semantic Abstraction: Open-World 3D Scene Understanding from 2D Vision-Language Models	Jul 23, 2022	Scene Understanding	Code Available	1
From Multi-View to Hollow-3D: Hallucinated Hollow-3D R-CNN for 3D Object Detection	Jul 30, 2021	3D Object Detection, object-detection	Code Available	1
Semantic Segmentation-Assisted Instance Feature Fusion for Multi-Level 3D Part Instance Segmentation	Aug 9, 2022	3D Instance Segmentation, 3D Part Segmentation	Code Available	1
From General to Specific: Informative Scene Graph Generation via Balance Adjustment	Aug 30, 2021	Blocking, Graph Generation	Code Available	1
SemSegDepth: A Combined Model for Semantic Segmentation and Depth Completion	Sep 1, 2022	Depth Completion, Scene Understanding	Code Available	1
Global Aggregation then Local Distribution in Fully Convolutional Networks	Sep 16, 2019	Instance Segmentation, object-detection	Code Available	1
Extending Large Vision-Language Model for Diverse Interactive Tasks in Autonomous Driving	May 13, 2025	3D visual grounding, Autonomous Driving	Code Available	1
FloodNet: A High Resolution Aerial Imagery Dataset for Post Flood Scene Understanding	Dec 5, 2020	image-classification, Image Classification	Code Available	1
DTCLMapper: Dual Temporal Consistent Learning for Vectorized HD Map Construction	May 9, 2024	Contrastive Learning, Scene Understanding	Code Available	1
Cityscapes-Panoptic-Parts and PASCAL-Panoptic-Parts datasets for Scene Understanding	Apr 16, 2020	Human Part Segmentation, Panoptic Segmentation	Code Available	1
Dual-Hybrid Attention Network for Specular Highlight Removal	Jul 17, 2024	highlight removal, Object Recognition	Code Available	1
FocusFlow: Boosting Key-Points Optical Flow Estimation for Autonomous Driving	Aug 14, 2023	Autonomous Driving, Optical Flow Estimation	Code Available	1
Spatio-temporal Self-Supervised Representation Learning for 3D Point Clouds	Sep 1, 2021	3D Object Detection, 3D Point Cloud Classification	Code Available	1
Expressive Scene Graph Generation Using Commonsense Knowledge Infusion for Visual Understanding and Reasoning	May 31, 2022	Common Sense Reasoning, Graph Generation	Code Available	1
Dynamic Graph Message Passing Networks	Aug 19, 2019	Image Classification, object-detection	Code Available	1
Dynamic Graph Message Passing Networks for Visual Recognition	Sep 20, 2022	image-classification, Image Classification	Code Available	1
Bridging the Domain Gap: Self-Supervised 3D Scene Understanding with Foundation Models	May 15, 2023	3D Object Detection, Image Captioning	Code Available	1
A2-FPN for Semantic Segmentation of Fine-Resolution Remotely Sensed Images	Feb 16, 2021	Decision Making, Scene Understanding	Code Available	1
Few-Shot Object Detection and Viewpoint Estimation for Objects in the Wild	Jul 23, 2020	Few-Shot Object Detection, Meta-Learning	Code Available	1
Stealing Stable Diffusion Prior for Robust Monocular Depth Estimation	Mar 8, 2024	Depth Estimation, Monocular Depth Estimation	Code Available	1
FPS-Net: A Convolutional Fusion Network for Large-Scale LiDAR Point Cloud Segmentation	Mar 1, 2021	3D Semantic Segmentation, Decoder	Code Available	1
ARKitScenes: A Diverse Real-World Dataset For 3D Indoor Scene Understanding Using Mobile RGB-D Data	Nov 17, 2021	3D Object Detection, object-detection	Code Available	1
3UR-LLM: An End-to-End Multimodal Large Language Model for 3D Scene Understanding	Jan 14, 2025	Language Modeling, Language Modelling	Code Available	1
Channel-Wise Attention-Based Network for Self-Supervised Monocular Depth Estimation	Dec 24, 2021	Depth Estimation, Depth Prediction	Code Available	1
FreDSNet: Joint Monocular Depth and Semantic Segmentation with Fast Fourier Convolutions	Oct 4, 2022	Depth Estimation, Monocular Depth Estimation	Code Available	1

Benchmark Results

Semantic Scene Understanding Challenge (passive actuation & ground-truth localisation)

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.44	—	Unverified
2	Team VGAI (TCS Research)	OMQ	0.37	—	Unverified
3	Demo_semantic_SLAM	OMQ	0.11	—	Unverified

ADE20K val

#	Model	Metric	Claimed	Verified	Status
1	CPN(ResNet-101)	Mean IoU	46.3	—	Unverified

Semantic Scene Understanding Challenge (active actuation & ground-truth localisation)

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.35	—	Unverified