Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.

Papers

Showing 1001–1050 of 1723 papers

Title	Date	Tasks	Status
From Monocular Vision to Autonomous Action: Guiding Tumor Resection via 3D Reconstruction	Mar 20, 2025	3D Reconstruction, Anatomy	Unverified
From Real to Synthetic and Back: Synthesizing Training Data for Multi-Person Scene Understanding	Jun 3, 2020	Depth Estimation, Generative Adversarial Network	Unverified
Fully Convolutional Networks for Dense Semantic Labelling of High-Resolution Aerial Imagery	Jun 8, 2016	Scene Understanding, Vocal Bursts Intensity Prediction	Unverified
Fusion Based Holistic Road Scene Understanding	Jun 29, 2014	Clustering, Image Segmentation	Unverified
FusionSAM: Latent Space driven Segment Anything Model for Multimodal Fusion and Segmentation	Aug 26, 2024	Autonomous Driving, Image Segmentation	Unverified
Future Does Matter: Boosting 3D Object Detection with Temporal Motion Estimation in Point Cloud Sequences	Sep 6, 2024	3D Object Detection, Autonomous Driving	Unverified
Gaga: Group Any Gaussians via 3D-aware Memory Bank	Apr 11, 2024	Contrastive Learning, Object Tracking	Unverified
GAGS: Granularity-Aware Feature Distillation for Language Gaussian Splatting	Dec 18, 2024	Scene Understanding, Semantic Segmentation	Unverified
Galileo: Perceiving Physical Object Properties by Integrating a Physics Engine with Deep Learning	Dec 1, 2015	Friction, Scene Understanding	Unverified
GameVLM: A Decision-making Framework for Robotic Task Planning Based on Visual Language Models and Zero-sum Games	May 22, 2024	Code Generation, Decision Making	Unverified
GANspection	Oct 21, 2019	Scene Understanding	Unverified
GaussianBeV: 3D Gaussian Representation meets Perception Models for BeV Segmentation	Jul 19, 2024	BEV Segmentation, Scene Understanding	Unverified
GaussianPU: A Hybrid 2D-3D Upsampling Framework for Enhancing Color Point Clouds via 3D Gaussian Splatting	Sep 3, 2024	3DGS, GPU	Unverified
Gaussian Radar Transformer for Semantic Segmentation in Noisy Radar Data	Dec 7, 2022	Scene Understanding, Segmentation	Unverified
General-Purpose Aerial Intelligent Agents Empowered by Large Language Models	Mar 11, 2025	Motion Planning, Scene Understanding	Unverified
Generating Robot Constitutions & Benchmarks for Semantic Safety	Mar 11, 2025	Collision Avoidance, Image Generation	Unverified
Generative Camera Dolly: Extreme Monocular Dynamic Novel View Synthesis	May 23, 2024	Novel View Synthesis, Scene Understanding	Unverified
Generative Video Transformer: Can Objects be the Words?	Jul 20, 2021	GPU, Scene Understanding	Unverified
Geometric Constrained Non-Line-of-Sight Imaging	Mar 23, 2025	Scene Understanding, Surface Reconstruction	Unverified
Geometric Constraints in Deep Learning Frameworks: A Survey	Mar 19, 2024	Deep Learning, Depth Estimation	Unverified
GeomGS: LiDAR-Guided Geometry-Aware Gaussian Splatting for Robot Localization	Jan 23, 2025	3DGS, Autonomous Driving	Unverified
Glass Segmentation Using Intensity and Spectral Polarization Cues	Jan 1, 2022	Camouflaged Object Segmentation, Scene Understanding	Unverified
Global Context Aware Convolutions for 3D Point Cloud Understanding	Aug 7, 2020	Point Cloud Classification, Retrieval	Unverified
GOI: Find 3D Gaussians of Interest with an Optimizable Open-vocabulary Semantic-space Hyperplane	May 27, 2024	3DGS, feature selection	Unverified
Going Beyond Multi-Task Dense Prediction with Synergy Embedding Models	Jan 1, 2024	Scene Understanding	Unverified
GP-NeRF: Generalized Perception NeRF for Context-Aware 3D Scene Understanding	Nov 20, 2023	Instance Segmentation, NeRF	Unverified
GPT-4V Explorations: Mining Autonomous Driving	Jun 24, 2024	Autonomous Driving, Decision Making	Unverified
GPT-4V Takes the Wheel: Promises and Challenges for Pedestrian Behavior Prediction	Nov 24, 2023	Autonomous Driving, Autonomous Vehicles	Unverified
GP-VLS: A general-purpose vision language model for surgery	Jul 27, 2024	Language Modeling, Language Modelling	Unverified
Graph-Based Multi-Modal Sensor Fusion for Autonomous Driving	Nov 6, 2024	Autonomous Driving, Multi-Object Tracking	Unverified
Graph-DETR3D: Rethinking Overlapping Regions for Multi-View 3D Object Detection	Apr 25, 2022	3D Object Detection, Graph structure learning	Unverified
Graph-Grounded LLMs: Leveraging Graphical Function Calling to Minimize LLM Hallucinations	Mar 13, 2025	Autonomous Vehicles, Knowledge Graphs	Unverified
Grounded Objects and Interactions for Video Captioning	Nov 16, 2017	Object, Scene Understanding	Unverified
GroundingFace: Fine-grained Face Understanding via Pixel Grounding Multimodal Large Language Model	Jan 1, 2025	Attribute, Language Modeling	Unverified
GSNeRF: Generalizable Semantic Neural Radiance Fields with Enhanced 3D Scene Understanding	Mar 6, 2024	NeRF, Scene Understanding	Unverified
HAECcity: Open-Vocabulary Scene Understanding of City-Scale Point Clouds with Superpoint Graph Clustering	Apr 18, 2025	Clustering, Graph Clustering	Unverified
Hallucinated Humans as the Hidden Context for Labeling 3D Scenes	Jun 1, 2013	Attribute, Object	Unverified
HAMF: A Hybrid Attention-Mamba Framework for Joint Scene Context Understanding and Future Motion Representation Learning	May 21, 2025	Autonomous Driving, Mamba	Unverified
Handling Geometric Domain Shifts in Semantic Segmentation of Surgical RGB and Hyperspectral Images	Aug 27, 2024	Organ Segmentation, Scene Segmentation	Unverified
HAtt-Flow: Hierarchical Attention-Flow Mechanism for Group Activity Scene Graph Generation in Videos	Nov 28, 2023	Graph Generation, Scene Graph Generation	Unverified
Transavs: End-To-End Audio-Visual Segmentation With Transformer	May 12, 2023	Scene Understanding, Segmentation	Unverified
HeLiMOS: A Dataset for Moving Object Segmentation in 3D Point Clouds From Heterogeneous LiDAR Sensors	Aug 12, 2024	Scene Understanding, Semantic Segmentation	Unverified
Heterogeneous Visual Features Fusion via Sparse Multimodal Machine	Jun 1, 2013	Feature Importance, image-classification	Unverified
HGS-Mapping: Online Dense Mapping Using Hybrid Gaussian Representation in Urban Scenes	Mar 29, 2024	3DGS, Autonomous Vehicles	Unverified
Hierarchical Scene Parsing by Weakly Supervised Learning with Image Descriptions	Sep 27, 2017	Descriptive, Object	Unverified
Hierarchy Denoising Recursive Autoencoders for 3D Scene Layout Prediction	Mar 9, 2019	Denoising, Object	Unverified
High-Accuracy Facial Depth Models derived from 3D Synthetic Data	Mar 26, 2020	3D Reconstruction, Depth Estimation	Unverified
Highway Driving Dataset for Semantic Video Segmentation	Nov 2, 2020	Autonomous Driving, Image Segmentation	Unverified
HIS-GPT: Towards 3D Human-In-Scene Multimodal Understanding	Mar 17, 2025	Question Answering, Scene Understanding	Unverified
HOIverse: A Synthetic Scene Graph Dataset With Human Object Interactions	Jun 24, 2025	Graph Generation, Human-Object Interaction Detection	Unverified

Benchmark Results

Semantic Scene Understanding Challenge (passive actuation & ground-truth localisation)

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.44	—	Unverified
2	Team VGAI (TCS Research)	OMQ	0.37	—	Unverified
3	Demo_semantic_SLAM	OMQ	0.11	—	Unverified

ADE20K val

#	Model	Metric	Claimed	Verified	Status
1	CPN(ResNet-101)	Mean IoU	46.3	—	Unverified

Semantic Scene Understanding Challenge (active actuation & ground-truth localisation)

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.35	—	Unverified