Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.
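As a minimal sketch of what "how objects relate to each other" can mean in practice, the Python snippet below derives coarse spatial relations from 2D bounding boxes. The `Detection` class, the rule set in `spatial_relation`, and the sample boxes are all hypothetical, chosen only for illustration. They are not taken from any paper listed on this page, and real systems typically learn such relations rather than hard-coding them.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Detection:
    """A detected object with an axis-aligned box in image coordinates (y grows downward)."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def spatial_relation(a: Detection, b: Detection) -> str:
    """Return a coarse relation of `a` with respect to `b` (hypothetical rule set)."""
    # Horizontal separation is checked first, then vertical separation.
    if a.x_max <= b.x_min:
        return "left of"
    if a.x_min >= b.x_max:
        return "right of"
    if a.y_max <= b.y_min:
        return "above"
    if a.y_min >= b.y_max:
        return "below"
    # The boxes intersect on both axes.
    return "overlaps"


def scene_relations(detections):
    """List (subject, relation, object) triples for every ordered pair of detections."""
    return [(a.label, spatial_relation(a, b), b.label)
            for a, b in permutations(detections, 2)]


# Made-up detections for illustration only.
detections = [
    Detection("lamp", 10, 5, 30, 40),
    Detection("table", 5, 45, 80, 90),
    Detection("chair", 90, 50, 120, 95),
]
triples = scene_relations(detections)
for subject, relation, obj in triples:
    print(f"{subject} is {relation} {obj}")
```

Object recognition alone would stop at the three labels. The triples add the relational layer that the description above refers to, in a form similar to the scene graphs that several of the listed papers generate.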

Papers

Showing 1601–1650 of 1723 papers

Title	Date	Tasks	Status
Matterport3D: Learning from RGB-D Data in Indoor Environments	Sep 18, 2017	General Classification, Scene Understanding	Code Available
From Node to Graph: Joint Reasoning on Visual-Semantic Relational Graph for Zero-Shot Detection	Feb 15, 2022	Generalized Zero-Shot Object Detection, Scene Understanding	Code Available
CrossModalityDiffusion: Multi-Modal Novel View Synthesis with Unified Intermediate Representation	Jan 16, 2025	Novel View Synthesis, Scene Understanding	Code Available
Structured Label Inference for Visual Understanding	Feb 18, 2018	Action Detection, General Classification	Code Available
AVS-Net: Point Sampling with Adaptive Voxel Size for 3D Scene Understanding	Feb 27, 2024	3D Object Detection, 3D Part Segmentation	Code Available
From Feature Importance to Natural Language Explanations Using LLMs with RAG	Jul 30, 2024	counterfactual, Counterfactual Reasoning	Code Available
m2caiSeg: Semantic Segmentation of Laparoscopic Images using Convolutional Neural Networks	Aug 23, 2020	Anatomy, Data Augmentation	Code Available
Cooperative Holistic Scene Understanding: Unifying 3D Object, Layout, and Camera Pose Estimation	Oct 31, 2018	3D Object Detection, Camera Pose Estimation	Code Available
Contrastive Instance Association for 4D Panoptic Segmentation using Sequences of 3D LiDAR Scans	Dec 1, 2021	4D Panoptic Segmentation, Autonomous Navigation	Code Available
FREDOM: Fairness Domain Adaptation Approach to Semantic Scene Understanding	Apr 4, 2023	Autonomous Driving, Domain Adaptation	Code Available
LoST? Appearance-Invariant Place Recognition for Opposite Viewpoints using Visual Semantics	Apr 16, 2018	Navigate, Scene Understanding	Code Available
Loss Switching Fusion with Similarity Search for Video Classification	Jun 27, 2019	Classification, Clustering	Code Available
Loss Distillation via Gradient Matching for Point Cloud Completion with Weighted Chamfer Distance	Sep 10, 2024	Bilevel Optimization, Point Cloud Completion	Code Available
AVQACL: A Novel Benchmark for Audio-Visual Question Answering Continual Learning	Jan 1, 2025	Audio-visual Question Answering, Continual Learning	Code Available
Continual Learning of Unsupervised Monocular Depth from Videos	Nov 4, 2023	Autonomous Driving, Continual Learning	Code Available
FlowGrad: Using Motion for Visual Sound Source Localization	Nov 15, 2022	Optical Flow Estimation, Scene Understanding	Code Available
An Information-Theoretic Metric of Transferability for Task Transfer Learning	May 1, 2019	General Classification, Scene Understanding	Code Available
SceneAware: Scene-Constrained Pedestrian Trajectory Prediction with LLM-Guided Walkability	Jun 17, 2025	Pedestrian Trajectory Prediction, Scene Understanding	Code Available
LoCATe-GAT: Modeling Multi-Scale Local Context and Action Relationships for Zero-Shot Action Recognition	Nov 27, 2024	Action Recognition, Graph Attention	Code Available
Lightweight integration of 3D features to improve 2D image segmentation	Dec 16, 2022	Image Segmentation, Scene Understanding	Code Available
Surgical Scene Segmentation by Transformer With Asymmetric Feature Enhancement	Oct 23, 2024	Anatomy, Scene Segmentation	Code Available
Constructing a Visual Relationship Authenticity Dataset	Oct 11, 2020	Relationship Detection, Scene Understanding	Code Available
Confidence-Aware Paced-Curriculum Learning by Label Smoothing for Surgical Scene Understanding	Dec 22, 2022	Multi-Label Classification, MUlTI-LABEL-ClASSIFICATION	Code Available
Computational Imaging for Machine Perception: Transferring Semantic Segmentation beyond Aberrations	Nov 21, 2022	Domain Adaptation, Scene Understanding	Code Available
Leveraging Automatic CAD Annotations for Supervised Learning in 3D Scene Understanding	Apr 18, 2025	Deep Learning, Point Cloud Completion	Code Available
Flow-based GAN for 3D Point Cloud Generation from a Single Image	Oct 8, 2022	Point Cloud Generation, Scene Understanding	Code Available
Scene Graph Generation from Objects, Phrases and Region Captions	Jul 31, 2017	Graph Generation, object-detection	Code Available
Fine-Grained is Too Coarse: A Novel Data-Centric Approach for Efficient Scene Graph Generation	May 30, 2023	Graph Generation, Image Generation	Code Available
Auxiliary Tasks in Multi-task Learning	May 16, 2018	Depth Estimation, Multi-Task Learning	Code Available
Auto-Embedding Generative Adversarial Networks for High Resolution Image Synthesis	Mar 27, 2019	Generative Adversarial Network, Image Generation	Code Available
Implicit Background Estimation for Semantic Segmentation	May 23, 2019	Scene Understanding, Segmentation	Code Available
SceneNet RGB-D: 5M Photorealistic Images of Synthetic Indoor Trajectories with Ground Truth	Dec 15, 2016	3D Reconstruction, Camera Pose Estimation	Code Available
Placental Vessel Segmentation and Registration in Fetoscopy: Literature Review and MICCAI FetReg2021 Challenge Findings	Jun 24, 2022	Scene Understanding, Semantic Segmentation	Code Available
SceneNet: Understanding Real World Indoor Scenes With Synthetic Data	Nov 22, 2015	Scene Understanding	Code Available
Fast Scene Understanding for Autonomous Driving	Aug 8, 2017	Autonomous Driving, Decoder	Code Available
Swiss DINO: Efficient and Versatile Vision Framework for On-device Personal Object Search	Jul 10, 2024	Few-Shot Learning, GPU	Code Available
Cognitive Visual Commonsense Reasoning Using Dynamic Working Memory	Jul 4, 2021	Question Answering, Scene Understanding	Code Available
Leveraging Acoustic Images for Effective Self-Supervised Audio Representation Learning	Aug 1, 2020	Cross-Modal Retrieval, Representation Learning	Code Available
Attend, Infer, Repeat: Fast Scene Understanding with Generative Models	Mar 28, 2016	Scene Understanding	Code Available
A New Lightweight Hybrid Graph Convolutional Neural Network -- CNN Scheme for Scene Classification using Object Detection Inference	Jul 19, 2024	Autonomous Vehicles, object-detection	Code Available
False Negative Reduction in Video Instance Segmentation using Uncertainty Estimates	Jun 28, 2021	Depth Estimation, Instance Segmentation	Code Available
3D Semantic Segmentation of Modular Furniture using rjMCMC	May 15, 2017	3D Semantic Segmentation, furniture segmentation	Code Available
Uncertainty-aware LiDAR Panoptic Segmentation	Oct 10, 2022	Autonomous Driving, Panoptic Segmentation	Code Available
Facing the Void: Overcoming Missing Data in Multi-View Imagery	May 21, 2022	Classification, image-classification	Code Available
COCO-Text: Dataset and Benchmark for Text Detection and Recognition in Natural Images	Jan 26, 2016	Diversity, General Classification	Code Available
CNN-based Lidar Point Cloud De-Noising in Adverse Weather	Dec 9, 2019	Autonomous Vehicles, Scene Understanding	Code Available
AdaptVision: Dynamic Input Scaling in MLLMs for Versatile Scene Understanding	Aug 30, 2024	Language Modelling, Large Language Model	Code Available
An efficient solution for semantic segmentation: ShuffleNet V2 with atrous separable convolutions	Feb 20, 2019	Autonomous Driving, Scene Understanding	Code Available
SCIM: Simultaneous Clustering, Inference, and Mapping for Open-World Semantic Scene Understanding	Jun 21, 2022	Clustering, Object Discovery	Code Available
Extremely Fine-Grained Visual Classification over Resembling Glyphs in the Wild	Aug 25, 2024	Contrastive Learning, Fine-Grained Image Classification	Code Available


Benchmark Results

Semantic Scene Understanding Challenge (passive actuation & ground-truth localisation)

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.44	-	Unverified
2	Team VGAI (TCS Research)	OMQ	0.37	-	Unverified
3	Demo_semantic_SLAM	OMQ	0.11	-	Unverified

ADE20K val

#	Model	Metric	Claimed	Verified	Status
1	CPN(ResNet-101)	Mean IoU	46.3	-	Unverified

Semantic Scene Understanding Challenge (active actuation & ground-truth localisation)

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.35	-	Unverified