Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.

Papers

Showing 701–750 of 1723 papers

Title	Date	Tasks	Status	Score
ASI-Seg: Audio-Driven Surgical Instrument Segmentation with Surgeon Intention Understanding	Jul 28, 2024	Contrastive Learning, Intention-oriented Segmentation	Code Available	5
LoST? Appearance-Invariant Place Recognition for Opposite Viewpoints using Visual Semantics	Apr 16, 2018	Navigate, Scene Understanding	Code Available	5
Lightweight integration of 3D features to improve 2D image segmentation	Dec 16, 2022	Image Segmentation, Scene Understanding	Code Available	5
Placental Vessel Segmentation and Registration in Fetoscopy: Literature Review and MICCAI FetReg2021 Challenge Findings	Jun 24, 2022	Scene Understanding, Semantic Segmentation	Code Available	5
Leveraging Acoustic Images for Effective Self-Supervised Audio Representation Learning	Aug 1, 2020	Cross-Modal Retrieval, Representation Learning	Code Available	5
Leveraging Automatic CAD Annotations for Supervised Learning in 3D Scene Understanding	Apr 18, 2025	Deep Learning, Point Cloud Completion	Code Available	5
Fast Scene Understanding for Autonomous Driving	Aug 8, 2017	Autonomous Driving, Decoder	Code Available	5
Artificial Color Constancy via GoogLeNet with Angular Loss Function	Nov 20, 2018	Color Constancy, Object Recognition	Code Available	5
Learning Rigidity in Dynamic Scenes with a Moving Camera for 3D Motion Field Estimation	Apr 12, 2018	Optical Flow Estimation, Scene Flow Estimation	Code Available	5
Learning Panoptic Segmentation from Instance Contours	Oct 16, 2020	Clustering, Instance Segmentation	Code Available	5
CLAIR-A: Leveraging Large Language Models to Judge Audio Captions	Sep 19, 2024	Audio captioning, Language Modeling	Code Available	5
False Negative Reduction in Video Instance Segmentation using Uncertainty Estimates	Jun 28, 2021	Depth Estimation, Instance Segmentation	Code Available	5
Implicit Background Estimation for Semantic Segmentation	May 23, 2019	Scene Understanding, Segmentation	Code Available	5
Learning Regional Purity for Instance Segmentation on 3D Point Clouds	Nov 3, 2020	3D Instance Segmentation, 3D Semantic Segmentation	Code Available	5
Learning Monocular Depth by Distilling Cross-domain Stereo Networks	Aug 20, 2018	Autonomous Driving, Depth Estimation	Code Available	5
Facing the Void: Overcoming Missing Data in Multi-View Imagery	May 21, 2022	Classification, image-classification	Code Available	5
Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors	May 30, 2025	3D geometry, Large Language Model	Code Available	5
Extremely Fine-Grained Visual Classification over Resembling Glyphs in the Wild	Aug 25, 2024	Contrastive Learning, Fine-Grained Image Classification	Code Available	5
Adversarial Attacks on Monocular Pose Estimation	Jul 14, 2022	Depth Estimation, Monocular Depth Estimation	Code Available	5
Language-based Colorization of Scene Sketches	Nov 17, 2019	Colorization, Image Generation	Code Available	5
Exploring Scene Affinity for Semi-Supervised LiDAR Semantic Segmentation	Aug 21, 2024	3D Semantic Segmentation, Data Augmentation	Code Available	5
Knowledge-Guided Object Discovery with Acquired Deep Impressions	Mar 19, 2021	Object, Object Discovery	Code Available	5
Label-Attention Transformer with Geometrically Coherent Objects for Image Captioning	Sep 16, 2021	Decoder, Image Captioning	Code Available	5
LoCATe-GAT: Modeling Multi-Scale Local Context and Action Relationships for Zero-Shot Action Recognition	Nov 27, 2024	Action Recognition, Graph Attention	Code Available	5
Monocular 3D Object Detection with Pseudo-LiDAR Point Cloud	Mar 23, 2019	3D Object Detection, Depth Estimation	Code Available	5
Deep Reinforcement Learning on a Budget: 3D Control and Reasoning Without a Supercomputer	Apr 3, 2019	Deep Reinforcement Learning, Reinforcement Learning	Code Available	5
Benchmarking Feature Upsampling Methods for Vision Foundation Models using Interactive Segmentation	May 4, 2025	Benchmarking, Feature Upsampling	Code Available	5
Interpretable Visual Understanding with Cognitive Attention Network	Aug 6, 2021	Scene Understanding, Visual Commonsense Reasoning	Code Available	5
P2AT: Pyramid Pooling Axial Transformer for Real-time Semantic Segmentation	Oct 23, 2023	Autonomous Driving, Decoder	Code Available	5
Single Image 3D Object Estimation with Primitive Graph Networks	Sep 9, 2021	Graph Neural Network, Object	Code Available	5
Exploiting Temporal Coherence for Multi-modal Video Categorization	Feb 7, 2020	object-detection, Object Detection	Unverified	0
Exploiting High Level Scene Cues in Stereo Reconstruction	Dec 1, 2015	3D Reconstruction, Scene Understanding	Unverified	0
Explicit3D: Graph Network with Spatial Inference for Single Image 3D Object Detection	Feb 13, 2023	3D Object Detection, Graph Generation	Unverified	0
Challenges for Monocular 6D Object Pose Estimation in Robotics	Jul 22, 2023	6D Pose Estimation using RGB, Object	Unverified	0
ArK: Augmented Reality with Knowledge Interactive Emergent Ability	May 1, 2023	AI Agent, Mixed Reality	Unverified	0
Explainable Scene Understanding with Qualitative Representations and Graph Neural Networks	Apr 17, 2025	Autonomous Driving, Scene Understanding	Unverified	0
Expanding Frozen Vision-Language Models without Retraining: Towards Improved Robot Perception	Aug 31, 2023	Activity Recognition, Human Activity Recognition	Unverified	0
Exosense: A Vision-Based Scene Understanding System For Exoskeletons	Mar 21, 2024	Language Modelling, Motion Planning	Unverified	0
Argus: Leveraging Multiview Images for Improved 3-D Scene Understanding With Large Language Models	Jul 17, 2025	3D Point Cloud Reconstruction, Point cloud reconstruction	Unverified	0
Adversarial Attacks on Monocular Depth Estimation	Mar 23, 2020	Autonomous Driving, Depth Estimation	Unverified	0
ExCap3D: Expressive 3D Scene Understanding via Object Captioning with Varying Detail	Mar 21, 2025	Object, Scene Understanding	Unverified	0
EvSegSNN: Neuromorphic Semantic Segmentation for Event Data	Jun 20, 2024	Autonomous Vehicles, Decoder	Unverified	0
EvidMTL: Evidential Multi-Task Learning for Uncertainty-Aware Semantic Surface Mapping from Monocular RGB Images	Mar 6, 2025	Depth Estimation, Depth Prediction	Unverified	0
Every SAM Drop Counts: Embracing Semantic Priors for Multi-Modality Image Fusion and Beyond	Mar 3, 2025	Infrared And Visible Image Fusion, Scene Understanding	Unverified	0
Event fields: Capturing light fields at high speed, resolution, and dynamic range	Dec 9, 2024	Depth Estimation, Scene Understanding	Unverified	0
Category-Level and Open-Set Object Pose Estimation for Robotics	Apr 28, 2025	6D Pose Estimation, 6D Pose Estimation using RGB	Unverified	0
Evaluation of Multimodal Semantic Segmentation using RGB-D Data	Mar 31, 2021	Scene Understanding, Semantic Segmentation	Unverified	0
Catch Me if You Can: A Novel Task for Detection of Covert Geo-Locations (CGL)	Feb 5, 2022	object-detection, Object Detection	Unverified	0
A Review on Visual-SLAM: Advancements from Geometric Modelling to Learning-based Semantic Scene Understanding	Sep 12, 2022	Scene Understanding	Unverified	0
Advancing the Understanding of Fine-Grained 3D Forest Structures using Digital Cousins and Simulation-to-Reality: Methods and Datasets	Jan 7, 2025	Data Augmentation, parameter estimation	Unverified	0

Benchmark Results

Semantic Scene Understanding Challenge (passive actuation & ground-truth localisation)

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.44	—	Unverified
2	Team VGAI (TCS Research)	OMQ	0.37	—	Unverified
3	Demo_semantic_SLAM	OMQ	0.11	—	Unverified

ADE20K val

#	Model	Metric	Claimed	Verified	Status
1	CPN(ResNet-101)	Mean IoU	46.3	—	Unverified

Semantic Scene Understanding Challenge (active actuation & ground-truth localisation)

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.35	—	Unverified