Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.

Papers


Showing 626–650 of 1723 papers

Title	Date	Tasks	Status	Score
Multimodal Scale Consistency and Awareness for Monocular Self-Supervised Depth Estimation	Mar 3, 2021	Autonomous Driving, Depth Estimation	Code Available	5
Grid-augmented vision: A simple yet effective approach for enhanced spatial understanding in multi-modal agents	Nov 27, 2024	Autonomous Navigation, Object Recognition	Code Available	5
Continual Learning of Unsupervised Monocular Depth from Videos	Nov 4, 2023	Autonomous Driving, Continual Learning	Code Available	5
MultiDepth: Single-Image Depth Estimation via Multi-Task Regression and Classification	Jul 25, 2019	Autonomous Vehicles, Classification	Code Available	5
MonoGRNet: A Geometric Reasoning Network for Monocular 3D Object Localization	Nov 26, 2018	2D Object Detection, 3D Object Detection	Code Available	5
Good at captioning, bad at counting: Benchmarking GPT-4V on Earth observation data	Jan 31, 2024	Benchmarking, Change Detection	Code Available	5
Constructing a Visual Relationship Authenticity Dataset	Oct 11, 2020	Relationship Detection, Scene Understanding	Code Available	5
Monocular 3D Object Detection with Pseudo-LiDAR Point Cloud	Mar 23, 2019	3D Object Detection, Depth Estimation	Code Available	5
MovSAM: A Single-image Moving Object Segmentation Framework Based on Deep Thinking	Apr 9, 2025	Autonomous Driving, Language Modeling	Code Available	5
NextStop: An Improved Tracker For Panoptic LIDAR Segmentation Data	Jan 8, 2025	Autonomous Driving, Instance Segmentation	Code Available	5
Confidence-Aware Paced-Curriculum Learning by Label Smoothing for Surgical Scene Understanding	Dec 22, 2022	Multi-Label Classification	Code Available	5
MLLM-SUL: Multimodal Large Language Model for Semantic Scene Understanding and Localization in Traffic Scenarios	Dec 27, 2024	Autonomous Driving, Language Modeling	Code Available	5
MLM: A Benchmark Dataset for Multitask Learning with Multiple Languages and Modalities	Aug 14, 2020	Representation Learning, Scene Understanding	Code Available	5
MetricGold: Leveraging Text-To-Image Latent Diffusion Models for Metric Depth Estimation	Nov 16, 2024	Depth Estimation, Monocular Depth Estimation	Code Available	5
MGNiceNet: Unified Monocular Geometric Scene Understanding	Nov 18, 2024	Autonomous Driving, Autonomous Vehicles	Code Available	5
Mitigating Object Dependencies: Improving Point Cloud Self-Supervised Learning through Object Exchange	Apr 11, 2024	Object, Scene Understanding	Code Available	5
Computational Imaging for Machine Perception: Transferring Semantic Segmentation beyond Aberrations	Nov 21, 2022	Domain Adaptation, Scene Understanding	Code Available	5
MC-PanDA: Mask Confidence for Panoptic Domain Adaptation	Jul 19, 2024	Domain Adaptation, Panoptic Segmentation	Code Available	5
General-Purpose Deep Point Cloud Feature Extractor	Mar 12, 2018	3D Object Classification, 3D Point Cloud Classification	Code Available	5
Attend, Infer, Repeat: Fast Scene Understanding with Generative Models	Mar 28, 2016	Scene Understanding	Code Available	5
Hierarchical Spatial Proximity Reasoning for Vision-and-Language Navigation	Mar 18, 2024	Common Sense Reasoning, Efficient Exploration	Code Available	5
Matterport3D: Learning from RGB-D Data in Indoor Environments	Sep 18, 2017	General Classification, Scene Understanding	Code Available	5
Generalizing Surgical Instruments Segmentation to Unseen Domains with One-to-Many Synthesis	Jun 28, 2023	Scene Understanding	Code Available	5
METEOR Guided Divergence for Video Captioning	Dec 20, 2022	Hierarchical Reinforcement Learning, Scene Understanding	Code Available	5
Model-based inexact graph matching on top of CNNs for semantic scene understanding	Jan 18, 2023	Brain Segmentation, Deep Learning	Code Available	5


Benchmark Results

Semantic Scene Understanding Challenge (passive actuation & ground-truth localisation)

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.44	—	Unverified
2	Team VGAI (TCS Research)	OMQ	0.37	—	Unverified
3	Demo_semantic_SLAM	OMQ	0.11	—	Unverified

ADE20K val

#	Model	Metric	Claimed	Verified	Status
1	CPN(ResNet-101)	Mean IoU	46.3	—	Unverified

Semantic Scene Understanding Challenge (active actuation & ground-truth localisation)

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.35	—	Unverified