Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.

Papers


Showing 601–650 of 1723 papers

Title	Date	Tasks	Status	Score
DA-RNN: Semantic Mapping with Data Associated Recurrent Neural Networks	Mar 9, 2017	Scene Understanding	Code Available	5
Improving Social Awareness Through DANTE: A Deep Affinity Network for Clustering Conversational Interactants	Jul 24, 2019	Clustering, Graph Clustering	Code Available	5
Auxiliary Tasks in Multi-task Learning	May 16, 2018	Depth Estimation, Multi-Task Learning	Code Available	5
On the iterative refinement of densely connected representation levels for semantic segmentation	Apr 30, 2018	Image Segmentation, Scene Understanding	Code Available	5
DADA: Driver Attention Prediction in Driving Accident Scenarios	Dec 18, 2019	Driver Attention Monitoring, Prediction	Code Available	5
One model to use them all: Training a segmentation model with complementary datasets	Feb 29, 2024	All, Anatomy	Code Available	5
Cross-Modality Time-Variant Relation Learning for Generating Dynamic Scene Graphs	May 15, 2023	Relation, Scene Graph Generation	Code Available	5
CrossModalityDiffusion: Multi-Modal Novel View Synthesis with Unified Intermediate Representation	Jan 16, 2025	Novel View Synthesis, Scene Understanding	Code Available	5
Hierarchical Superpixel Segmentation via Structural Information Theory	Jan 13, 2025	graph construction, graph partitioning	Code Available	5
Object Attribute Matters in Visual Question Answering	Dec 20, 2023	Attribute, Graph Neural Network	Code Available	5
Hierarchical Context Transformer for Multi-level Semantic Scene Understanding	Feb 21, 2025	Contrastive Learning, Representation Learning	Code Available	5
Auto-Embedding Generative Adversarial Networks for High Resolution Image Synthesis	Mar 27, 2019	Generative Adversarial Network, Image Generation	Code Available	5
Object-aware Sound Source Localization via Audio-Visual Scene Understanding	Jan 1, 2025	Scene Understanding, Sound Source Localization	Code Available	5
Omni-Recon: Harnessing Image-based Rendering for General-Purpose Neural Radiance Fields	Mar 17, 2024	3D Reconstruction, NeRF	Code Available	5
On the Structures of Representation for the Robustness of Semantic Segmentation to Input Corruption	Sep 2, 2020	Scene Understanding, Segmentation	Code Available	5
P2AT: Pyramid Pooling Axial Transformer for Real-time Semantic Segmentation	Oct 23, 2023	Autonomous Driving, Decoder	Code Available	5
Neighbor-Vote: Improving Monocular 3D Object Detection through Neighbor Distance Voting	Jul 6, 2021	3D Object Detection, Autonomous Driving	Code Available	5
Neural Radiance Field Codebooks	Jan 10, 2023	Object, Representation Learning	Code Available	5
Multi-task Planar Reconstruction with Feature Warping Guidance	Nov 25, 2023	3D Reconstruction, Instance Segmentation	Code Available	5
Multi-task Geometric Estimation of Depth and Surface Normal from Monocular 360° Images	Nov 4, 2024	Multi-Task Learning, Scene Understanding	Code Available	5
Neural RGB->D Sensing: Depth and Uncertainty from a Video Camera	Jan 9, 2019	3D Reconstruction, 3D Scene Reconstruction	Code Available	5
ShelfNet for Fast Semantic Segmentation	Nov 27, 2018	Autonomous Driving, Decoder	Code Available	5
Cooperative Holistic Scene Understanding: Unifying 3D Object, Layout, and Camera Pose Estimation	Oct 31, 2018	3D Object Detection, Camera Pose Estimation	Code Available	5
Multi-Resolution Multi-Modal Sensor Fusion For Remote Sensing Data With Label Uncertainty	May 2, 2018	Scene Understanding, Sensor Fusion	Code Available	5
Contrastive Instance Association for 4D Panoptic Segmentation using Sequences of 3D LiDAR Scans	Dec 1, 2021	4D Panoptic Segmentation, Autonomous Navigation	Code Available	5
Multimodal Scale Consistency and Awareness for Monocular Self-Supervised Depth Estimation	Mar 3, 2021	Autonomous Driving, Depth Estimation	Code Available	5
Grid-augmented vision: A simple yet effective approach for enhanced spatial understanding in multi-modal agents	Nov 27, 2024	Autonomous Navigation, Object Recognition	Code Available	5
Continual Learning of Unsupervised Monocular Depth from Videos	Nov 4, 2023	Autonomous Driving, Continual Learning	Code Available	5
MultiDepth: Single-Image Depth Estimation via Multi-Task Regression and Classification	Jul 25, 2019	Autonomous Vehicles, Classification	Code Available	5
MonoGRNet: A Geometric Reasoning Network for Monocular 3D Object Localization	Nov 26, 2018	2D Object Detection, 3D Object Detection	Code Available	5
Good at captioning, bad at counting: Benchmarking GPT-4V on Earth observation data	Jan 31, 2024	Benchmarking, Change Detection	Code Available	5
Constructing a Visual Relationship Authenticity Dataset	Oct 11, 2020	Relationship Detection, Scene Understanding	Code Available	5
Monocular 3D Object Detection with Pseudo-LiDAR Point Cloud	Mar 23, 2019	3D Object Detection, Depth Estimation	Code Available	5
MovSAM: A Single-image Moving Object Segmentation Framework Based on Deep Thinking	Apr 9, 2025	Autonomous Driving, Language Modeling	Code Available	5
NextStop: An Improved Tracker For Panoptic LIDAR Segmentation Data	Jan 8, 2025	Autonomous Driving, Instance Segmentation	Code Available	5
Confidence-Aware Paced-Curriculum Learning by Label Smoothing for Surgical Scene Understanding	Dec 22, 2022	Multi-Label Classification	Code Available	5
MLLM-SUL: Multimodal Large Language Model for Semantic Scene Understanding and Localization in Traffic Scenarios	Dec 27, 2024	Autonomous Driving, Language Modeling	Code Available	5
MLM: A Benchmark Dataset for Multitask Learning with Multiple Languages and Modalities	Aug 14, 2020	Representation Learning, Scene Understanding	Code Available	5
MetricGold: Leveraging Text-To-Image Latent Diffusion Models for Metric Depth Estimation	Nov 16, 2024	Depth Estimation, Monocular Depth Estimation	Code Available	5
MGNiceNet: Unified Monocular Geometric Scene Understanding	Nov 18, 2024	Autonomous Driving, Autonomous Vehicles	Code Available	5
Mitigating Object Dependencies: Improving Point Cloud Self-Supervised Learning through Object Exchange	Apr 11, 2024	Object, Scene Understanding	Code Available	5
Computational Imaging for Machine Perception: Transferring Semantic Segmentation beyond Aberrations	Nov 21, 2022	Domain Adaptation, Scene Understanding	Code Available	5
MC-PanDA: Mask Confidence for Panoptic Domain Adaptation	Jul 19, 2024	Domain Adaptation, Panoptic Segmentation	Code Available	5
General-Purpose Deep Point Cloud Feature Extractor	Mar 12, 2018	3D Object Classification, 3D Point Cloud Classification	Code Available	5
Attend, Infer, Repeat: Fast Scene Understanding with Generative Models	Mar 28, 2016	Scene Understanding	Code Available	5
Hierarchical Spatial Proximity Reasoning for Vision-and-Language Navigation	Mar 18, 2024	Common Sense Reasoning, Efficient Exploration	Code Available	5
Matterport3D: Learning from RGB-D Data in Indoor Environments	Sep 18, 2017	General Classification, Scene Understanding	Code Available	5
Generalizing Surgical Instruments Segmentation to Unseen Domains with One-to-Many Synthesis	Jun 28, 2023	Scene Understanding	Code Available	5
METEOR Guided Divergence for Video Captioning	Dec 20, 2022	Hierarchical Reinforcement Learning, Scene Understanding	Code Available	5
Model-based inexact graph matching on top of CNNs for semantic scene understanding	Jan 18, 2023	Brain Segmentation, Deep Learning	Code Available	5



Benchmark Results

Semantic Scene Understanding Challenge (passive actuation & ground-truth localisation)

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.44	—	Unverified
2	Team VGAI (TCS Research)	OMQ	0.37	—	Unverified
3	Demo_semantic_SLAM	OMQ	0.11	—	Unverified

ADE20K val

#	Model	Metric	Claimed	Verified	Status
1	CPN(ResNet-101)	Mean IoU	46.3	—	Unverified

Semantic Scene Understanding Challenge (active actuation & ground-truth localisation)

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.35	—	Unverified