Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.

Papers

Showing 1576–1600 of 1723 papers

Title	Date	Tasks	Status
Good at captioning, bad at counting: Benchmarking GPT-4V on Earth observation data	Jan 31, 2024	Benchmarking, Change Detection	Code Available
Modeling Expectation Violation in Intuitive Physics with Coarse Probabilistic Object Representations	Dec 1, 2019	Scene Understanding	Code Available
General-Purpose Deep Point Cloud Feature Extractor	Mar 12, 2018	3D Object Classification, 3D Point Cloud Classification	Code Available
Generalizing Surgical Instruments Segmentation to Unseen Domains with One-to-Many Synthesis	Jun 28, 2023	Scene Understanding	Code Available
APCoTTA: Continual Test-Time Adaptation for Semantic Segmentation of Airborne LiDAR Point Clouds	May 15, 2025	Point Cloud Segmentation, Scene Understanding	Code Available
Gated Driver Attention Predictor	Aug 1, 2023	Driver Attention Monitoring, Prediction	Code Available
A Critical Assessment of Visual Sound Source Localization Models Including Negative Audio	Oct 1, 2024	Scene Understanding, Sound Source Localization	Code Available
Model-based inexact graph matching on top of CNNs for semantic scene understanding	Jan 18, 2023	Brain Segmentation, Deep Learning	Code Available
Gated2Depth: Real-time Dense Lidar from Gated Images	Feb 13, 2019	Scene Understanding	Code Available
GaIA: Graphical Information Gain based Attention Network for Weakly Supervised Point Cloud Semantic Segmentation	Oct 2, 2022	Scene Understanding, Segmentation	Code Available
MLM: A Benchmark Dataset for Multitask Learning with Multiple Languages and Modalities	Aug 14, 2020	Representation Learning, Scene Understanding	Code Available
FunnyNet-W: Multimodal Learning of Funny Moments in Videos in the Wild	Jan 8, 2024	Language Modelling, Large Language Model	Code Available
Rotation Invariant Convolutions for 3D Point Clouds Deep Learning	Aug 17, 2019	Deep Learning, Scene Understanding	Code Available
MLLM-SUL: Multimodal Large Language Model for Semantic Scene Understanding and Localization in Traffic Scenarios	Dec 27, 2024	Autonomous Driving, Language Modeling	Code Available
Mitigating Object Dependencies: Improving Point Cloud Self-Supervised Learning through Object Exchange	Apr 11, 2024	Object, Scene Understanding	Code Available
DA-RNN: Semantic Mapping with Data Associated Recurrent Neural Networks	Mar 9, 2017	Scene Understanding	Code Available
MGNiceNet: Unified Monocular Geometric Scene Understanding	Nov 18, 2024	Autonomous Driving, Autonomous Vehicles	Code Available
MetricGold: Leveraging Text-To-Image Latent Diffusion Models for Metric Depth Estimation	Nov 16, 2024	Depth Estimation, Monocular Depth Estimation	Code Available
Collaborative Propagation on Multiple Instance Graphs for 3D Instance Segmentation with Single-point Supervision	Aug 10, 2022	3D Instance Segmentation, Instance Segmentation	Code Available
Improving Social Awareness Through DANTE: A Deep Affinity Network for Clustering Conversational Interactants	Jul 24, 2019	Clustering, Graph Clustering	Code Available
DADA: Driver Attention Prediction in Driving Accident Scenarios	Dec 18, 2019	Driver Attention Monitoring, Prediction	Code Available
Structure-Aware Residual Pyramid Network for Monocular Depth Estimation	Jul 13, 2019	Decoder, Depth Estimation	Code Available
METEOR Guided Divergence for Video Captioning	Dec 20, 2022	Hierarchical Reinforcement Learning, Scene Understanding	Code Available
MC-PanDA: Mask Confidence for Panoptic Domain Adaptation	Jul 19, 2024	Domain Adaptation, Panoptic Segmentation	Code Available
Cross-Modality Time-Variant Relation Learning for Generating Dynamic Scene Graphs	May 15, 2023	Relation, Scene Graph Generation	Code Available


Benchmark Results

Semantic Scene Understanding Challenge (passive actuation & ground-truth localisation)

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.44	—	Unverified
2	Team VGAI (TCS Research)	OMQ	0.37	—	Unverified
3	Demo_semantic_SLAM	OMQ	0.11	—	Unverified

ADE20K val

#	Model	Metric	Claimed	Verified	Status
1	CPN(ResNet-101)	Mean IoU	46.3	—	Unverified

Semantic Scene Understanding Challenge (active actuation & ground-truth localisation)

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.35	—	Unverified