Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.

Papers

Showing 651–700 of 1723 papers

Title	Date	Tasks	Status	Score
Confidence-Aware Paced-Curriculum Learning by Label Smoothing for Surgical Scene Understanding	Dec 22, 2022	Multi-Label Classification	Code Available	5
MultiDepth: Single-Image Depth Estimation via Multi-Task Regression and Classification	Jul 25, 2019	Autonomous Vehicles, Classification	Code Available	5
Computational Imaging for Machine Perception: Transferring Semantic Segmentation beyond Aberrations	Nov 21, 2022	Domain Adaptation, Scene Understanding	Code Available	5
General-Purpose Deep Point Cloud Feature Extractor	Mar 12, 2018	3D Object Classification, 3D Point Cloud Classification	Code Available	5
Attend, Infer, Repeat: Fast Scene Understanding with Generative Models	Mar 28, 2016	Scene Understanding	Code Available	5
MonoGRNet: A Geometric Reasoning Network for Monocular 3D Object Localization	Nov 26, 2018	2D Object Detection, 3D Object Detection	Code Available	5
Generalizing Surgical Instruments Segmentation to Unseen Domains with One-to-Many Synthesis	Jun 28, 2023	Scene Understanding	Code Available	5
Monocular 3D Object Detection with Pseudo-LiDAR Point Cloud	Mar 23, 2019	3D Object Detection, Depth Estimation	Code Available	5
MovSAM: A Single-image Moving Object Segmentation Framework Based on Deep Thinking	Apr 9, 2025	Autonomous Driving, Language Modeling	Code Available	5
Multimodal Scale Consistency and Awareness for Monocular Self-Supervised Depth Estimation	Mar 3, 2021	Autonomous Driving, Depth Estimation	Code Available	5
Model-based inexact graph matching on top of CNNs for semantic scene understanding	Jan 18, 2023	Brain Segmentation, Deep Learning	Code Available	5
Modeling Expectation Violation in Intuitive Physics with Coarse Probabilistic Object Representations	Dec 1, 2019	Scene Understanding	Code Available	5
Gated Driver Attention Predictor	Aug 1, 2023	Driver Attention Monitoring, Prediction	Code Available	5
Gated2Depth: Real-time Dense Lidar from Gated Images	Feb 13, 2019	Scene Understanding	Code Available	5
MLM: A Benchmark Dataset for Multitask Learning with Multiple Languages and Modalities	Aug 14, 2020	Representation Learning, Scene Understanding	Code Available	5
MLLM-SUL: Multimodal Large Language Model for Semantic Scene Understanding and Localization in Traffic Scenarios	Dec 27, 2024	Autonomous Driving, Language Modeling	Code Available	5
GaIA: Graphical Information Gain based Attention Network for Weakly Supervised Point Cloud Semantic Segmentation	Oct 2, 2022	Scene Understanding, Segmentation	Code Available	5
MGNiceNet: Unified Monocular Geometric Scene Understanding	Nov 18, 2024	Autonomous Driving, Autonomous Vehicles	Code Available	5
Mitigating Object Dependencies: Improving Point Cloud Self-Supervised Learning through Object Exchange	Apr 11, 2024	Object, Scene Understanding	Code Available	5
METEOR Guided Divergence for Video Captioning	Dec 20, 2022	Hierarchical Reinforcement Learning, Scene Understanding	Code Available	5
Cognitive Visual Commonsense Reasoning Using Dynamic Working Memory	Jul 4, 2021	Question Answering, Scene Understanding	Code Available	5
MetricGold: Leveraging Text-To-Image Latent Diffusion Models for Metric Depth Estimation	Nov 16, 2024	Depth Estimation, Monocular Depth Estimation	Code Available	5
On the Structures of Representation for the Robustness of Semantic Segmentation to Input Corruption	Sep 2, 2020	Scene Understanding, Segmentation	Code Available	5
FunnyNet-W: Multimodal Learning of Funny Moments in Videos in the Wild	Jan 8, 2024	Language Modelling, Large Language Model	Code Available	5
m2caiSeg: Semantic Segmentation of Laparoscopic Images using Convolutional Neural Networks	Aug 23, 2020	Anatomy, Data Augmentation	Code Available	5
COCO-Text: Dataset and Benchmark for Text Detection and Recognition in Natural Images	Jan 26, 2016	Diversity, General Classification	Code Available	5
From Node to Graph: Joint Reasoning on Visual-Semantic Relational Graph for Zero-Shot Detection	Feb 15, 2022	Generalized Zero-Shot Object Detection, Scene Understanding	Code Available	5
Loss Distillation via Gradient Matching for Point Cloud Completion with Weighted Chamfer Distance	Sep 10, 2024	Bilevel Optimization, Point Cloud Completion	Code Available	5
From Feature Importance to Natural Language Explanations Using LLMs with RAG	Jul 30, 2024	counterfactual, Counterfactual Reasoning	Code Available	5
CNN-based Lidar Point Cloud De-Noising in Adverse Weather	Dec 9, 2019	Autonomous Vehicles, Scene Understanding	Code Available	5
Loss Switching Fusion with Similarity Search for Video Classification	Jun 27, 2019	Classification, Clustering	Code Available	5
LoCATe-GAT: Modeling Multi-Scale Local Context and Action Relationships for Zero-Shot Action Recognition	Nov 27, 2024	Action Recognition, Graph Attention	Code Available	5
LoST? Appearance-Invariant Place Recognition for Opposite Viewpoints using Visual Semantics	Apr 16, 2018	Navigate, Scene Understanding	Code Available	5
FREDOM: Fairness Domain Adaptation Approach to Semantic Scene Understanding	Apr 4, 2023	Autonomous Driving, Domain Adaptation	Code Available	5
Lightweight integration of 3D features to improve 2D image segmentation	Dec 16, 2022	Image Segmentation, Scene Understanding	Code Available	5
Physics-as-Inverse-Graphics: Unsupervised Physical Parameter Estimation from Video	May 27, 2019	Inductive Bias, Model Predictive Control	Code Available	5
Leveraging Acoustic Images for Effective Self-Supervised Audio Representation Learning	Aug 1, 2020	Cross-Modal Retrieval, Representation Learning	Code Available	5
Leveraging Automatic CAD Annotations for Supervised Learning in 3D Scene Understanding	Apr 18, 2025	Deep Learning, Point Cloud Completion	Code Available	5
FlowGrad: Using Motion for Visual Sound Source Localization	Nov 15, 2022	Optical Flow Estimation, Scene Understanding	Code Available	5
Flow-based GAN for 3D Point Cloud Generation from a Single Image	Oct 8, 2022	Point Cloud Generation, Scene Understanding	Code Available	5
Aerial Scene Understanding in The Wild: Multi-Scene Recognition via Prototype-based Memory Networks	Apr 22, 2021	Retrieval, Scene Recognition	Code Available	5
Learning Rigidity in Dynamic Scenes with a Moving Camera for 3D Motion Field Estimation	Apr 12, 2018	Optical Flow Estimation, Scene Flow Estimation	Code Available	5
Fine-Grained is Too Coarse: A Novel Data-Centric Approach for Efficient Scene Graph Generation	May 30, 2023	Graph Generation, Image Generation	Code Available	5
ASI-Seg: Audio-Driven Surgical Instrument Segmentation with Surgeon Intention Understanding	Jul 28, 2024	Contrastive Learning, Intention-oriented Segmentation	Code Available	5
Matterport3D: Learning from RGB-D Data in Indoor Environments	Sep 18, 2017	General Classification, Scene Understanding	Code Available	5
Placental Vessel Segmentation and Registration in Fetoscopy: Literature Review and MICCAI FetReg2021 Challenge Findings	Jun 24, 2022	Scene Understanding, Semantic Segmentation	Code Available	5
Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors	May 30, 2025	3D geometry, Large Language Model	Code Available	5
Learning Monocular Depth by Distilling Cross-domain Stereo Networks	Aug 20, 2018	Autonomous Driving, Depth Estimation	Code Available	5
Learning Panoptic Segmentation from Instance Contours	Oct 16, 2020	Clustering, Instance Segmentation	Code Available	5
Language-based Colorization of Scene Sketches	Nov 17, 2019	Colorization, Image Generation	Code Available	5

Benchmark Results

Semantic Scene Understanding Challenge (passive actuation & ground-truth localisation)

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.44	—	Unverified
2	Team VGAI (TCS Research)	OMQ	0.37	—	Unverified
3	Demo_semantic_SLAM	OMQ	0.11	—	Unverified

ADE20K val

#	Model	Metric	Claimed	Verified	Status
1	CPN(ResNet-101)	Mean IoU	46.3	—	Unverified

Semantic Scene Understanding Challenge (active actuation & ground-truth localisation)

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.35	—	Unverified