Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.
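
The idea of "objects plus their spatial relationships" is often made concrete as a scene graph. In a scene graph, nodes are recognized objects and edges are pairwise relations, written as (subject, relation, object) triples. Below is a minimal sketch that assumes axis-aligned 2D boxes in image coordinates (x_min, y_min, x_max, y_max, with y growing downward). The `DetectedObject` class, the `build_scene_graph` function, and the three box-geometry rules are hypothetical illustrations, not the method of any paper listed on this page.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class DetectedObject:
    """A recognized object with a hypothetical 2D box (x_min, y_min, x_max, y_max)."""
    label: str
    box: tuple[float, float, float, float]


def spatial_relations(a: DetectedObject, b: DetectedObject) -> list[str]:
    """Derive coarse relations of `a` with respect to `b` from box geometry only.

    Assumes image coordinates: x grows to the right, y grows downward.
    """
    ax0, ay0, ax1, ay1 = a.box
    bx0, by0, bx1, by1 = b.box
    relations = []
    if ax1 <= bx0:
        relations.append("left of")   # a ends before b starts horizontally
    if ay1 <= by0:
        relations.append("above")     # a ends before b starts vertically
    # Boxes overlap when their extents intersect on both axes.
    if ax0 < bx1 and bx0 < ax1 and ay0 < by1 and by0 < ay1:
        relations.append("overlaps")
    return relations


def build_scene_graph(objects: list[DetectedObject]) -> list[tuple[str, str, str]]:
    """Return (subject, relation, object) triples for every ordered pair of objects."""
    triples = []
    for a, b in permutations(objects, 2):
        for rel in spatial_relations(a, b):
            triples.append((a.label, rel, b.label))
    return triples


scene = [
    DetectedObject("lamp", (10, 0, 30, 20)),
    DetectedObject("table", (0, 40, 100, 80)),
    DetectedObject("cup", (50, 30, 60, 45)),
]
for triple in build_scene_graph(scene):
    print(triple)
```

For this three-object scene the output includes ('lamp', 'above', 'table') and ('table', 'overlaps', 'cup'). Hand-written box rules are used here only to make the data structure concrete. Learned scene-understanding models usually predict relations from image features and may reason in 3D rather than over 2D boxes.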

Papers

Showing 1676–1700 of 1723 papers

Title	Date	Tasks	Status
Are Vision LLMs Road-Ready? A Comprehensive Benchmark for Safety-Critical Driving Video Understanding	Apr 20, 2025	Autonomous Driving, Image Captioning	Code Available
Efficient Computation Sharing for Multi-Task Visual Scene Understanding	Mar 16, 2023	Multi-Task Learning, Scene Understanding	Code Available
DualMLP: a two-stream fusion model for 3D point cloud classification	Oct 10, 2023	3D Point Cloud Classification, Point Cloud Classification	Code Available
Road Scene Understanding by Occupancy Grid Learning from Sparse Radar Clusters using Semantic Segmentation	Mar 31, 2019	Autonomous Driving, road scene understanding	Code Available
Self-Supervised Partial Cycle-Consistency for Multi-View Matching	Jan 10, 2025	Scene Understanding	Code Available
Learning Monocular Depth by Distilling Cross-domain Stereo Networks	Aug 20, 2018	Autonomous Driving, Depth Estimation	Code Available
Boundary-Seeking Generative Adversarial Networks	Feb 27, 2017	Scene Understanding, Text Generation	Code Available
Dual-Glance Model for Deciphering Social Relationships	Aug 2, 2017	model, object-detection	Code Available
Self-Supervised Road Layout Parsing with Graph Auto-Encoding	Mar 21, 2022	Image Reconstruction, Scene Understanding	Code Available
Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors	May 30, 2025	3D geometry, Large Language Model	Code Available
Self-supervised Vision Transformers for 3D Pose Estimation of Novel Objects	May 31, 2023	3D Pose Estimation, Contrastive Learning	Code Available
Zoom in on the Plant: Fine-grained Analysis of Leaf, Stem and Vein Instances	Dec 14, 2023	Scene Understanding	Code Available
Language-based Colorization of Scene Sketches	Nov 17, 2019	Colorization, Image Generation	Code Available
Label-Attention Transformer with Geometrically Coherent Objects for Image Captioning	Sep 16, 2021	Decoder, Image Captioning	Code Available
Adversarial Attacks on Monocular Pose Estimation	Jul 14, 2022	Depth Estimation, Monocular Depth Estimation	Code Available
Visually Grounded VQA by Lattice-based Retrieval	Nov 15, 2022	Information Retrieval, Question Answering	Code Available
The ADUULM-360 Dataset -- A Multi-Modal Dataset for Depth Estimation in Adverse Weather	Nov 18, 2024	Autonomous Driving, Depth Estimation	Code Available
DRRNet: Macro-Micro Feature Fusion and Dual Reverse Refinement for Camouflaged Object Detection	May 14, 2025	object-detection, Object Detection	Code Available
Doubly Contrastive End-to-End Semantic Segmentation for Autonomous Driving under Adverse Weather	Nov 21, 2022	Autonomous Driving, GPU	Code Available
A Review on Deep Learning Techniques Applied to Semantic Segmentation	Apr 22, 2017	Autonomous Driving, Deep Learning	Code Available
Semantic Foreground Inpainting from Weak Supervision	Sep 10, 2019	Scene Understanding, Semantic Segmentation	Code Available
BOLD5000: A public fMRI dataset of 5000 images	Sep 5, 2018	Diversity, Scene Understanding	Code Available
DOCTR: Disentangled Object-Centric Transformer for Point Scene Understanding	Mar 25, 2024	Decoder, Object	Code Available
UniNet: A Unified Scene Understanding Network and Exploring Multi-Task Relationships through the Lens of Adversarial Attacks	Aug 10, 2021	Depth Estimation, Depth Prediction	Code Available
Knowledge-Guided Object Discovery with Acquired Deep Impressions	Mar 19, 2021	Object, Object Discovery	Code Available

Benchmark Results

Semantic Scene Understanding Challenge (passive actuation & ground-truth localisation)

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.44	—	Unverified
2	Team VGAI (TCS Research)	OMQ	0.37	—	Unverified
3	Demo_semantic_SLAM	OMQ	0.11	—	Unverified

ADE20K val

#	Model	Metric	Claimed	Verified	Status
1	CPN(ResNet-101)	Mean IoU	46.3	—	Unverified

Semantic Scene Understanding Challenge (active actuation & ground-truth localisation)

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.35	—	Unverified
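
The CPN(ResNet-101) result is reported as Mean IoU. Under the usual definition, each class c gets IoU_c = TP_c / (TP_c + FP_c + FN_c), counted over pixels, and Mean IoU averages IoU_c over classes. Below is a minimal stdlib sketch that assumes flat per-pixel label lists (an H×W label map flattened row by row). The `mean_iou` name, the `ignore_index` value of 255, and skipping classes absent from both prediction and ground truth are assumptions. Evaluation toolkits differ in such details, so this is not guaranteed to match the exact protocol behind any listed score.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int,
             ignore_index: int = 255) -> float:
    """Mean intersection-over-union over classes for flat per-pixel labels.

    Pixels whose ground-truth label equals `ignore_index` are excluded.
    Classes with an empty union (absent from both pred and target) are skipped.
    """
    tp = [0] * num_classes
    fp = [0] * num_classes
    fn = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue          # unlabeled pixel: excluded from every count
        if p == t:
            tp[t] += 1        # correctly labeled pixel of class t
        else:
            fp[p] += 1        # pixel wrongly assigned to class p
            fn[t] += 1        # pixel of class t that was missed
    ious = []
    for c in range(num_classes):
        union = tp[c] + fp[c] + fn[c]
        if union > 0:
            ious.append(tp[c] / union)
    return sum(ious) / len(ious)


pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 255]  # last pixel is unlabeled
print(round(mean_iou(pred, target, num_classes=3), 4))  # 0.7222
```

In this toy example the per-class IoUs are 1/2, 2/3 and 1, so the mean is 13/18. Benchmark tables such as the one above usually report this value as a percentage.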