Scene Understanding

Scene understanding is the task of interpreting the visual content of a scene: the objects it contains, their spatial relationships, and its overall layout. It goes beyond object recognition by also modeling context, that is, how objects relate to one another and to their environment.
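The sketch below makes the "spatial relationships" part concrete. It assumes objects have already been detected as axis-aligned 2D bounding boxes and derives coarse pairwise relations as (subject, relation, object) triples. The `DetectedObject` class, the relation names, and the rule ordering are hypothetical choices made for illustration. This is a toy geometric heuristic, not a scene graph generation method from any of the papers listed below.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class DetectedObject:
    """Hypothetical detection record: a class label plus a bounding box."""
    label: str
    # Axis-aligned box (x_min, y_min, x_max, y_max) in image coordinates,
    # with y increasing downward as in most image libraries.
    box: tuple[float, float, float, float]


def spatial_relation(a: DetectedObject, b: DetectedObject) -> str:
    """Return a coarse relation of `a` with respect to `b`."""
    ax0, ay0, ax1, ay1 = a.box
    bx0, by0, bx1, by1 = b.box
    # Horizontal separation is checked first, so a pair that is separated
    # on both axes is reported as "left of" / "right of".
    if ax1 <= bx0:
        return "left of"
    if ax0 >= bx1:
        return "right of"
    # Because y grows downward, a box whose bottom edge lies at or above
    # the other box's top edge is "above" it.
    if ay1 <= by0:
        return "above"
    if ay0 >= by1:
        return "below"
    return "overlaps"


def build_scene_graph(objects: list[DetectedObject]) -> list[tuple[str, str, str]]:
    """Return (subject, relation, object) triples for every ordered pair."""
    return [(a.label, spatial_relation(a, b), b.label)
            for a, b in permutations(objects, 2)]


if __name__ == "__main__":
    scene = [
        DetectedObject("lamp", (10, 0, 30, 20)),
        DetectedObject("table", (0, 40, 100, 80)),
        DetectedObject("cup", (60, 30, 70, 40)),
    ]
    for triple in build_scene_graph(scene):
        print(triple)
```

With the example scene, the cup's bottom edge touches the table's top edge, so the sketch reports `("cup", "above", "table")`. A learned scene-graph model would instead predict richer relations (for example "on" or "holding") from appearance and context rather than from box geometry alone.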

Papers

Showing 1651–1700 of 1723 papers

Title	Date	Tasks	Status
SDOF-Tracker: Fast and Accurate Multiple Human Tracking by Skipped-Detection and Optical-Flow	Jun 27, 2021	Human Detection, Optical Flow Estimation	Code Available
Exploring Scene Affinity for Semi-Supervised LiDAR Semantic Segmentation	Aug 21, 2024	3D Semantic Segmentation, Data Augmentation	Code Available
CLAIR-A: Leveraging Large Language Models to Judge Audio Captions	Sep 19, 2024	Audio captioning, Language Modeling	Code Available
Learning Rigidity in Dynamic Scenes with a Moving Camera for 3D Motion Field Estimation	Apr 12, 2018	Optical Flow Estimation, Scene Flow Estimation	Code Available
Category-level Neural Field for Reconstruction of Partially Observed Objects in Indoor Environment	Jun 12, 2024	3D Reconstruction, Scene Understanding	Code Available
Target-Aware Spatio-Temporal Reasoning via Answering Questions in Dynamics Audio-Visual Scenarios	May 21, 2023	Audio-visual Question Answering, Audio-Visual Question Answering (AVQA)	Code Available
Aerial Scene Understanding in The Wild: Multi-Scene Recognition via Prototype-based Memory Networks	Apr 22, 2021	Retrieval, Scene Recognition	Code Available
Task-Aware Asynchronous Multi-Task Model with Class Incremental Contrastive Learning for Surgical Scene Understanding	Nov 28, 2022	Contrastive Learning, Decision Making	Code Available
Evaluating Compositional Scene Understanding in Multimodal Generative Models	Mar 29, 2025	Scene Understanding	Code Available
VTQA: Visual Text Question Answering via Entity Alignment and Cross-Media Reasoning	Mar 5, 2023	Answer Generation, Entity Alignment	Code Available
ERFNet: Efficient Residual Factorized ConvNet for Real-time Semantic Segmentation	Oct 9, 2017	GPU, Real-Time Semantic Segmentation	Code Available
ASI-Seg: Audio-Driven Surgical Instrument Segmentation with Surgeon Intention Understanding	Jul 28, 2024	Contrastive Learning, Intention-oriented Segmentation	Code Available
SeGAN: Segmenting and Generating the Invisible	Mar 29, 2017	Depth Estimation, Scene Understanding	Code Available
Artificial Color Constancy via GoogLeNet with Angular Loss Function	Nov 20, 2018	Color Constancy, Object Recognition	Code Available
Adaptive Visual Scene Understanding: Incremental Scene Graph Generation	Oct 2, 2023	Benchmarking, Continual Learning	Code Available
Temporally Consistent Horizon Lines	Jul 23, 2019	3D Reconstruction, Autonomous Vehicles	Code Available
CARL-D: A vision benchmark suite and large scale dataset for vehicle detection and scene segmentation	Feb 17, 2022	2D Object Detection, Autonomous Driving	Code Available
Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions	Feb 13, 2018	BIG-bench Machine Learning, Management	Code Available
Efficient ConvNet for Real-time Semantic Segmentation	Jun 1, 2017	GPU, Real-Time Semantic Segmentation	Code Available
Bridging Stereo Matching and Optical Flow via Spatiotemporal Correspondence	May 22, 2019	Optical Flow Estimation, Scene Understanding	Code Available
Segmenting the Future	Apr 24, 2019	Autonomous Driving, Decision Making	Code Available
Learning Regional Purity for Instance Segmentation on 3D Point Clouds	Nov 3, 2020	3D Instance Segmentation, 3D Semantic Segmentation	Code Available
SeG-SR: Integrating Semantic Knowledge into Remote Sensing Image Super-Resolution via Vision-Language Model	May 29, 2025	Image Super-Resolution, Language Modeling	Code Available
Learning Panoptic Segmentation from Instance Contours	Oct 16, 2020	Clustering, Instance Segmentation	Code Available
Box for Mask and Mask for Box: weak losses for multi-task partially supervised learning	Nov 26, 2024	Object, object-detection	Code Available
Are Vision LLMs Road-Ready? A Comprehensive Benchmark for Safety-Critical Driving Video Understanding	Apr 20, 2025	Autonomous Driving, Image Captioning	Code Available
Efficient Computation Sharing for Multi-Task Visual Scene Understanding	Mar 16, 2023	Multi-Task Learning, Scene Understanding	Code Available
DualMLP: a two-stream fusion model for 3D point cloud classification	Oct 10, 2023	3D Point Cloud Classification, Point Cloud Classification	Code Available
Road Scene Understanding by Occupancy Grid Learning from Sparse Radar Clusters using Semantic Segmentation	Mar 31, 2019	Autonomous Driving, road scene understanding	Code Available
Self-Supervised Partial Cycle-Consistency for Multi-View Matching	Jan 10, 2025	Scene Understanding	Code Available
Learning Monocular Depth by Distilling Cross-domain Stereo Networks	Aug 20, 2018	Autonomous Driving, Depth Estimation	Code Available
Boundary-Seeking Generative Adversarial Networks	Feb 27, 2017	Scene Understanding, Text Generation	Code Available
Dual-Glance Model for Deciphering Social Relationships	Aug 2, 2017	model, object-detection	Code Available
Self-Supervised Road Layout Parsing with Graph Auto-Encoding	Mar 21, 2022	Image Reconstruction, Scene Understanding	Code Available
Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors	May 30, 2025	3D geometry, Large Language Model	Code Available
Self-supervised Vision Transformers for 3D Pose Estimation of Novel Objects	May 31, 2023	3D Pose Estimation, Contrastive Learning	Code Available
Zoom in on the Plant: Fine-grained Analysis of Leaf, Stem and Vein Instances	Dec 14, 2023	Scene Understanding	Code Available
Language-based Colorization of Scene Sketches	Nov 17, 2019	Colorization, Image Generation	Code Available
Label-Attention Transformer with Geometrically Coherent Objects for Image Captioning	Sep 16, 2021	Decoder, Image Captioning	Code Available
Adversarial Attacks on Monocular Pose Estimation	Jul 14, 2022	Depth Estimation, Monocular Depth Estimation	Code Available
Visually Grounded VQA by Lattice-based Retrieval	Nov 15, 2022	Information Retrieval, Question Answering	Code Available
The ADUULM-360 Dataset -- A Multi-Modal Dataset for Depth Estimation in Adverse Weather	Nov 18, 2024	Autonomous Driving, Depth Estimation	Code Available
DRRNet: Macro-Micro Feature Fusion and Dual Reverse Refinement for Camouflaged Object Detection	May 14, 2025	object-detection, Object Detection	Code Available
Doubly Contrastive End-to-End Semantic Segmentation for Autonomous Driving under Adverse Weather	Nov 21, 2022	Autonomous Driving, GPU	Code Available
A Review on Deep Learning Techniques Applied to Semantic Segmentation	Apr 22, 2017	Autonomous Driving, Deep Learning	Code Available
Semantic Foreground Inpainting from Weak Supervision	Sep 10, 2019	Scene Understanding, Semantic Segmentation	Code Available
BOLD5000: A public fMRI dataset of 5000 images	Sep 5, 2018	Diversity, Scene Understanding	Code Available
DOCTR: Disentangled Object-Centric Transformer for Point Scene Understanding	Mar 25, 2024	Decoder, Object	Code Available
UniNet: A Unified Scene Understanding Network and Exploring Multi-Task Relationships through the Lens of Adversarial Attacks	Aug 10, 2021	Depth Estimation, Depth Prediction	Code Available
Knowledge-Guided Object Discovery with Acquired Deep Impressions	Mar 19, 2021	Object, Object Discovery	Code Available


Benchmark Results

Semantic Scene Understanding Challenge (passive actuation & ground-truth localisation)

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.44	—	Unverified
2	Team VGAI (TCS Research)	OMQ	0.37	—	Unverified
3	Demo_semantic_SLAM	OMQ	0.11	—	Unverified

ADE20K val

#	Model	Metric	Claimed	Verified	Status
1	CPN(ResNet-101)	Mean IoU (%)	46.3	—	Unverified

Semantic Scene Understanding Challenge (active actuation & ground-truth localisation)

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.35	—	Unverified
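The ADE20K entry reports Mean IoU. The metric is usually defined as follows: for each class, IoU = TP / (TP + FP + FN), and the per-class values are then averaged. The sketch below computes it in pure Python from flattened per-pixel integer labels. The function names are hypothetical. Benchmark implementations differ in details this sketch leaves out, such as which label value is ignored and how counts are accumulated across images. Here, a class that appears in neither the ground truth nor the prediction is simply skipped.

```python
def confusion_matrix(gt: list[int], pred: list[int], num_classes: int) -> list[list[int]]:
    """cm[g][p] counts pixels with ground-truth class g predicted as class p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt, pred):
        cm[g][p] += 1
    return cm


def mean_iou(gt: list[int], pred: list[int], num_classes: int) -> float:
    """Average of per-class TP / (TP + FP + FN) over classes that occur."""
    cm = confusion_matrix(gt, pred, num_classes)
    ious = []
    for c in range(num_classes):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                                   # class c missed
        fp = sum(cm[r][c] for r in range(num_classes)) - tp    # wrongly labelled c
        denom = tp + fp + fn
        if denom == 0:
            continue  # class absent from both ground truth and prediction
        ious.append(tp / denom)
    return sum(ious) / len(ious)


if __name__ == "__main__":
    gt = [0, 0, 1, 1, 2, 2]
    pred = [0, 1, 1, 1, 2, 0]
    # Per-class IoU: 1/3, 2/3, 1/2, so the mean is 0.5.
    print(mean_iou(gt, pred, num_classes=3))
```

Leaderboards such as the one above usually report this value as a percentage, so 0.463 would appear as 46.3.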