Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.

Papers


Showing 1651–1675 of 1723 papers

Title	Date	Tasks	Status
SDOF-Tracker: Fast and Accurate Multiple Human Tracking by Skipped-Detection and Optical-Flow	Jun 27, 2021	Human Detection, Optical Flow Estimation	Code Available
Exploring Scene Affinity for Semi-Supervised LiDAR Semantic Segmentation	Aug 21, 2024	3D Semantic Segmentation, Data Augmentation	Code Available
CLAIR-A: Leveraging Large Language Models to Judge Audio Captions	Sep 19, 2024	Audio captioning, Language Modeling	Code Available
Learning Rigidity in Dynamic Scenes with a Moving Camera for 3D Motion Field Estimation	Apr 12, 2018	Optical Flow Estimation, Scene Flow Estimation	Code Available
Category-level Neural Field for Reconstruction of Partially Observed Objects in Indoor Environment	Jun 12, 2024	3D Reconstruction, Scene Understanding	Code Available
Target-Aware Spatio-Temporal Reasoning via Answering Questions in Dynamics Audio-Visual Scenarios	May 21, 2023	Audio-visual Question Answering, Audio-Visual Question Answering (AVQA)	Code Available
Aerial Scene Understanding in The Wild: Multi-Scene Recognition via Prototype-based Memory Networks	Apr 22, 2021	Retrieval, Scene Recognition	Code Available
Task-Aware Asynchronous Multi-Task Model with Class Incremental Contrastive Learning for Surgical Scene Understanding	Nov 28, 2022	Contrastive Learning, Decision Making	Code Available
Evaluating Compositional Scene Understanding in Multimodal Generative Models	Mar 29, 2025	Scene Understanding	Code Available
VTQA: Visual Text Question Answering via Entity Alignment and Cross-Media Reasoning	Mar 5, 2023	Answer Generation, Entity Alignment	Code Available
ERFNet: Efficient Residual Factorized ConvNet for Real-time Semantic Segmentation	Oct 9, 2017	GPU, Real-Time Semantic Segmentation	Code Available
ASI-Seg: Audio-Driven Surgical Instrument Segmentation with Surgeon Intention Understanding	Jul 28, 2024	Contrastive Learning, Intention-oriented Segmentation	Code Available
SeGAN: Segmenting and Generating the Invisible	Mar 29, 2017	Depth Estimation, Scene Understanding	Code Available
Artificial Color Constancy via GoogLeNet with Angular Loss Function	Nov 20, 2018	Color Constancy, Object Recognition	Code Available
Adaptive Visual Scene Understanding: Incremental Scene Graph Generation	Oct 2, 2023	Benchmarking, Continual Learning	Code Available
Temporally Consistent Horizon Lines	Jul 23, 2019	3D Reconstruction, Autonomous Vehicles	Code Available
CARL-D: A vision benchmark suite and large scale dataset for vehicle detection and scene segmentation	Feb 17, 2022	2D Object Detection, Autonomous Driving	Code Available
Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions	Feb 13, 2018	BIG-bench Machine Learning, Management	Code Available
Efficient ConvNet for Real-time Semantic Segmentation	Jun 1, 2017	GPU, Real-Time Semantic Segmentation	Code Available
Bridging Stereo Matching and Optical Flow via Spatiotemporal Correspondence	May 22, 2019	Optical Flow Estimation, Scene Understanding	Code Available
Segmenting the Future	Apr 24, 2019	Autonomous Driving, Decision Making	Code Available
Learning Regional Purity for Instance Segmentation on 3D Point Clouds	Nov 3, 2020	3D Instance Segmentation, 3D Semantic Segmentation	Code Available
SeG-SR: Integrating Semantic Knowledge into Remote Sensing Image Super-Resolution via Vision-Language Model	May 29, 2025	Image Super-Resolution, Language Modeling	Code Available
Learning Panoptic Segmentation from Instance Contours	Oct 16, 2020	Clustering, Instance Segmentation	Code Available
Box for Mask and Mask for Box: weak losses for multi-task partially supervised learning	Nov 26, 2024	Object, object-detection	Code Available


Benchmark Results

Semantic Scene Understanding Challenge (passive actuation & ground-truth localisation)

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.44	—	Unverified
2	Team VGAI (TCS Research)	OMQ	0.37	—	Unverified
3	Demo_semantic_SLAM	OMQ	0.11	—	Unverified

ADE20K val

#	Model	Metric	Claimed	Verified	Status
1	CPN (ResNet-101)	Mean IoU	46.3	—	Unverified

Semantic Scene Understanding Challenge (active actuation & ground-truth localisation)

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.35	—	Unverified