Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.

Papers

Showing 1301–1350 of 1723 papers

Title	Date	Tasks	Status
Scene Summarization: Clustering Scene Videos into Spatially Diverse Frames	Nov 28, 2023	Clustering, Diversity	Unverified
SceneTAP: Scene-Coherent Typographic Adversarial Planner against Vision-Language Models in Real-World Environments	Nov 28, 2024	Adversarial Text, Scene Understanding	Unverified
Scene Text Detection for Augmented Reality -- Character Bigram Approach to reduce False Positive Rate	Dec 26, 2020	Scene Text Detection, Scene Understanding	Unverified
SceneTrilogy: On Human Scene-Sketch and its Complementarity with Photo and Text	Apr 25, 2022	Image Retrieval, Retrieval	Unverified
Scene Understanding Enabled Semantic Communication with Open Channel Coding	Jan 24, 2025	Question Answering, Scene Understanding	Unverified
Scene Understanding for Autonomous Manipulation with Deep Learning	Mar 23, 2019	Action Understanding, Affordance Detection	Unverified
Scene Understanding for Autonomous Driving	May 11, 2021	Autonomous Driving, Scene Understanding	Unverified
Scene Understanding in Pick-and-Place Tasks: Analyzing Transformations Between Initial and Final Scenes	Sep 26, 2024	object-detection, Object Detection	Unverified
Scene Understanding Networks for Autonomous Driving based on Around View Monitoring System	May 18, 2018	3D Object Detection, Autonomous Driving	Unverified
SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding	Jan 17, 2024	3D visual grounding, Scene Understanding	Unverified
Visual Semantic Parsing: From Images to Abstract Meaning Representation	Oct 26, 2022	Abstract Meaning Representation, Scene Understanding	Unverified
SDNet: Semantically Guided Depth Estimation Network	Jul 24, 2019	Autonomous Vehicles, Depth Estimation	Unverified
Visual-Semantic Scene Understanding by Sharing Labels in a Context Network	Sep 16, 2013	Data Augmentation, Object	Unverified
SE(3) Equivariant Ray Embeddings for Implicit Multi-View Depth Estimation	Nov 11, 2024	Data Augmentation, Decoder	Unverified
SeaDSC: A video-based unsupervised method for dynamic scene change detection in unmanned surface vehicles	Nov 20, 2023	Change Detection, Motion Planning	Unverified
Audio-Visual Collaborative Representation Learning for Dynamic Saliency Prediction	Sep 17, 2021	Representation Learning, Saliency Prediction	Unverified
SeasoNet: A Seasonal Scene Classification, segmentation and Retrieval dataset for satellite Imagery over Germany	Jul 19, 2022	Image Retrieval, Retrieval	Unverified
Second-order Democratic Aggregation	Aug 22, 2018	General Classification, Material Classification	Unverified
Neural Groundplans: Persistent Neural Scene Representations from a Single Image	Jul 22, 2022	Disentanglement, Instance Segmentation	Unverified
Seeing Beyond Classes: Zero-Shot Grounded Situation Recognition via Language Explainer	Apr 24, 2024	Grounded Situation Recognition, Scene Understanding	Unverified
Seeing Beyond the Scene: Enhancing Vision-Language Models with Interactional Reasoning	May 14, 2025	Relation Extraction, Scene Understanding	Unverified
Seeing the Signs: A Survey of Edge-Deployable OCR Models for Billboard Visibility Analysis	Jul 15, 2025	Marketing, Optical Character Recognition	Unverified
Seeing with Humans: Gaze-Assisted Neural Image Captioning	Aug 18, 2016	Image Captioning, Object	Unverified
Seeing With Sound: Long-range Acoustic Beamforming for Multimodal Scene Understanding	Jan 1, 2023	Autonomous Vehicles, object-detection	Unverified
Visual Traffic Knowledge Graph Generation from Scene Images	Jan 1, 2023	Graph Attention, Graph Generation	Unverified
Segment Any 3D Gaussians	Dec 1, 2023	Interactive Segmentation, Scene Understanding	Unverified
Segment Any Object Model (SAOM): Real-to-Simulation Fine-Tuning Strategy for Multi-Class Multi-Instance Segmentation	Mar 16, 2024	Instance Segmentation, Object	Unverified
Segment Any RGB-Thermal Model with Language-aided Distillation	May 4, 2025	Instance Segmentation, Knowledge Distillation	Unverified
Segment Anything, Even Occluded	Mar 8, 2025	Amodal Instance Segmentation, Autonomous Driving	Unverified
Segmentation Guided Attention Networks for Visual Question Answering	Jul 1, 2017	Common Sense Reasoning, Question Answering	Unverified
Segmentation-guided Domain Adaptation for Efficient Depth Completion	Oct 14, 2022	Depth Completion, Domain Adaptation	Unverified
Segment-Fusion: Hierarchical Context Fusion for Robust 3D Semantic Segmentation	Jan 1, 2022	3D Semantic Segmentation, Autonomous Driving	Unverified
YETI (YET to Intervene) Proactive Interventions by Multimodal AI Agents in Augmented Reality Tasks	Jan 16, 2025	AI Agent, Scene Understanding	Unverified
Binaural SoundNet: Predicting Semantics, Depth and Motion with Binaural Sounds	Sep 6, 2021	Scene Understanding, Super-Resolution	Unverified
Visual Vibrometry: Estimating Material Properties from Small Motions in Video	Apr 15, 2017	Object, Scene Understanding	Unverified
Bilateral Adaptation for Human-Object Interaction Detection with Occlusion-Robustness	Jan 1, 2024	Human-Object Interaction Detection, object-detection	Unverified
Self-Supervised and Generalizable Tokenization for CLIP-Based 3D Understanding	May 24, 2025	Domain Generalization, Representation Learning	Unverified
Self-supervised Learning of Occlusion Aware Flow Guided 3D Geometry Perception with Adaptive Cross Weighted Loss from Monocular Videos	Aug 9, 2021	3D geometry, 3D Geometry Perception	Unverified
Self-supervised Learning via Cluster Distance Prediction for Operating Room Context Awareness	Jul 7, 2024	Activity Recognition, Scene Understanding	Unverified
Self-Supervised Object Detection from Egocentric Videos	Jan 1, 2023	Class-agnostic Object Detection, Object	Unverified
Visual Vibrometry: Estimating Material Properties From Small Motion in Video	Jun 1, 2015	Scene Understanding	Unverified
Visual Vibrometry: Estimating Material Properties from Small Motions in Video	Apr 15, 2017	Scene Understanding	Unverified
Be Your Own Best Competitor! Multi-Branched Adversarial Knowledge Transfer	Oct 9, 2020	Decoder, image-classification	Unverified
Self-supervised Pre-training with Masked Shape Prediction for 3D Scene Understanding	May 8, 2023	Prediction, Scene Understanding	Unverified
Self-Supervised Relative Depth Learning for Urban Scene Understanding	Dec 13, 2017	Depth Estimation, Monocular Depth Estimation	Unverified
Visuomotor Understanding for Representation Learning of Driving Scenes	Sep 16, 2019	Optical Flow Estimation, Representation Learning	Unverified
Beyond RGB: Scene-Property Synthesis with Neural Radiance Fields	Jun 9, 2022	Data Augmentation, Edge Detection	Unverified
VLM-E2E: Enhancing End-to-End Autonomous Driving with Multimodal Driver Attention Fusion	Feb 25, 2025	Autonomous Driving, Navigate	Unverified
SELMA: SEmantic Large-scale Multimodal Acquisitions in Variable Weather, Daytime and Viewpoints	Apr 20, 2022	Autonomous Driving, Scene Understanding	Unverified
Beyond Recognition: Evaluating Visual Perspective Taking in Vision Language Models	May 3, 2025	Diagnostic, Object Recognition	Unverified

Datasets: Semantic Scene Understanding Challenge (passive actuation & ground-truth localisation); ADE20K val; Semantic Scene Understanding Challenge (active actuation & ground-truth localisation)

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.44	—	Unverified
2	Team VGAI (TCS Research)	OMQ	0.37	—	Unverified
3	Demo_semantic_SLAM	OMQ	0.11	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	CPN(ResNet-101)	Mean IoU	46.3	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.35	—	Unverified