Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.
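As a rough illustration of what "beyond simple object recognition" can mean in practice, the sketch below derives coarse pairwise spatial relations from labeled bounding boxes. It is a toy example, not a method taken from any paper listed here: the `Box` class, the `spatial_relation` and `scene_relations` helpers, and the sample detections are all hypothetical, and real systems use far richer geometry and learned context.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical detection: a label plus an axis-aligned box in image
    coordinates (x grows to the right, y grows downward)."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def spatial_relation(a: Box, b: Box) -> str:
    """Return a coarse relation of box a with respect to box b."""
    if a.x_max <= b.x_min:
        return "left of"
    if a.x_min >= b.x_max:
        return "right of"
    if a.y_max <= b.y_min:
        return "above"
    if a.y_min >= b.y_max:
        return "below"
    return "overlapping"


def scene_relations(boxes):
    """List (subject, relation, object) triples for every unordered pair."""
    return [
        (a.label, spatial_relation(a, b), b.label)
        for i, a in enumerate(boxes)
        for b in boxes[i + 1:]
    ]


# Made-up detections, for illustration only.
detections = [
    Box("cup", 40, 10, 60, 30),
    Box("table", 0, 30, 100, 80),
    Box("chair", 110, 20, 150, 80),
]
relations = scene_relations(detections)
for triple in relations:
    print(triple)
```

The resulting triples form a very small scene graph: object recognition supplies the labels, while the relations add the context that the definition above refers to.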

Papers

Showing 751–800 of 1723 papers

Title	Date	Tasks	Status	Hype
Open Scene Understanding: Grounded Situation Recognition Meets Segment Anything for Helping People with Visual Impairments	Jul 15, 2023	Decoder, Grounded Situation Recognition	Code Available	1
DeepIPCv2: LiDAR-powered Robust Environmental Perception and Navigational Control for Autonomous Vehicle	Jul 13, 2023	Autonomous Driving, Scene Understanding	Code Available	0
The IMPTC Dataset: An Infrastructural Multi-Person Trajectory and Context Dataset	Jul 12, 2023	Scene Understanding	Code Available	1
Smart Infrastructure: A Research Junction	Jul 12, 2023	Scene Understanding, Synthetic Data Generation	Unverified	0
CAT-ViL: Co-Attention Gated Vision-Language Embedding for Visual Question Localized-Answering in Robotic Surgery	Jul 11, 2023	Question Answering, Scene Understanding	Code Available	1
Test-Time Adaptation for Nighttime Color-Thermal Semantic Segmentation	Jul 10, 2023	Scene Understanding, Semantic Segmentation	Unverified	0
PSDR-Room: Single Photo to Scene using Differentiable Rendering	Jul 6, 2023	Scene Understanding	Unverified	0
Towards accurate instance segmentation in large-scale LiDAR point clouds	Jul 6, 2023	Clustering, Instance Segmentation	Code Available	1
Object Recognition System on a Tactile Device for Visually Impaired	Jul 5, 2023	Object Detection	Unverified	0
AVSegFormer: Audio-Visual Segmentation with Transformer	Jul 3, 2023	Decoder, Scene Understanding	Code Available	1
Artifacts Mapping: Multi-Modal Semantic Mapping for Object Detection and 3D Localization	Jul 3, 2023	Object Detection	Unverified	0
Towards Open Vocabulary Learning: A Survey	Jun 28, 2023	Open Set Learning, Out-of-Distribution Detection	Code Available	2
Generalizing Surgical Instruments Segmentation to Unseen Domains with One-to-Many Synthesis	Jun 28, 2023	Scene Understanding	Code Available	0
Style-transfer based Speech and Audio-visual Scene Understanding for Robot Action Sequence Acquisition from Videos	Jun 27, 2023	Multi-Task Learning, Scene Understanding	Unverified	0
Physion++: Evaluating Physical Scene Understanding that Requires Online Inference of Different Physical Properties	Jun 27, 2023	Friction, Scene Understanding	Unverified	0
SSC-RS: Elevate LiDAR Semantic Scene Completion with Representation Separation and BEV Fusion	Jun 27, 2023	Autonomous Driving, Scene Understanding	Code Available	1
Towards Unseen Triples: Effective Text-Image-joint Learning for Scene Graph Generation	Jun 23, 2023	Graph Generation, Scene Graph Generation	Unverified	0
OpenMask3D: Open-Vocabulary 3D Instance Segmentation	Jun 23, 2023	3D Instance Segmentation, 3D Open-Vocabulary Instance Segmentation	Code Available	2
Semantic-aware Transmission for Robust Point Cloud Classification	Jun 23, 2023	Classification, Decoder	Unverified	0
Multi-view 3D Object Reconstruction and Uncertainty Modelling with Neural Shape Prior	Jun 17, 2023	3D Object Reconstruction, Object	Code Available	1
CorNav: Autonomous Agent with Self-Corrected Planning for Zero-Shot Vision-and-Language Navigation	Jun 17, 2023	Decision Making, Instruction Following	Unverified	0
PanoOcc: Unified Occupancy Representation for Camera-based 3D Panoptic Segmentation	Jun 16, 2023	3D Panoptic Segmentation, Autonomous Driving	Code Available	1
Estimating Generic 3D Room Structures from 2D Annotations	Jun 15, 2023	Scene Understanding	Code Available	1
DORSal: Diffusion for Object-centric Representations of Scenes et al	Jun 13, 2023	Neural Rendering, Object	Unverified	0
Neural Projection Mapping Using Reflectance Fields	Jun 11, 2023	Scene Understanding	Unverified	0
SNeL: A Structured Neuro-Symbolic Language for Entity-Based Multimodal Scene Understanding	Jun 9, 2023	Scene Understanding	Unverified	0
SNAP: Self-Supervised Neural Maps for Visual Positioning and Semantic Understanding	Jun 8, 2023	Scene Understanding	Code Available	1
InvPT++: Inverted Pyramid Multi-Task Transformer for Visual Scene Understanding	Jun 8, 2023	Decoder, Multi-Task Learning	Code Available	2
TopoMask: Instance-Mask-Based Formulation for the Road Topology Problem via Transformer-Based Architecture	Jun 8, 2023	3D Lane Detection, Graph Neural Network	Unverified	0
A Dynamic Feature Interaction Framework for Multi-task Visual Perception	Jun 8, 2023	Autonomous Driving, Depth Estimation	Unverified	0
Towards Label-free Scene Understanding by Vision Foundation Models	Jun 6, 2023	Image Classification	Code Available	1
Disaster Anomaly Detector via Deeper FCDDs for Explainable Initial Responses	Jun 5, 2023	Anomaly Detection, Disaster Response	Unverified	0
Recyclable Semi-supervised Method Based on Multi-model Ensemble for Video Scene Parsing	Jun 5, 2023	Scene Parsing, Scene Understanding	Unverified	0
Multi-CLIP: Contrastive Vision-Language Pre-training for Question Answering tasks in 3D Scenes	Jun 4, 2023	Common Sense Reasoning, Question Answering	Unverified	0
Towards In-context Scene Understanding	Jun 2, 2023	Depth Estimation, In-Context Learning	Code Available	1
Self-supervised Vision Transformers for 3D Pose Estimation of Novel Objects	May 31, 2023	3D Pose Estimation, Contrastive Learning	Code Available	0
Point-GCC: Universal Self-supervised 3D Scene Pre-training via Geometry-Color Contrast	May 31, 2023	3D Instance Segmentation, 3D Object Detection	Code Available	1
Dynamic Clustering Transformer Network for Point Cloud Segmentation	May 30, 2023	Clustering, Decoder	Unverified	0
Fine-Grained is Too Coarse: A Novel Data-Centric Approach for Efficient Scene Graph Generation	May 30, 2023	Graph Generation, Image Generation	Code Available	0
Multi-Scale Attention for Audio Question Answering	May 29, 2023	Audio Question Answering, Question Answering	Code Available	1
Robust Category-Level 3D Pose Estimation from Synthetic Data	May 25, 2023	3D Pose Estimation, 3D Reconstruction	Unverified	0
Fairness Continual Learning Approach to Semantic Scene Understanding in Open-World Environments	May 25, 2023	Continual Learning, Continual Semantic Segmentation	Unverified	0
Target-Aware Spatio-Temporal Reasoning via Answering Questions in Dynamics Audio-Visual Scenarios	May 21, 2023	Audio-visual Question Answering, Audio-Visual Question Answering (AVQA)	Code Available	0
PanoContext-Former: Panoramic Total Scene Understanding with a Transformer	May 21, 2023	3D Object Detection, Object Detection	Unverified	0
Generating Visual Spatial Description via Holistic 3D Scene Understanding	May 19, 2023	Scene Understanding, Text Generation	Code Available	1
Vision-Language Pre-training with Object Contrastive Learning for 3D Scene Understanding	May 18, 2023	Contrastive Learning, Object	Unverified	0
TextSLAM: Visual SLAM with Semantic Planar Text Features	May 17, 2023	Mixed Reality, Object SLAM	Code Available	2
Bridging the Domain Gap: Self-Supervised 3D Scene Understanding with Foundation Models	May 15, 2023	3D Object Detection, Image Captioning	Code Available	1
Cross-Modality Time-Variant Relation Learning for Generating Dynamic Scene Graphs	May 15, 2023	Relation, Scene Graph Generation	Code Available	0
MetaMorphosis: Task-oriented Privacy Cognizant Feature Generation for Multi-task Learning	May 13, 2023	Deep Learning, Depth Estimation	Unverified	0

Benchmark Results

Semantic Scene Understanding Challenge (passive actuation & ground-truth localisation)

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.44	—	Unverified
2	Team VGAI (TCS Research)	OMQ	0.37	—	Unverified
3	Demo_semantic_SLAM	OMQ	0.11	—	Unverified

ADE20K val

#	Model	Metric	Claimed	Verified	Status
1	CPN(ResNet-101)	Mean IoU	46.3	—	Unverified

Semantic Scene Understanding Challenge (active actuation & ground-truth localisation)

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.35	—	Unverified