Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.
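As a minimal sketch of what "objects and their spatial relationships" can mean in practice, the Python snippet below derives coarse pairwise relations from 2D bounding boxes and collects them as scene-graph triples. The `Box` class, the `spatial_relation` rule set, and the sample coordinates are all hypothetical and chosen only for illustration; they are not taken from any paper listed on this page, and real systems use learned models rather than a center-offset heuristic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in image coordinates (y grows downward)."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    @property
    def center(self):
        return ((self.x_min + self.x_max) / 2, (self.y_min + self.y_max) / 2)


def spatial_relation(a: Box, b: Box) -> str:
    """Return a coarse relation of `a` with respect to `b` (illustrative heuristic)."""
    ax, ay = a.center
    bx, by = b.center
    dx, dy = ax - bx, ay - by
    # Use the dominant axis of displacement between the two box centers.
    if abs(dx) >= abs(dy):
        return "left of" if dx < 0 else "right of"
    return "above" if dy < 0 else "below"


def build_scene_graph(boxes):
    """Build (subject, relation, object) triples for every ordered pair of boxes."""
    return [
        (a.label, spatial_relation(a, b), b.label)
        for a in boxes
        for b in boxes
        if a is not b
    ]


# Made-up detections for a small indoor scene.
boxes = [
    Box("lamp", 40, 10, 60, 40),
    Box("table", 20, 50, 80, 90),
    Box("chair", 90, 55, 120, 95),
]

for triple in build_scene_graph(boxes):
    print(triple)
```

The triples form a simple scene graph: the objects are nodes and each relation is a directed edge, which is the kind of structure that distinguishes scene understanding from per-object recognition.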

Papers


Showing 201–250 of 1723 papers

Title	Date	Tasks	Status	Hype
Simple Image-level Classification Improves Open-vocabulary Object Detection	Dec 16, 2023	Knowledge Distillation, Object	Code Available	1
Transformers in Unsupervised Structure-from-Motion	Dec 16, 2023	Decision Making, image-classification	Code Available	1
Living Scenes: Multi-object Relocalization and Reconstruction in Changing 3D Environments	Dec 14, 2023	3D Reconstruction, Decoder	Code Available	1
Diffusion-SS3D: Diffusion Model for Semi-supervised 3D Object Detection	Dec 5, 2023	3D Object Detection, Denoising	Code Available	1
Language Embedded 3D Gaussians for Open-Vocabulary Scene Understanding	Nov 30, 2023	GPU, Inductive Bias	Code Available	1
SAMPro3D: Locating SAM Prompts in 3D for Zero-Shot Scene Segmentation	Nov 29, 2023	Scene Segmentation, Scene Understanding	Code Available	1
Panoptic Video Scene Graph Generation	Nov 28, 2023	Graph Generation, Panoptic Scene Graph Generation	Code Available	1
Enhancing Scene Graph Generation with Hierarchical Relationships and Commonsense Knowledge	Nov 21, 2023	Large Language Model, Multimodal Deep Learning	Code Available	1
TSP-Transformer: Task-Specific Prompts Boosted Transformer for Holistic Scene Understanding	Nov 6, 2023	Boundary Detection, Depth Estimation	Code Available	1
NeuSyRE: Neuro-Symbolic Visual Understanding and Reasoning Framework based on Scene Graph Enrichment	Nov 5, 2023	Caption Generation, Common Sense Reasoning	Code Available	1
TPSeNCE: Towards Artifact-Free Realistic Rain Generation for Deraining and Object Detection in Rain	Nov 1, 2023	Contrastive Learning, Image-to-Image Translation	Code Available	1
Talk2BEV: Language-enhanced Bird's-eye View Maps for Autonomous Driving	Oct 3, 2023	Autonomous Driving, Decision Making	Code Available	1
TransRadar: Adaptive-Directional Transformer for Real-Time Multi-View Radar Semantic Segmentation	Oct 3, 2023	Autonomous Driving, Scene Understanding	Code Available	1
Multimodal Dataset for Localization, Mapping and Crop Monitoring in Citrus Tree Farms	Sep 27, 2023	object-detection, Object Detection	Code Available	1
PanopticNDT: Efficient and Robust Panoptic Mapping	Sep 24, 2023	2D Panoptic Segmentation, 3D Panoptic Segmentation	Code Available	1
LiON: Learning Point-wise Abstaining Penalty for LiDAR Outlier DetectioN Using Diverse Synthetic Data	Sep 19, 2023	Anomaly Detection, Autonomous Driving	Code Available	1
Mask4D: End-to-End Mask-Based 4D Panoptic Segmentation for LiDAR Sequences	Sep 18, 2023	3D Panoptic Segmentation, 4D Panoptic Segmentation	Code Available	1
HOC-Search: Efficient CAD Model and Pose Retrieval from RGB-D Scans	Sep 12, 2023	3D Object Retrieval, 3D Scene Reconstruction	Code Available	1
Multi3DRefer: Grounding Text Description to Multiple 3D Objects	Sep 11, 2023	3D visual grounding, Contrastive Learning	Code Available	1
Vote2Cap-DETR++: Decoupling Localization and Describing for End-to-End 3D Dense Captioning	Sep 6, 2023	3D dense captioning, Caption Generation	Code Available	1
Multi-stage Factorized Spatio-Temporal Representation for RGB-D Action and Gesture Recognition	Aug 23, 2023	Gesture Recognition, Scene Understanding	Code Available	1
Understanding Dark Scenes by Contrasting Multi-Modal Observations	Aug 23, 2023	Contrastive Learning, Scene Understanding	Code Available	1
SUMMIT: Source-Free Adaptation of Uni-Modal Models to Multi-Modal Targets	Aug 23, 2023	Autonomous Navigation, Pseudo Label	Code Available	1
Vision Relation Transformer for Unbiased Scene Graph Generation	Aug 18, 2023	Decoder, Graph Generation	Code Available	1
FocusFlow: Boosting Key-Points Optical Flow Estimation for Autonomous Driving	Aug 14, 2023	Autonomous Driving, Optical Flow Estimation	Code Available	1
OAFuser: Towards Omni-Aperture Fusion for Light Field Semantic Segmentation	Jul 28, 2023	Autonomous Driving, Scene Understanding	Code Available	1
Human-centric Scene Understanding for 3D Large-scale Scenarios	Jul 26, 2023	Action Recognition, Scene Understanding	Code Available	1
CPCM: Contextual Point Cloud Modeling for Weakly-supervised Point Cloud Semantic Segmentation	Jul 19, 2023	Representation Learning, Scene Understanding	Code Available	1
Open Scene Understanding: Grounded Situation Recognition Meets Segment Anything for Helping People with Visual Impairments	Jul 15, 2023	Decoder, Grounded Situation Recognition	Code Available	1
The IMPTC Dataset: An Infrastructural Multi-Person Trajectory and Context Dataset	Jul 12, 2023	Scene Understanding	Code Available	1
CAT-ViL: Co-Attention Gated Vision-Language Embedding for Visual Question Localized-Answering in Robotic Surgery	Jul 11, 2023	Question Answering, Scene Understanding	Code Available	1
Towards accurate instance segmentation in large-scale LiDAR point clouds	Jul 6, 2023	Clustering, Instance Segmentation	Code Available	1
AVSegFormer: Audio-Visual Segmentation with Transformer	Jul 3, 2023	Decoder, Scene Understanding	Code Available	1
SSC-RS: Elevate LiDAR Semantic Scene Completion with Representation Separation and BEV Fusion	Jun 27, 2023	Autonomous Driving, Scene Understanding	Code Available	1
Multi-view 3D Object Reconstruction and Uncertainty Modelling with Neural Shape Prior	Jun 17, 2023	3D Object Reconstruction, Object	Code Available	1
PanoOcc: Unified Occupancy Representation for Camera-based 3D Panoptic Segmentation	Jun 16, 2023	3D Panoptic Segmentation, Autonomous Driving	Code Available	1
Estimating Generic 3D Room Structures from 2D Annotations	Jun 15, 2023	Scene Understanding	Code Available	1
SNAP: Self-Supervised Neural Maps for Visual Positioning and Semantic Understanding	Jun 8, 2023	Scene Understanding	Code Available	1
Towards Label-free Scene Understanding by Vision Foundation Models	Jun 6, 2023	image-classification, Image Classification	Code Available	1
Towards In-context Scene Understanding	Jun 2, 2023	Depth Estimation, In-Context Learning	Code Available	1
Point-GCC: Universal Self-supervised 3D Scene Pre-training via Geometry-Color Contrast	May 31, 2023	3D Instance Segmentation, 3D Object Detection	Code Available	1
Multi-Scale Attention for Audio Question Answering	May 29, 2023	Audio Question Answering, Question Answering	Code Available	1
Generating Visual Spatial Description via Holistic 3D Scene Understanding	May 19, 2023	Scene Understanding, Text Generation	Code Available	1
Bridging the Domain Gap: Self-Supervised 3D Scene Understanding with Foundation Models	May 15, 2023	3D Object Detection, Image Captioning	Code Available	1
Bi-level Dynamic Learning for Jointly Multi-modality Image Fusion and Beyond	May 11, 2023	Scene Understanding	Code Available	1
DynaVol: Unsupervised Learning for Dynamic Scenes through Object-Centric Voxelization	Apr 30, 2023	Decoder, NeRF	Code Available	1
A Review of Panoptic Segmentation for Mobile Mapping Point Clouds	Apr 27, 2023	Instance Segmentation, Panoptic Segmentation	Code Available	1
RGB-D Indiscernible Object Counting in Underwater Scenes	Apr 23, 2023	Benchmarking, Depth Estimation	Code Available	1
Knowledge Distillation from 3D to Bird's-Eye-View for LiDAR Semantic Segmentation	Apr 22, 2023	Autonomous Driving, Knowledge Distillation	Code Available	1
Advances in Deep Concealed Scene Understanding	Apr 21, 2023	Scene Understanding, Semantic Segmentation	Code Available	1



Datasets: Semantic Scene Understanding Challenge (passive actuation & ground-truth localisation); ADE20K val; Semantic Scene Understanding Challenge (active actuation & ground-truth localisation)

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.44	—	Unverified
2	Team VGAI (TCS Research)	OMQ	0.37	—	Unverified
3	Demo_semantic_SLAM	OMQ	0.11	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	CPN(ResNet-101)	Mean IoU	46.3	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.35	—	Unverified