Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.
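As a rough illustration of what "how objects relate to each other" can mean in practice, the sketch below derives coarse spatial relations from labelled 2D bounding boxes and emits them as (subject, relation, object) triples, similar in spirit to a scene graph. The `Box` class, the `relation` and `scene_relations` helpers, and the sample boxes are all hypothetical names and data invented for this example. They are not taken from any paper listed here, and real systems typically work with learned features and 3D geometry rather than box centers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """A labelled, axis-aligned 2D box in image coordinates (y grows downward)."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    @property
    def center(self):
        return ((self.x_min + self.x_max) / 2, (self.y_min + self.y_max) / 2)


def relation(a: Box, b: Box) -> str:
    """Return a coarse spatial relation of `a` with respect to `b`.

    Assumption for this sketch: the relation is decided by whichever
    axis has the larger offset between the two box centers.
    """
    ax, ay = a.center
    bx, by = b.center
    dx, dy = ax - bx, ay - by
    if abs(dx) >= abs(dy):
        return "left of" if dx < 0 else "right of"
    return "above" if dy < 0 else "below"


def scene_relations(boxes):
    """List (subject, relation, object) triples for every ordered pair of distinct boxes."""
    return [
        (a.label, relation(a, b), b.label)
        for a in boxes
        for b in boxes
        if a is not b
    ]


# Made-up detections for illustration only.
boxes = [
    Box("lamp", 40, 10, 60, 40),
    Box("table", 20, 50, 80, 90),
    Box("chair", 100, 55, 130, 95),
]
triples = scene_relations(boxes)
for subject, rel, obj in triples:
    print(f"{subject} is {rel} {obj}")
```

Object recognition alone would stop at the three labels; the triples add the relational context that the description above refers to.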

Papers

Showing 701–750 of 1723 papers

Title	Date	Tasks	Status	Hype
Shape Anchor Guided Holistic Indoor Scene Understanding	Sep 20, 2023	3D Object Detection, object-detection	Code Available	0
LiON: Learning Point-wise Abstaining Penalty for LiDAR Outlier DetectioN Using Diverse Synthetic Data	Sep 19, 2023	Anomaly Detection, Autonomous Driving	Code Available	1
Mask4D: End-to-End Mask-Based 4D Panoptic Segmentation for LiDAR Sequences	Sep 18, 2023	3D Panoptic Segmentation, 4D Panoptic Segmentation	Code Available	1
PanoMixSwap Panorama Mixing via Structural Swapping for Indoor Scene Understanding	Sep 18, 2023	Data Augmentation, Diversity	Unverified	0
So you think you can track?	Sep 13, 2023	Benchmarking, Object	Unverified	0
Rank2Tell: A Multimodal Driving Dataset for Joint Importance Ranking and Reasoning	Sep 12, 2023	Autonomous Vehicles, Question Answering	Unverified	0
AmodalSynthDrive: A Synthetic Amodal Perception Dataset for Autonomous Driving	Sep 12, 2023	Autonomous Driving, Benchmarking	Unverified	0
HOC-Search: Efficient CAD Model and Pose Retrieval from RGB-D Scans	Sep 12, 2023	3D Object Retrieval, 3D Scene Reconstruction	Code Available	1
Can you text what is happening? Integrating pre-trained language encoders into trajectory prediction models for autonomous driving	Sep 11, 2023	Autonomous Driving, Descriptive	Unverified	0
Multi3DRefer: Grounding Text Description to Multiple 3D Objects	Sep 11, 2023	3D visual grounding, Contrastive Learning	Code Available	1
PAg-NeRF: Towards fast and efficient end-to-end panoptic 3D representations for agricultural robotics	Sep 11, 2023	3D Reconstruction, Camera Localization	Unverified	0
Weakly Supervised Point Clouds Transformer for 3D Object Detection	Sep 8, 2023	3D Object Detection, Object	Unverified	0
Vote2Cap-DETR++: Decoupling Localization and Describing for End-to-End 3D Dense Captioning	Sep 6, 2023	3D dense captioning, Caption Generation	Code Available	1
Structural Concept Learning via Graph Attention for Multi-Level Rearrangement Planning	Sep 5, 2023	Graph Attention, Object Rearrangement	Unverified	0
OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation	Sep 1, 2023	3D Open-Vocabulary Instance Segmentation, 3D Open-Vocabulary Object Detection	Code Available	2
Expanding Frozen Vision-Language Models without Retraining: Towards Improved Robot Perception	Aug 31, 2023	Activity Recognition, Human Activity Recognition	Unverified	0
Semi-Supervised Semantic Depth Estimation using Symbiotic Transformer and NearFarMix Augmentation	Aug 28, 2023	Autonomous Vehicles, Depth Estimation	Unverified	0
End-to-end Autonomous Driving using Deep Learning: A Systematic Review	Aug 27, 2023	Autonomous Driving, object-detection	Unverified	0
Synergizing Contrastive Learning and Optimal Transport for 3D Point Cloud Domain Adaptation	Aug 27, 2023	Contrastive Learning, Domain Adaptation	Unverified	0
SurGNN: Explainable visual scene understanding and assessment of surgical skill using graph neural networks	Aug 24, 2023	Scene Understanding	Unverified	0
SUMMIT: Source-Free Adaptation of Uni-Modal Models to Multi-Modal Targets	Aug 23, 2023	Autonomous Navigation, Pseudo Label	Code Available	1
Multi-stage Factorized Spatio-Temporal Representation for RGB-D Action and Gesture Recognition	Aug 23, 2023	Gesture Recognition, Scene Understanding	Code Available	1
Understanding Dark Scenes by Contrasting Multi-Modal Observations	Aug 23, 2023	Contrastive Learning, Scene Understanding	Code Available	1
ScanNet++: A High-Fidelity Dataset of 3D Indoor Scenes	Aug 22, 2023	3D Semantic Segmentation, Novel View Synthesis	Code Available	2
Novel-view Synthesis and Pose Estimation for Hand-Object Interaction from Sparse Views	Aug 22, 2023	NeRF, Neural Rendering	Unverified	0
Explore and Tell: Embodied Visual Captioning in 3D Environments	Aug 21, 2023	Image Captioning, Navigate	Unverified	0
Vision Relation Transformer for Unbiased Scene Graph Generation	Aug 18, 2023	Decoder, Graph Generation	Code Available	1
Chat-3D: Data-efficiently Tuning Large Language Model for Universal Dialogue of 3D Scenes	Aug 17, 2023	Language Modeling, Language Modelling	Code Available	2
CASPNet++: Joint Multi-Agent Motion Prediction	Aug 15, 2023	Autonomous Driving, motion prediction	Unverified	0
FocusFlow: Boosting Key-Points Optical Flow Estimation for Autonomous Driving	Aug 14, 2023	Autonomous Driving, Optical Flow Estimation	Code Available	1
Temporal DINO: A Self-supervised Video Strategy to Enhance Action Prediction	Aug 8, 2023	Activity Recognition, Autonomous Driving	Unverified	0
Syn-Mediverse: A Multimodal Synthetic Dataset for Intelligent Scene Understanding of Healthcare Facilities	Aug 6, 2023	Depth Estimation, Instance Segmentation	Unverified	0
Cognitive TransFuser: Semantics-guided Transformer-based Sensor Fusion for Improved Waypoint Prediction	Aug 4, 2023	Imitation Learning, Scene Understanding	Code Available	0
Scene-aware Human Pose Generation using Transformer	Aug 4, 2023	Knowledge Distillation, Scene Understanding	Unverified	0
Weakly Supervised 3D Instance Segmentation without Instance-level Annotations	Aug 3, 2023	3D Instance Segmentation, Instance Segmentation	Unverified	0
Interpretable End-to-End Driving Model for Implicit Scene Understanding	Aug 2, 2023	Graph Generation, object-detection	Unverified	0
Gated Driver Attention Predictor	Aug 1, 2023	Driver Attention Monitoring, Prediction	Code Available	0
Lowis3D: Language-Driven Open-World Instance-Level 3D Scene Understanding	Aug 1, 2023	3D geometry, 3D Open-Vocabulary Instance Segmentation	Unverified	0
TaskExpert: Dynamically Assembling Multi-Task Representations with Memorial Mixture-of-Experts	Jul 28, 2023	Long-range modeling, Mixture-of-Experts	Code Available	2
OAFuser: Towards Omni-Aperture Fusion for Light Field Semantic Segmentation	Jul 28, 2023	Autonomous Driving, Scene Understanding	Code Available	1
Human-centric Scene Understanding for 3D Large-scale Scenarios	Jul 26, 2023	Action Recognition, Scene Understanding	Code Available	1
Enhancing image captioning with depth information using a Transformer-based framework	Jul 24, 2023	Image Captioning, Image Paragraph Captioning	Unverified	0
Challenges for Monocular 6D Object Pose Estimation in Robotics	Jul 22, 2023	6D Pose Estimation using RGB, Object	Unverified	0
Revisiting Distillation for Continual Learning on Visual Question Localized-Answering in Robotic Surgery	Jul 22, 2023	Continual Learning, Scene Understanding	Code Available	0
Improving Online Lane Graph Extraction by Object-Lane Clustering	Jul 20, 2023	3D Object Detection, Autonomous Driving	Unverified	0
Mining Conditional Part Semantics with Occluded Extrapolation for Human-Object Interaction Detection	Jul 19, 2023	Human-Object Interaction Detection, Object	Unverified	0
CPCM: Contextual Point Cloud Modeling for Weakly-supervised Point Cloud Semantic Segmentation	Jul 19, 2023	Representation Learning, Scene Understanding	Code Available	1
Towards A Unified Agent with Foundation Models	Jul 18, 2023	Efficient Exploration, Reinforcement Learning (RL)	Unverified	0
Human Action Recognition in Still Images Using ConViT	Jul 18, 2023	Action Recognition, Action Recognition In Still Images	Unverified	0
A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future	Jul 18, 2023	Knowledge Distillation, object-detection	Code Available	2

Benchmark Results

Semantic Scene Understanding Challenge (passive actuation & ground-truth localisation)

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.44	—	Unverified
2	Team VGAI (TCS Research)	OMQ	0.37	—	Unverified
3	Demo_semantic_SLAM	OMQ	0.11	—	Unverified

ADE20K val

#	Model	Metric	Claimed	Verified	Status
1	CPN(ResNet-101)	Mean IoU	46.3	—	Unverified

Semantic Scene Understanding Challenge (active actuation & ground-truth localisation)

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.35	—	Unverified