SOTAVerified

Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.

Papers

Showing 301–350 of 1723 papers

Title | Status | Hype
Hearing and Seeing Through CLIP: A Framework for Self-Supervised Sound Source Localization | Code | 1
Generating Visual Spatial Description via Holistic 3D Scene Understanding | Code | 1
Holistic 3D Scene Understanding from a Single Image with Implicit Representation | Code | 1
Language Embedded 3D Gaussians for Open-Vocabulary Scene Understanding | Code | 1
From Multi-View to Hollow-3D: Hallucinated Hollow-3D R-CNN for 3D Object Detection | Code | 1
IBISCape: A Simulated Benchmark for multi-modal SLAM Systems Evaluation in Large-scale Dynamic Environments | Code | 1
A Two-Stage Masked Autoencoder Based Network for Indoor Depth Completion | Code | 1
Context Prior for Scene Segmentation | Code | 1
A Survey on Deep Learning Technique for Video Segmentation | Code | 1
Image Masking for Robust Self-Supervised Monocular Depth Estimation | Code | 1
4D Panoptic LiDAR Segmentation | Code | 1
Deep Learning for Event-based Vision: A Comprehensive Survey and Benchmarks | Code | 1
F-ViTA: Foundation Model Guided Visible to Thermal Translation | Code | 1
CAT-ViL: Co-Attention Gated Vision-Language Embedding for Visual Question Localized-Answering in Robotic Surgery | Code | 1
CoPeD-Advancing Multi-Robot Collaborative Perception: A Comprehensive Dataset in Real-World Environments | Code | 1
IRS: A Large Naturalistic Indoor Robotics Stereo Dataset to Train Deep Models for Disparity and Surface Normal Estimation | Code | 1
A Survey on Deep Learning for Localization and Mapping: Towards the Age of Spatial Machine Intelligence | Code | 1
KITTI-360: A Novel Dataset and Benchmarks for Urban Scene Understanding in 2D and 3D | Code | 1
Deep learning for radar data exploitation of autonomous vehicle | Code | 1
A Survey of World Models for Autonomous Driving | Code | 1
AutoInst: Automatic Instance-Based Segmentation of LiDAR 3D Scans | Code | 1
Cross-Modal and Uncertainty-Aware Agglomeration for Open-Vocabulary 3D Scene Understanding | Code | 1
Affect2MM: Affective Analysis of Multimedia Content Using Emotion Causality | Code | 1
Learning How To Robustly Estimate Camera Pose in Endoscopic Videos | Code | 1
Automatic Extrinsic Calibration Method for LiDAR and Camera Sensor Setups | Code | 1
From General to Specific: Informative Scene Graph Generation via Balance Adjustment | Code | 1
CSFNet: A Cosine Similarity Fusion Network for Real-Time RGB-X Semantic Segmentation of Driving Scenes | Code | 1
Curriculum Model Adaptation with Synthetic and Real Data for Semantic Foggy Scene Understanding | Code | 1
Auto-Panoptic: Cooperative Multi-Component Architecture Search for Panoptic Segmentation | Code | 1
LiON: Learning Point-wise Abstaining Penalty for LiDAR Outlier DetectioN Using Diverse Synthetic Data | Code | 1
All-Day Multi-Camera Multi-Target Tracking | Code | 1
Learning Triadic Belief Dynamics in Nonverbal Communication from Videos | Code | 1
FloodNet: A High Resolution Aerial Imagery Dataset for Post Flood Scene Understanding | Code | 1
DAF-Net: A Dual-Branch Feature Decomposition Fusion Network with Domain Adaptive for Infrared and Visible Image Fusion | Code | 1
Light Field Networks: Neural Scene Representations with Single-Evaluation Rendering | Code | 1
LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation | Code | 1
LLaVA-SpaceSGG: Visual Instruct Tuning for Open-vocabulary Scene Graph Generation with Enhanced Spatial Relations | Code | 1
LLM-Empowered Embodied Agent for Memory-Augmented Task Planning in Household Robotics | Code | 1
Few-Shot Object Detection and Viewpoint Estimation for Objects in the Wild | Code | 1
FocusFlow: Boosting Key-Points Optical Flow Estimation for Autonomous Driving | Code | 1
DD-PPO: Learning Near-Perfect PointGoal Navigators from 2.5 Billion Frames | Code | 1
LWSIS: LiDAR-guided Weakly Supervised Instance Segmentation for Autonomous Driving | Code | 1
Masked Scene Modeling: Narrowing the Gap Between Supervised and Self-Supervised Learning in 3D Scene Understanding | Code | 1
MassMIND: Massachusetts Maritime INfrared Dataset | Code | 1
AVSegFormer: Audio-Visual Segmentation with Transformer | Code | 1
MGNet: Monocular Geometric Scene Understanding for Autonomous Driving | Code | 1
A2-FPN for Semantic Segmentation of Fine-Resolution Remotely Sensed Images | Code | 1
AeroRIT: A New Scene for Hyperspectral Image Analysis | Code | 1
A Versatile and Efficient Reinforcement Learning Framework for Autonomous Driving | Code | 1
FPS-Net: A Convolutional Fusion Network for Large-Scale LiDAR Point Cloud Segmentation | Code | 1
Page 7 of 35

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | ACRV Baseline | OMQ | 0.44 | - | Unverified
2 | Team VGAI (TCS Research) | OMQ | 0.37 | - | Unverified
3 | Demo_semantic_SLAM | OMQ | 0.11 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CPN (ResNet-101) | Mean IoU | 46.3 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ACRV Baseline | OMQ | 0.35 | - | Unverified