SOTAVerified

Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.

Papers

Showing 301-325 of 1723 papers

Title | Status | Hype
Dynamic Graph Message Passing Networks | Code | 1
Dynamic Graph Message Passing Networks for Visual Recognition | Code | 1
4D Panoptic LiDAR Segmentation | Code | 1
Diffusion-SS3D: Diffusion Model for Semi-supervised 3D Object Detection | Code | 1
Masked Scene Modeling: Narrowing the Gap Between Supervised and Self-Supervised Learning in 3D Scene Understanding | Code | 1
Egocentric Scene Understanding via Multimodal Spatial Rectifier | Code | 1
A Two-Stage Masked Autoencoder Based Network for Indoor Depth Completion | Code | 1
Context Prior for Scene Segmentation | Code | 1
CAT-ViL: Co-Attention Gated Vision-Language Embedding for Visual Question Localized-Answering in Robotic Surgery | Code | 1
A Survey on Deep Learning for Localization and Mapping: Towards the Age of Spatial Machine Intelligence | Code | 1
Detecting Human-Object Interaction via Fabricated Compositional Learning | Code | 1
A Survey of World Models for Autonomous Driving | Code | 1
Estimating and Exploiting the Aleatoric Uncertainty in Surface Normal Estimation | Code | 1
Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts | Code | 1
CoPeD-Advancing Multi-Robot Collaborative Perception: A Comprehensive Dataset in Real-World Environments | Code | 1
Event-aided Semantic Scene Completion | Code | 1
Affect2MM: Affective Analysis of Multimedia Content Using Emotion Causality | Code | 1
Event-based Motion Segmentation with Spatio-Temporal Graph Cuts | Code | 1
Digging Into Self-Supervised Monocular Depth Estimation | Code | 1
MassMIND: Massachusetts Maritime INfrared Dataset | Code | 1
AutoInst: Automatic Instance-Based Segmentation of LiDAR 3D Scans | Code | 1
Cross-Modal and Uncertainty-Aware Agglomeration for Open-Vocabulary 3D Scene Understanding | Code | 1
Multi-stage Factorized Spatio-Temporal Representation for RGB-D Action and Gesture Recognition | Code | 1
LWSIS: LiDAR-guided Weakly Supervised Instance Segmentation for Autonomous Driving | Code | 1
M3D-RPN: Monocular 3D Region Proposal Network for Object Detection | Code | 1
Page 13 of 69

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | ACRV Baseline | OMQ | 0.44 | | Unverified
2 | Team VGAI (TCS Research) | OMQ | 0.37 | | Unverified
3 | Demo_semantic_SLAM | OMQ | 0.11 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CPN(ResNet-101) | Mean IoU | 46.3 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ACRV Baseline | OMQ | 0.35 | | Unverified