SOTAVerified

Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.

Papers

Showing 276-300 of 1723 papers

Title | Status | Hype
LLaVA-SpaceSGG: Visual Instruct Tuning for Open-vocabulary Scene Graph Generation with Enhanced Spatial Relations | Code | 1
Divide and Conquer: 3D Point Cloud Instance Segmentation With Point-Wise Binarization | Code | 1
Dynamic Graph Message Passing Networks for Visual Recognition | Code | 1
LLM-Empowered Embodied Agent for Memory-Augmented Task Planning in Household Robotics | Code | 1
M3D-RPN: Monocular 3D Region Proposal Network for Object Detection | Code | 1
Microsoft COCO: Common Objects in Context | Code | 1
Deep learning for radar data exploitation of autonomous vehicle | Code | 1
A Survey on Deep Learning Technique for Video Segmentation | Code | 1
Leveraging Large (Visual) Language Models for Robot 3D Scene Understanding | Code | 1
4D Panoptic LiDAR Segmentation | Code | 1
DeepPanoContext: Panoramic 3D Scene Understanding with Holistic Scene Context Graph and Relation-based Optimization | Code | 1
Query3D: LLM-Powered Open-Vocabulary Scene Segmentation with Language Embedded 3D Gaussian | Code | 1
CAT-ViL: Co-Attention Gated Vision-Language Embedding for Visual Question Localized-Answering in Robotic Surgery | Code | 1
A Survey on Deep Learning for Localization and Mapping: Towards the Age of Spatial Machine Intelligence | Code | 1
Collaborative Transformers for Grounded Situation Recognition | Code | 1
Deep Learning for Event-based Vision: A Comprehensive Survey and Benchmarks | Code | 1
Dense Audio-Visual Event Localization under Cross-Modal Consistency and Multi-Temporal Granularity Collaboration | Code | 1
Complementary Random Masking for RGB-Thermal Semantic Segmentation | Code | 1
Detecting Human-Object Interaction via Fabricated Compositional Learning | Code | 1
3D Neural Embedding Likelihood: Probabilistic Inverse Graphics for Robust 6D Pose Estimation | Code | 1
Diffusion-SS3D: Diffusion Model for Semi-supervised 3D Object Detection | Code | 1
DIP: Unsupervised Dense In-Context Post-training of Visual Representations | Code | 1
DI-V2X: Learning Domain-Invariant Representation for Vehicle-Infrastructure Collaborative 3D Object Detection | Code | 1
A Survey of World Models for Autonomous Driving | Code | 1
Affect2MM: Affective Analysis of Multimedia Content Using Emotion Causality | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | ACRV Baseline | OMQ | 0.44 | - | Unverified
2 | Team VGAI (TCS Research) | OMQ | 0.37 | - | Unverified
3 | Demo_semantic_SLAM | OMQ | 0.11 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CPN(ResNet-101) | Mean IoU | 46.3 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ACRV Baseline | OMQ | 0.35 | - | Unverified