SOTAVerified

Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.
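As a rough illustration of what "how objects relate to each other" can mean in practice, the sketch below derives coarse spatial relations from 2D bounding boxes and collects them as (subject, relation, object) triples, a toy scene graph. The names `Box`, `relation`, and `scene_graph` are hypothetical, the boxes are made-up values, and the rule-based heuristic is an assumption for illustration only; it is not the method of any paper listed here.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Box:
    """Axis-aligned 2D box in image coordinates (y grows downward)."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def relation(a: Box, b: Box) -> str:
    """Return a coarse spatial relation of box a with respect to box b."""
    # Horizontal separation is checked first, then vertical separation.
    if a.x_max <= b.x_min:
        return "left of"
    if a.x_min >= b.x_max:
        return "right of"
    if a.y_max <= b.y_min:
        return "above"
    if a.y_min >= b.y_max:
        return "below"
    # Neither axis separates the boxes, so they intersect.
    return "overlaps"


def scene_graph(boxes):
    """Build (subject, relation, object) triples for every ordered pair."""
    return [(a.label, relation(a, b), b.label) for a, b in permutations(boxes, 2)]


# Illustrative detections (hypothetical values, not from any dataset).
boxes = [
    Box("lamp", 10, 5, 30, 40),
    Box("table", 5, 45, 80, 90),
    Box("chair", 85, 50, 120, 95),
]
triples = scene_graph(boxes)
for subject, rel, obj in triples:
    print(f"{subject} is {rel} {obj}")
```

Object recognition alone would stop at the three labels; the triples add the relational context that the description above refers to. Real systems typically learn such relations rather than hard-coding them.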

Papers

Showing 426–450 of 1723 papers

Title | Status | Hype
Semantic Abstraction: Open-World 3D Scene Understanding from 2D Vision-Language Models | Code | 1
From Multi-View to Hollow-3D: Hallucinated Hollow-3D R-CNN for 3D Object Detection | Code | 1
Semantic Segmentation-Assisted Instance Feature Fusion for Multi-Level 3D Part Instance Segmentation | Code | 1
From General to Specific: Informative Scene Graph Generation via Balance Adjustment | Code | 1
SemSegDepth: A Combined Model for Semantic Segmentation and Depth Completion | Code | 1
Global Aggregation then Local Distribution in Fully Convolutional Networks | Code | 1
Extending Large Vision-Language Model for Diverse Interactive Tasks in Autonomous Driving | Code | 1
FloodNet: A High Resolution Aerial Imagery Dataset for Post Flood Scene Understanding | Code | 1
DTCLMapper: Dual Temporal Consistent Learning for Vectorized HD Map Construction | Code | 1
Cityscapes-Panoptic-Parts and PASCAL-Panoptic-Parts datasets for Scene Understanding | Code | 1
Dual-Hybrid Attention Network for Specular Highlight Removal | Code | 1
FocusFlow: Boosting Key-Points Optical Flow Estimation for Autonomous Driving | Code | 1
Spatio-temporal Self-Supervised Representation Learning for 3D Point Clouds | Code | 1
Expressive Scene Graph Generation Using Commonsense Knowledge Infusion for Visual Understanding and Reasoning | Code | 1
Dynamic Graph Message Passing Networks | Code | 1
Dynamic Graph Message Passing Networks for Visual Recognition | Code | 1
Bridging the Domain Gap: Self-Supervised 3D Scene Understanding with Foundation Models | Code | 1
A2-FPN for Semantic Segmentation of Fine-Resolution Remotely Sensed Images | Code | 1
Few-Shot Object Detection and Viewpoint Estimation for Objects in the Wild | Code | 1
Stealing Stable Diffusion Prior for Robust Monocular Depth Estimation | Code | 1
FPS-Net: A Convolutional Fusion Network for Large-Scale LiDAR Point Cloud Segmentation | Code | 1
ARKitScenes: A Diverse Real-World Dataset For 3D Indoor Scene Understanding Using Mobile RGB-D Data | Code | 1
3UR-LLM: An End-to-End Multimodal Large Language Model for 3D Scene Understanding | Code | 1
Channel-Wise Attention-Based Network for Self-Supervised Monocular Depth Estimation | Code | 1
FreDSNet: Joint Monocular Depth and Semantic Segmentation with Fast Fourier Convolutions | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | ACRV Baseline | OMQ | 0.44 | - | Unverified
2 | Team VGAI (TCS Research) | OMQ | 0.37 | - | Unverified
3 | Demo_semantic_SLAM | OMQ | 0.11 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CPN(ResNet-101) | Mean IoU | 46.3 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ACRV Baseline | OMQ | 0.35 | - | Unverified