SOTAVerified

Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.

Papers

Showing 326-350 of 1723 papers

| Title | Status | Hype |
| --- | --- | --- |
| 4D Panoptic LiDAR Segmentation | Code | 1 |
| DPF: Learning Dense Prediction Fields with Weak Supervision | Code | 1 |
| Curriculum Model Adaptation with Synthetic and Real Data for Semantic Foggy Scene Understanding | Code | 1 |
| MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding | Code | 1 |
| CAT-ViL: Co-Attention Gated Vision-Language Embedding for Visual Question Localized-Answering in Robotic Surgery | Code | 1 |
| A Survey on Deep Learning for Localization and Mapping: Towards the Age of Spatial Machine Intelligence | Code | 1 |
| MSeg: A Composite Dataset for Multi-domain Semantic Segmentation | Code | 1 |
| A Survey of World Models for Autonomous Driving | Code | 1 |
| Affect2MM: Affective Analysis of Multimedia Content Using Emotion Causality | Code | 1 |
| MTMamba: Enhancing Multi-Task Dense Scene Understanding by Mamba-Based Decoders | Code | 1 |
| MuKEA: Multimodal Knowledge Extraction and Accumulation for Knowledge-based Visual Question Answering | Code | 1 |
| Distilled Semantics for Comprehensive Scene Understanding from Videos | Code | 1 |
| MonoDistill: Learning Spatial Features for Monocular 3D Object Detection | Code | 1 |
| DIP: Unsupervised Dense In-Context Post-training of Visual Representations | Code | 1 |
| General Geometry-aware Weakly Supervised 3D Object Detection | Code | 1 |
| DD-PPO: Learning Near-Perfect PointGoal Navigators from 2.5 Billion Frames | Code | 1 |
| AeroRIT: A New Scene for Hyperspectral Image Analysis | Code | 1 |
| GFF: Gated Fully Fusion for Semantic Segmentation | Code | 1 |
| DI-V2X: Learning Domain-Invariant Representation for Vehicle-Infrastructure Collaborative 3D Object Detection | Code | 1 |
| MonteBoxFinder: Detecting and Filtering Primitives to Fit a Noisy Point Cloud | Code | 1 |
| Global Aggregation then Local Distribution in Fully Convolutional Networks | Code | 1 |
| Digging Into Self-Supervised Monocular Depth Estimation | Code | 1 |
| DTCLMapper: Dual Temporal Consistent Learning for Vectorized HD Map Construction | Code | 1 |
| GOV-NeSF: Generalizable Open-Vocabulary Neural Semantic Fields | Code | 1 |
| CLIP2Scene: Towards Label-efficient 3D Scene Understanding by CLIP | Code | 1 |

Benchmark Results

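| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | ACRV Baseline | OMQ | 0.44 | | Unverified |
| 2 | Team VGAI (TCS Research) | OMQ | 0.37 | | Unverified |
| 3 | Demo_semantic_SLAM | OMQ | 0.11 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | CPN(ResNet-101) | Mean IoU | 46.3 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | ACRV Baseline | OMQ | 0.35 | | Unverified |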