SOTAVerified

Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.

Papers

Showing 151–175 of 1723 papers

Title | Status | Hype
Grounded Situation Recognition with Transformers | Code | 1
Hearing and Seeing Through CLIP: A Framework for Self-Supervised Sound Source Localization | Code | 1
Global Aggregation then Local Distribution in Fully Convolutional Networks | Code | 1
GFF: Gated Fully Fusion for Semantic Segmentation | Code | 1
Global-Reasoned Multi-Task Learning Model for Surgical Scene Understanding | Code | 1
DD-PPO: Learning Near-Perfect PointGoal Navigators from 2.5 Billion Frames | Code | 1
GOV-NeSF: Generalizable Open-Vocabulary Neural Semantic Fields | Code | 1
General Geometry-aware Weakly Supervised 3D Object Detection | Code | 1
DC-SAM: In-Context Segment Anything in Images and Videos via Dual Consistency | Code | 1
DTCLMapper: Dual Temporal Consistent Learning for Vectorized HD Map Construction | Code | 1
Generating Visual Spatial Description via Holistic 3D Scene Understanding | Code | 1
Activation Modulation and Recalibration Scheme for Weakly Supervised Semantic Segmentation | Code | 1
CSFNet: A Cosine Similarity Fusion Network for Real-Time RGB-X Semantic Segmentation of Driving Scenes | Code | 1
F-ViTA: Foundation Model Guided Visible to Thermal Translation | Code | 1
Cross-Modal and Uncertainty-Aware Agglomeration for Open-Vocabulary 3D Scene Understanding | Code | 1
Curriculum Model Adaptation with Synthetic and Real Data for Semantic Foggy Scene Understanding | Code | 1
CoPeD-Advancing Multi-Robot Collaborative Perception: A Comprehensive Dataset in Real-World Environments | Code | 1
3DP3: 3D Scene Perception via Probabilistic Programming | Code | 1
CPCM: Contextual Point Cloud Modeling for Weakly-supervised Point Cloud Semantic Segmentation | Code | 1
From Multi-View to Hollow-3D: Hallucinated Hollow-3D R-CNN for 3D Object Detection | Code | 1
FloodNet: A High Resolution Aerial Imagery Dataset for Post Flood Scene Understanding | Code | 1
Context Prior for Scene Segmentation | Code | 1
FocusFlow: Boosting Key-Points Optical Flow Estimation for Autonomous Driving | Code | 1
AVSegFormer: Audio-Visual Segmentation with Transformer | Code | 1
Few-Shot Object Detection and Viewpoint Estimation for Objects in the Wild | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | ACRV Baseline | OMQ | 0.44 | - | Unverified
2 | Team VGAI (TCS Research) | OMQ | 0.37 | - | Unverified
3 | Demo_semantic_SLAM | OMQ | 0.11 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CPN(ResNet-101) | Mean IoU | 46.3 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ACRV Baseline | OMQ | 0.35 | - | Unverified