SOTAVerified

Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.
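To make "objects plus their spatial relationships" concrete, here is a tiny scene graph of (subject, relation, object) triples built from 2D bounding boxes. This is a minimal sketch under stated assumptions, not the method of any paper listed on this page. The `SceneObject` class, the `spatial_relations` and `build_scene_graph` helpers, and the purely geometric rules for "left of", "above" and so on are all hypothetical. Systems in this area typically learn relations from data and often reason in 3D.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class SceneObject:
    """Hypothetical detected object: a label plus an axis-aligned 2D box.

    Image coordinates are assumed, so y grows downward.
    """
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def spatial_relations(a: SceneObject, b: SceneObject) -> list[str]:
    """Return simple geometric relations of `a` with respect to `b` (hypothetical heuristics)."""
    relations = []
    if a.x_max <= b.x_min:
        relations.append("left of")
    if a.x_min >= b.x_max:
        relations.append("right of")
    if a.y_max <= b.y_min:
        relations.append("above")   # smaller y means higher in the image
    if a.y_min >= b.y_max:
        relations.append("below")
    # Boxes that intersect on both axes are marked as overlapping.
    if a.x_min < b.x_max and b.x_min < a.x_max and a.y_min < b.y_max and b.y_min < a.y_max:
        relations.append("overlaps")
    return relations


def build_scene_graph(objects: list[SceneObject]) -> list[tuple[str, str, str]]:
    """Build (subject, relation, object) triples for every ordered pair of objects."""
    triples = []
    for a, b in permutations(objects, 2):
        for rel in spatial_relations(a, b):
            triples.append((a.label, rel, b.label))
    return triples
```

Consider a lamp box at (10, 0, 30, 40), a table at (0, 50, 100, 90) and a chair at (110, 30, 150, 100). For these, the sketch yields triples such as ("lamp", "above", "table") and ("table", "left of", "chair"). Fixed box rules like these ignore depth and occlusion, so relations such as "on" or "behind" would need a 3D-aware or learned model.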

Papers

Showing 351–375 of 1723 papers

Title | Status | Hype
Affect2MM: Affective Analysis of Multimedia Content Using Emotion Causality | Code | 1
Global-Reasoned Multi-Task Learning Model for Surgical Scene Understanding | Code | 1
Multimodal Fusion and Vision-Language Models: A Survey for Robot Vision | Code | 1
Deep Learning for Event-based Vision: A Comprehensive Survey and Benchmarks | Code | 1
F-ViTA: Foundation Model Guided Visible to Thermal Translation | Code | 1
Bayesian SegNet: Model Uncertainty in Deep Convolutional Encoder-Decoder Architectures for Scene Understanding | Code | 1
From Multi-View to Hollow-3D: Hallucinated Hollow-3D R-CNN for 3D Object Detection | Code | 1
Multi-view 3D Object Reconstruction and Uncertainty Modelling with Neural Shape Prior | Code | 1
Distilled Semantics for Comprehensive Scene Understanding from Videos | Code | 1
DeepPanoContext: Panoramic 3D Scene Understanding with Holistic Scene Context Graph and Relation-based Optimization | Code | 1
Behind the Curtain: Learning Occluded Shapes for 3D Object Detection | Code | 1
NODIS: Neural Ordinary Differential Scene Understanding | Code | 1
OAFuser: Towards Omni-Aperture Fusion for Light Field Semantic Segmentation | Code | 1
DynaVol: Unsupervised Learning for Dynamic Scenes through Object-Centric Voxelization | Code | 1
DeepScores -- A Dataset for Segmentation, Detection and Classification of Tiny Objects | Code | 1
Occlusion-Aware Depth Estimation with Adaptive Normal Constraints | Code | 1
AeroRIT: A New Scene for Hyperspectral Image Analysis | Code | 1
FPS-Net: A Convolutional Fusion Network for Large-Scale LiDAR Point Cloud Segmentation | Code | 1
BEVDistill: Cross-Modal BEV Distillation for Multi-View 3D Object Detection | Code | 1
One-Shot Object Affordance Detection in the Wild | Code | 1
FreDSNet: Joint Monocular Depth and Semantic Segmentation with Fast Fourier Convolutions | Code | 1
CLIP2Scene: Towards Label-efficient 3D Scene Understanding by CLIP | Code | 1
Dense Audio-Visual Event Localization under Cross-Modal Consistency and Multi-Temporal Granularity Collaboration | Code | 1
Comprehensive Visual Question Answering on Point Clouds through Compositional Scene Manipulation | Code | 1
FocusFlow: Boosting Key-Points Optical Flow Estimation for Autonomous Driving | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | ACRV Baseline | OMQ | 0.44 |  | Unverified
2 | Team VGAI (TCS Research) | OMQ | 0.37 |  | Unverified
3 | Demo_semantic_SLAM | OMQ | 0.11 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CPN(ResNet-101) | Mean IoU | 46.3 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ACRV Baseline | OMQ | 0.35 |  | Unverified
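
Mean IoU, the metric reported for CPN(ResNet-101) in the benchmark results, averages intersection over union across classes. For each class c, IoU_c = TP_c / (TP_c + FP_c + FN_c), read off a pixel-level confusion matrix. The sketch below is a minimal pure-Python version under stated assumptions. Labels are already flattened to integer class ids. Classes absent from both ground truth and prediction are skipped. There is no "ignore" label. The helper names are hypothetical, and this page does not give the exact evaluation protocol behind the result above.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels by (true class, predicted class); rows are ground truth."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def mean_iou(y_true, y_pred, num_classes):
    """Return (mean IoU, per-class IoU dict) over classes present in GT or prediction."""
    cm = confusion_matrix(y_true, y_pred, num_classes)
    per_class = {}
    for c in range(num_classes):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                                 # truly c, predicted as another class
        fp = sum(cm[r][c] for r in range(num_classes)) - tp  # predicted c, truly another class
        denom = tp + fp + fn
        if denom == 0:
            continue  # class never appears in GT or prediction; excluded from the mean
        per_class[c] = tp / denom
    if not per_class:
        raise ValueError("no classes present in ground truth or prediction")
    return sum(per_class.values()) / len(per_class), per_class
```

Results tables usually report this value as a percentage, so a score of 46.3 corresponds to 0.463 on the 0 to 1 scale returned here. Benchmarks also differ in whether they accumulate the confusion matrix over the whole dataset (as assumed when all pixels are passed in at once) or average per image.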