SOTAVerified

Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.

Papers

Showing 151–200 of 1723 papers

Title | Status | Hype
HOC-Search: Efficient CAD Model and Pose Retrieval from RGB-D Scans | Code | 1
Human-centric Scene Understanding for 3D Large-scale Scenarios | Code | 1
Image Segmentation Using Deep Learning: A Survey | Code | 1
Hearing and Seeing Through CLIP: A Framework for Self-Supervised Sound Source Localization | Code | 1
GroupContrast: Semantic-aware Self-supervised Representation Learning for 3D Understanding | Code | 1
Bird's-Eye-View Panoptic Segmentation Using Monocular Frontal View Images | Code | 1
Group Contextual Encoding for 3D Point Clouds | Code | 1
HiLo: Exploiting High Low Frequency Relations for Unbiased Panoptic Scene Graph Generation | Code | 1
GOV-NeSF: Generalizable Open-Vocabulary Neural Semantic Fields | Code | 1
Grounded Situation Recognition with Transformers | Code | 1
GFF: Gated Fully Fusion for Semantic Segmentation | Code | 1
Bridging the Domain Gap: Self-Supervised 3D Scene Understanding with Foundation Models | Code | 1
Generating Visual Spatial Description via Holistic 3D Scene Understanding | Code | 1
Global Aggregation then Local Distribution in Fully Convolutional Networks | Code | 1
Grounding Consistency: Distilling Spatial Common Sense for Precise Visual Relationship Detection | Code | 1
Improving Visual Recognition with Hyperbolical Visual Hierarchy Mapping | Code | 1
Learning Triadic Belief Dynamics in Nonverbal Communication from Videos | Code | 1
F-ViTA: Foundation Model Guided Visible to Thermal Translation | Code | 1
Activation Modulation and Recalibration Scheme for Weakly Supervised Semantic Segmentation | Code | 1
From General to Specific: Informative Scene Graph Generation via Balance Adjustment | Code | 1
Bending Reality: Distortion-aware Transformers for Adapting to Panoramic Semantic Segmentation | Code | 1
FPS-Net: A Convolutional Fusion Network for Large-Scale LiDAR Point Cloud Segmentation | Code | 1
From Multi-View to Hollow-3D: Hallucinated Hollow-3D R-CNN for 3D Object Detection | Code | 1
Few-Shot Object Detection and Viewpoint Estimation for Objects in the Wild | Code | 1
Behind the Curtain: Learning Occluded Shapes for 3D Object Detection | Code | 1
Expressive Scene Graph Generation Using Commonsense Knowledge Infusion for Visual Understanding and Reasoning | Code | 1
Extending Large Vision-Language Model for Diverse Interactive Tasks in Autonomous Driving | Code | 1
FloodNet: A High Resolution Aerial Imagery Dataset for Post Flood Scene Understanding | Code | 1
Explainable Object-induced Action Decision for Autonomous Vehicles | Code | 1
Bayesian SegNet: Model Uncertainty in Deep Convolutional Encoder-Decoder Architectures for Scene Understanding | Code | 1
A2-FPN for Semantic Segmentation of Fine-Resolution Remotely Sensed Images | Code | 1
Exploiting Edge-Oriented Reasoning for 3D Point-based Scene Graph Analysis | Code | 1
Boundary-induced and scene-aggregated network for monocular depth prediction | Code | 1
BEVDistill: Cross-Modal BEV Distillation for Multi-View 3D Object Detection | Code | 1
Beyond Appearances: Material Segmentation with Embedded Spectral Information from RGB-D imagery | Code | 1
FreDSNet: Joint Monocular Depth and Semantic Segmentation with Fast Fourier Convolutions | Code | 1
3DP3: 3D Scene Perception via Probabilistic Programming | Code | 1
Event-based Motion Segmentation with Spatio-Temporal Graph Cuts | Code | 1
Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts | Code | 1
FocusFlow: Boosting Key-Points Optical Flow Estimation for Autonomous Driving | Code | 1
Enhancing Scene Graph Generation with Hierarchical Relationships and Commonsense Knowledge | Code | 1
Boosting Omnidirectional Stereo Matching with a Pre-trained Depth Foundation Model | Code | 1
Bidirectional Projection Network for Cross Dimension Scene Understanding | Code | 1
AVSegFormer: Audio-Visual Segmentation with Transformer | Code | 1
Bi-level Dynamic Learning for Jointly Multi-modality Image Fusion and Beyond | Code | 1
Global-Reasoned Multi-Task Learning Model for Surgical Scene Understanding | Code | 1
Bootstraping Clustering of Gaussians for View-consistent 3D Scene Understanding | Code | 1
EndoChat: Grounded Multimodal Large Language Model for Endoscopic Surgery | Code | 1
Estimating and Exploiting the Aleatoric Uncertainty in Surface Normal Estimation | Code | 1
Efficient Multi-Task RGB-D Scene Analysis for Indoor Environments | Code | 1
Page 4 of 35

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | ACRV Baseline | OMQ | 0.44 | | Unverified
2 | Team VGAI (TCS Research) | OMQ | 0.37 | | Unverified
3 | Demo_semantic_SLAM | OMQ | 0.11 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CPN(ResNet-101) | Mean IoU | 46.3 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ACRV Baseline | OMQ | 0.35 | | Unverified