SOTAVerified

Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.

Papers

Showing 676–700 of 1723 papers

Title | Status | Hype
TSP-Transformer: Task-Specific Prompts Boosted Transformer for Holistic Scene Understanding | Code | 1
NeuSyRE: Neuro-Symbolic Visual Understanding and Reasoning Framework based on Scene Graph Enrichment | Code | 1
Continual Learning of Unsupervised Monocular Depth from Videos | Code | 0
Leveraging Large-Scale Pretrained Vision Foundation Models for Label-Efficient 3D Point Cloud Segmentation | - | 0
Single-view 3D Scene Reconstruction with High-fidelity Shape and Texture | - | 0
TPSeNCE: Towards Artifact-Free Realistic Rain Generation for Deraining and Object Detection in Rain | Code | 1
Recent Advances in Multi-modal 3D Scene Understanding: A Comprehensive Survey and Evaluation | - | 0
P2AT: Pyramid Pooling Axial Transformer for Real-time Semantic Segmentation | Code | 0
Panoptic Out-of-Distribution Segmentation | - | 0
S4C: Self-Supervised Semantic Scene Completion with Neural Fields | - | 0
DualMLP: a two-stream fusion model for 3D point cloud classification | Code | 0
Zero-Shot Open-Vocabulary Tracking with Large Pre-Trained Models | - | 0
TextPSG: Panoptic Scene Graph Generation from Textual Descriptions | - | 0
Talk2BEV: Language-enhanced Bird's-eye View Maps for Autonomous Driving | Code | 1
TransRadar: Adaptive-Directional Transformer for Real-Time Multi-View Radar Semantic Segmentation | Code | 1
Elastic Interaction Energy-Informed Real-Time Traffic Scene Perception | - | 0
Adaptive Visual Scene Understanding: Incremental Scene Graph Generation | Code | 0
Logical Bias Learning for Object Relation Prediction | - | 0
SGRec3D: Self-Supervised 3D Scene Graph Learning via Object-Level Scene Reconstruction | - | 0
Multimodal Dataset for Localization, Mapping and Crop Monitoring in Citrus Tree Farms | Code | 1
Language-EXtended Indoor SLAM (LEXIS): A Versatile System for Real-time Visual Scene Understanding | - | 0
PanopticNDT: Efficient and Robust Panoptic Mapping | Code | 1
SANPO: A Scene Understanding, Accessibility and Human Navigation Dataset | - | 0
LLMR: Real-time Prompting of Interactive Worlds using Large Language Models | - | 0
Survey of Action Recognition, Spotting and Spatio-Temporal Localization in Soccer -- Current Trends and Research Perspectives | - | 0
Page 28 of 69

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | ACRV Baseline | OMQ | 0.44 | - | Unverified
2 | Team VGAI (TCS Research) | OMQ | 0.37 | - | Unverified
3 | Demo_semantic_SLAM | OMQ | 0.11 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CPN(ResNet-101) | Mean IoU | 46.3 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ACRV Baseline | OMQ | 0.35 | - | Unverified