SOTAVerified

Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.
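One common way to make "objects and their spatial relationships" concrete is a scene graph: objects become nodes and relations become labeled edges. The block below is a minimal sketch of that idea, not a method from any paper listed here. The `Box` and `SceneGraph` classes, the `left_of` and `above` predicates, and the edge-comparison heuristic are all hypothetical names and assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Box:
    """Axis-aligned 2D bounding box in pixel coordinates (y grows downward)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


@dataclass
class SceneGraph:
    """Toy scene graph: labeled objects plus (subject, predicate, object) triples."""
    objects: dict = field(default_factory=dict)    # label -> Box
    relations: list = field(default_factory=list)  # list of relation triples

    def add_object(self, label, box):
        self.objects[label] = box

    def infer_relations(self):
        """Assumed heuristic: derive relations by comparing box edges."""
        self.relations.clear()
        for a, box_a in self.objects.items():
            for b, box_b in self.objects.items():
                if a == b:
                    continue
                # a is entirely to the left of b
                if box_a.x_max <= box_b.x_min:
                    self.relations.append((a, "left_of", b))
                # a is entirely above b (smaller y means higher in the image)
                if box_a.y_max <= box_b.y_min:
                    self.relations.append((a, "above", b))
        return self.relations


# Usage: three hand-written detections stand in for a detector's output.
scene = SceneGraph()
scene.add_object("lamp", Box(40, 10, 80, 60))
scene.add_object("table", Box(20, 70, 200, 150))
scene.add_object("chair", Box(220, 80, 300, 160))
relations = scene.infer_relations()
for triple in relations:
    print(triple)
```

Real systems typically learn relations from data rather than from fixed geometric rules; the rule-based version here only illustrates the data structure.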

Papers

Showing 251–275 of 1723 papers

Title | Status | Hype
STRAP: Structured Object Affordance Segmentation with Point Supervision | Code | 1
Learning How To Robustly Estimate Camera Pose in Endoscopic Videos | Code | 1
RS2G: Data-Driven Scene-Graph Extraction and Embedding for Robust Autonomous Perception and Scenario Understanding | Code | 1
ViPLO: Vision Transformer based Pose-Conditioned Self-Loop Graph for Human-Object Interaction Detection | Code | 1
Complementary Random Masking for RGB-Thermal Semantic Segmentation | Code | 1
DPF: Learning Dense Prediction Fields with Weak Supervision | Code | 1
HiLo: Exploiting High Low Frequency Relations for Unbiased Panoptic Scene Graph Generation | Code | 1
Real-Time Semantic Segmentation using Hyperspectral Images for Mapping Unstructured and Unknown Environments | Code | 1
You Only Need One Thing One Click: Self-Training for Weakly Supervised 3D Scene Understanding | Code | 1
Viewpoint Equivariance for Multi-View 3D Object Detection | Code | 1
Self-distillation for surgical action recognition | Code | 1
Constructing Metric-Semantic Maps using Floor Plan Priors for Long-Term Indoor Localization | Code | 1
PiMAE: Point Cloud and Image Interactive Masked Autoencoders for 3D Object Detection | Code | 1
Traffic Scene Parsing through the TSP6K Dataset | Code | 1
CEKD: Cross-Modal Edge-Privileged Knowledge Distillation for Semantic Scene Understanding Using Only Thermal Images | Code | 1
Deep Learning for Event-based Vision: A Comprehensive Survey and Benchmarks | Code | 1
3D Neural Embedding Likelihood: Probabilistic Inverse Graphics for Robust 6D Pose Estimation | Code | 1
OvarNet: Towards Open-vocabulary Object Attribute Recognition | Code | 1
Unleash the Potential of Image Branch for Cross-modal 3D Object Detection | Code | 1
CLIP2Scene: Towards Label-efficient 3D Scene Understanding by CLIP | Code | 1
Uni-3D: A Universal Model for Panoptic 3D Scene Reconstruction | Code | 1
PeakConv: Learning Peak Receptive Field for Radar Semantic Segmentation | Code | 1
PointVST: Self-Supervised Pre-training for 3D Point Clouds via View-Specific Point-to-Image Translation | Code | 1
Learning Object-level Point Augmentor for Semi-supervised 3D Object Detection | Code | 1
Towards Holistic Surgical Scene Understanding | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | ACRV Baseline | OMQ | 0.44 | - | Unverified
2 | Team VGAI (TCS Research) | OMQ | 0.37 | - | Unverified
3 | Demo_semantic_SLAM | OMQ | 0.11 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CPN(ResNet-101) | Mean IoU | 46.3 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ACRV Baseline | OMQ | 0.35 | - | Unverified