SOTAVerified

Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.

Papers

Showing 851–900 of 1723 papers

Title | Status | Hype
Camera-Radar Perception for Autonomous Vehicles and ADAS: Concepts, Datasets and Metrics | | 0
CLIP-FO3D: Learning Free Open-world 3D Scene Representations from 2D Dense CLIP | | 0
Traffic Scene Parsing through the TSP6K Dataset | Code | 1
VTQA: Visual Text Question Answering via Entity Alignment and Cross-Media Reasoning | Code | 0
Unified Perception: Efficient Depth-Aware Video Panoptic Segmentation with Minimal Annotation Costs | | 0
Weakly-supervised HOI Detection via Prior-guided Bi-level Representation Learning | | 0
APARATE: Adaptive Adversarial Patch for CNN-based Monocular Depth Estimation for Autonomous Navigation | | 0
Mask3D: Pre-training 2D Vision Transformers by Learning Masked 3D Priors | | 0
RemoteNet: Remote Sensing Image Segmentation Network based on Global-Local Information | | 0
Open Challenges for Monocular Single-shot 6D Object Pose Estimation | | 0
CEKD: Cross-Modal Edge-Privileged Knowledge Distillation for Semantic Scene Understanding Using Only Thermal Images | Code | 1
Deep Learning for Event-based Vision: A Comprehensive Survey and Benchmarks | Code | 1
Explicit3D: Graph Network with Spatial Inference for Single Image 3D Object Detection | | 0
3D Neural Embedding Likelihood: Probabilistic Inverse Graphics for Robust 6D Pose Estimation | Code | 1
Object-Centric Scene Representations using Active Inference | | 0
Structured Generative Models for Scene Understanding | | 0
A Flexible Framework for Virtual Omnidirectional Vision to Improve Operator Situation Awareness | | 0
GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis | Code | 2
Learning from Mistakes: Self-Regularizing Hierarchical Representations in Point Cloud Semantic Segmentation | | 0
OvarNet: Towards Open-vocabulary Object Attribute Recognition | Code | 1
Unleash the Potential of Image Branch for Cross-modal 3D Object Detection | Code | 1
Model-based inexact graph matching on top of CNNs for semantic scene understanding | Code | 0
Long Range Pooling for 3D Large-Scale Scene Understanding | | 0
Diffusion-based Generation, Optimization, and Planning in 3D Scenes | Code | 2
A Comprehensive Review of Modern Object Segmentation Approaches | | 0
CLIP2Scene: Towards Label-efficient 3D Scene Understanding by CLIP | Code | 1
Neural Radiance Field Codebooks | Code | 0
Plausible Uncertainties for Human Pose Regression | | 0
Visual Traffic Knowledge Graph Generation from Scene Images | | 0
RealGraph: A Multiview Dataset for 4D Real-world Context Graph Generation | | 0
Self-Supervised Object Detection from Egocentric Videos | | 0
Uni-3D: A Universal Model for Panoptic 3D Scene Reconstruction | Code | 1
Seeing With Sound: Long-range Acoustic Beamforming for Multimodal Scene Understanding | | 0
Learning Geometric-Aware Properties in 2D Representation Using Lightweight CAD Models, or Zero Real 3D Pairs | | 0
Combining Implicit-Explicit View Correlation for Light Field Semantic Segmentation | | 0
PeakConv: Learning Peak Receptive Field for Radar Semantic Segmentation | Code | 1
Attentional Graph Convolutional Network for Structure-aware Audio-Visual Scene Classification | | 0
PointVST: Self-Supervised Pre-training for 3D Point Clouds via View-Specific Point-to-Image Translation | Code | 1
Confidence-Aware Paced-Curriculum Learning by Label Smoothing for Surgical Scene Understanding | Code | 0
METEOR Guided Divergence for Video Captioning | Code | 0
MM-3DScene: 3D Scene Understanding by Customizing Masked Modeling with Informative-Preserved Reconstruction and Self-Distilled Consistency | | 0
Panoptic Lifting for 3D Scene Understanding with Neural Fields | Code | 2
Learning Object-level Point Augmentor for Semi-supervised 3D Object Detection | Code | 1
Lightweight integration of 3D features to improve 2D image segmentation | Code | 0
Towards Deeper and Better Multi-view Feature Fusion for 3D Semantic Segmentation | | 0
Cross-Domain Synthetic-to-Real In-the-Wild Depth and Normal Estimation for 3D Scene Understanding | | 0
Towards Holistic Surgical Scene Understanding | Code | 1
LWSIS: LiDAR-guided Weakly Supervised Instance Segmentation for Autonomous Driving | Code | 1
Gaussian Radar Transformer for Semantic Segmentation in Noisy Radar Data | | 0
Framework for 2D Ad placements in LinearTV | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | ACRV Baseline | OMQ | 0.44 | | Unverified
2 | Team VGAI (TCS Research) | OMQ | 0.37 | | Unverified
3 | Demo_semantic_SLAM | OMQ | 0.11 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CPN(ResNet-101) | Mean IoU | 46.3 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ACRV Baseline | OMQ | 0.35 | | Unverified