SOTAVerified

Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.

Papers

Showing 801–850 of 1723 papers

Title | Status | Hype
Transavs: End-To-End Audio-Visual Segmentation With Transformer | - | 0
Bi-level Dynamic Learning for Jointly Multi-modality Image Fusion and Beyond | Code | 1
Incorporating Structured Representations into Pretrained Vision & Language Models Using Scene Graphs | - | 0
Self-supervised Pre-training with Masked Shape Prediction for 3D Scene Understanding | - | 0
Living in a Material World: Learning Material Properties from Full-Waveform Flash Lidar Data for Semantic Segmentation | - | 0
Learning-based Relational Object Matching Across Views | - | 0
ArK: Augmented Reality with Knowledge Interactive Emergent Ability | - | 0
TaskPrompter: Spatial-Channel Multi-Task Prompting for Dense Scene Understanding | Code | 2
DynaVol: Unsupervised Learning for Dynamic Scenes through Object-Centric Voxelization | Code | 1
Neural Implicit Dense Semantic SLAM | - | 0
A Review of Panoptic Segmentation for Mobile Mapping Point Clouds | Code | 1
Compositional 3D Human-Object Neural Animation | - | 0
ZRG: A Dataset for Multimodal 3D Residential Rooftop Understanding | - | 0
RGB-D Indiscernible Object Counting in Underwater Scenes | Code | 1
Knowledge Distillation from 3D to Bird's-Eye-View for LiDAR Semantic Segmentation | Code | 1
Advances in Deep Concealed Scene Understanding | Code | 1
Factored Neural Representation for Scene Understanding | - | 0
RS2G: Data-Driven Scene-Graph Extraction and Embedding for Robust Autonomous Perception and Scenario Understanding | Code | 1
360° High-Resolution Depth Estimation via Uncertainty-aware Structural Knowledge Transfer | - | 0
Learning How To Robustly Estimate Camera Pose in Endoscopic Videos | Code | 1
STRAP: Structured Object Affordance Segmentation with Point Supervision | Code | 1
ViPLO: Vision Transformer based Pose-Conditioned Self-Loop Graph for Human-Object Interaction Detection | Code | 1
Swin3D: A Pretrained Transformer Backbone for 3D Indoor Scene Understanding | Code | 2
iDisc: Internal Discretization for Monocular Depth Estimation | Code | 3
Graph-based Topology Reasoning for Driving Scenes | Code | 2
Semantic Segmentation with High Inference Speed in Off-Road Environments | Code | 0
Video-kMaX: A Simple Unified Approach for Online and Near-Online Video Panoptic Segmentation | - | 0
FREDOM: Fairness Domain Adaptation Approach to Semantic Scene Understanding | Code | 0
RegionPLC: Regional Point-Language Contrastive Learning for Open-World 3D Scene Understanding | Code | 2
Object-agnostic Affordance Categorization via Unsupervised Learning of Graph Embeddings | - | 0
Complementary Random Masking for RGB-Thermal Semantic Segmentation | Code | 1
DPF: Learning Dense Prediction Fields with Weak Supervision | Code | 1
HiLo: Exploiting High Low Frequency Relations for Unbiased Panoptic Scene Graph Generation | Code | 1
Real-Time Semantic Segmentation using Hyperspectral Images for Mapping Unstructured and Unknown Environments | Code | 1
You Only Need One Thing One Click: Self-Training for Weakly Supervised 3D Scene Understanding | Code | 1
Both Style and Distortion Matter: Dual-Path Unsupervised Domain Adaptation for Panoramic Semantic Segmentation | - | 0
Viewpoint Equivariance for Multi-View 3D Object Detection | Code | 1
OVeNet: Offset Vector Network for Semantic Segmentation | Code | 0
Self-distillation for surgical action recognition | Code | 1
Uni-Fusion: Universal Continuous Mapping | - | 0
Semantic segmentation of surgical hyperspectral images under geometric domain shifts | - | 0
Constructing Metric-Semantic Maps using Floor Plan Priors for Long-Term Indoor Localization | Code | 1
CLIP goes 3D: Leveraging Prompt Tuning for Language Grounded 3D Recognition | Code | 2
Content Adaptive Front End For Audio Classification | - | 0
Efficient Computation Sharing for Multi-Task Visual Scene Understanding | Code | 0
Shifted-Windows Transformers for the Detection of Cerebral Aneurysms in Microsurgery | - | 0
SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving | Code | 3
PENet: A Joint Panoptic Edge Detection Network | Code | 0
PiMAE: Point Cloud and Image Interactive Masked Autoencoders for 3D Object Detection | Code | 1
Generalized 3D Self-supervised Learning Framework via Prompted Foreground-Aware Feature Contrast | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | ACRV Baseline | OMQ | 0.44 | - | Unverified
2 | Team VGAI (TCS Research) | OMQ | 0.37 | - | Unverified
3 | Demo_semantic_SLAM | OMQ | 0.11 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CPN(ResNet-101) | Mean IoU | 46.3 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ACRV Baseline | OMQ | 0.35 | - | Unverified