SOTAVerified

Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.

Papers

Showing 301-350 of 1723 papers

Title | Status | Hype
Enhancing Scene Graph Generation with Hierarchical Relationships and Commonsense Knowledge | Code | 1
Collaborative Transformers for Grounded Situation Recognition | Code | 1
Efficient Multi-Task RGB-D Scene Analysis for Indoor Environments | Code | 1
Exploiting Edge-Oriented Reasoning for 3D Point-based Scene Graph Analysis | Code | 1
Event-aided Semantic Scene Completion | Code | 1
Event-based Motion Segmentation with Spatio-Temporal Graph Cuts | Code | 1
A Two-Stage Masked Autoencoder Based Network for Indoor Depth Completion | Code | 1
Context Prior for Scene Segmentation | Code | 1
Dual-Hybrid Attention Network for Specular Highlight Removal | Code | 1
Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts | Code | 1
Dynamic Graph Message Passing Networks | Code | 1
Extending Large Vision-Language Model for Diverse Interactive Tasks in Autonomous Driving | Code | 1
NODIS: Neural Ordinary Differential Scene Understanding | Code | 1
No Train, all Gain: Self-Supervised Gradients Improve Deep Frozen Representations | Code | 1
CoPeD-Advancing Multi-Robot Collaborative Perception: A Comprehensive Dataset in Real-World Environments | Code | 1
A2-FPN for Semantic Segmentation of Fine-Resolution Remotely Sensed Images | Code | 1
Estimating Generic 3D Room Structures from 2D Annotations | Code | 1
Multimodal Dataset for Localization, Mapping and Crop Monitoring in Citrus Tree Farms | Code | 1
ODAM: Object Detection, Association, and Mapping using Posed RGB Video | Code | 1
MSeg: A Composite Dataset for Multi-domain Semantic Segmentation | Code | 1
AutoInst: Automatic Instance-Based Segmentation of LiDAR 3D Scans | Code | 1
Cross-Modal and Uncertainty-Aware Agglomeration for Open-Vocabulary 3D Scene Understanding | Code | 1
A Survey on Deep Learning Technique for Video Segmentation | Code | 1
MTMamba: Enhancing Multi-Task Dense Scene Understanding by Mamba-Based Decoders | Code | 1
Automatic Extrinsic Calibration Method for LiDAR and Camera Sensor Setups | Code | 1
4D Panoptic LiDAR Segmentation | Code | 1
DPF: Learning Dense Prediction Fields with Weak Supervision | Code | 1
Curriculum Model Adaptation with Synthetic and Real Data for Semantic Foggy Scene Understanding | Code | 1
MTMamba++: Enhancing Multi-Task Dense Scene Understanding via Mamba-Based Decoders | Code | 1
From Multi-View to Hollow-3D: Hallucinated Hollow-3D R-CNN for 3D Object Detection | Code | 1
All-Day Multi-Camera Multi-Target Tracking | Code | 1
CAT-ViL: Co-Attention Gated Vision-Language Embedding for Visual Question Localized-Answering in Robotic Surgery | Code | 1
A Survey on Deep Learning for Localization and Mapping: Towards the Age of Spatial Machine Intelligence | Code | 1
DAF-Net: A Dual-Branch Feature Decomposition Fusion Network with Domain Adaptive for Infrared and Visible Image Fusion | Code | 1
MonteBoxFinder: Detecting and Filtering Primitives to Fit a Noisy Point Cloud | Code | 1
A Survey of World Models for Autonomous Driving | Code | 1
Affect2MM: Affective Analysis of Multimedia Content Using Emotion Causality | Code | 1
Monte Carlo Scene Search for 3D Scene Understanding | Code | 1
Distilled Semantics for Comprehensive Scene Understanding from Videos | Code | 1
DIP: Unsupervised Dense In-Context Post-training of Visual Representations | Code | 1
AeroRIT: A New Scene for Hyperspectral Image Analysis | Code | 1
General Geometry-aware Weakly Supervised 3D Object Detection | Code | 1
DI-V2X: Learning Domain-Invariant Representation for Vehicle-Infrastructure Collaborative 3D Object Detection | Code | 1
Monocular Depth Estimation via Listwise Ranking using the Plackett-Luce Model | Code | 1
Digging Into Self-Supervised Monocular Depth Estimation | Code | 1
GFF: Gated Fully Fusion for Semantic Segmentation | Code | 1
Global-Reasoned Multi-Task Learning Model for Surgical Scene Understanding | Code | 1
DTCLMapper: Dual Temporal Consistent Learning for Vectorized HD Map Construction | Code | 1
CLIP2Scene: Towards Label-efficient 3D Scene Understanding by CLIP | Code | 1
Comprehensive Visual Question Answering on Point Clouds through Compositional Scene Manipulation | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | ACRV Baseline | OMQ | 0.44 |  | Unverified
2 | Team VGAI (TCS Research) | OMQ | 0.37 |  | Unverified
3 | Demo_semantic_SLAM | OMQ | 0.11 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CPN(ResNet-101) | Mean IoU | 46.3 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ACRV Baseline | OMQ | 0.35 |  | Unverified