SOTAVerified

Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.

Papers

Showing 401–425 of 1723 papers

Title | Status | Hype
LoLI-Street: Benchmarking Low-Light Image Enhancement and Beyond | Code | 1
Logic-RAG: Augmenting Large Multimodal Models with Visual-Spatial Knowledge for Road Scene Understanding | Code | 1
Dynamic Scene Understanding through Object-Centric Voxelization and Neural Rendering | Code | 1
SemSegDepth: A Combined Model for Semantic Segmentation and Depth Completion | Code | 1
Online panoptic 3D reconstruction as a Linear Assignment Problem | Code | 1
Dual-Hybrid Attention Network for Specular Highlight Removal | Code | 1
Cityscapes-Panoptic-Parts and PASCAL-Panoptic-Parts datasets for Scene Understanding | Code | 1
Grounded Situation Recognition with Transformers | Code | 1
DTCLMapper: Dual Temporal Consistent Learning for Vectorized HD Map Construction | Code | 1
Distilled Semantics for Comprehensive Scene Understanding from Videos | Code | 1
BoMuDANet: Unsupervised Adaptation for Visual Scene Understanding in Unstructured Driving Environments | Code | 1
Mask4D: End-to-End Mask-Based 4D Panoptic Segmentation for LiDAR Sequences | Code | 1
DI-V2X: Learning Domain-Invariant Representation for Vehicle-Infrastructure Collaborative 3D Object Detection | Code | 1
MassMIND: Massachusetts Maritime INfrared Dataset | Code | 1
Dynamic Graph Message Passing Networks | Code | 1
Boosting Omnidirectional Stereo Matching with a Pre-trained Depth Foundation Model | Code | 1
Enhancing Scene Graph Generation with Hierarchical Relationships and Commonsense Knowledge | Code | 1
Monte Carlo Scene Search for 3D Scene Understanding | Code | 1
You Only Need One Thing One Click: Self-Training for Weakly Supervised 3D Scene Understanding | Code | 1
MGNet: Monocular Geometric Scene Understanding for Autonomous Driving | Code | 1
ARKitScenes: A Diverse Real-World Dataset For 3D Indoor Scene Understanding Using Mobile RGB-D Data | Code | 1
Mitigating Trade-off: Stream and Query-guided Aggregation for Efficient and Effective 3D Occupancy Prediction | Code | 1
3UR-LLM: An End-to-End Multimodal Large Language Model for 3D Scene Understanding | Code | 1
DPF: Learning Dense Prediction Fields with Weak Supervision | Code | 1
Channel-Wise Attention-Based Network for Self-Supervised Monocular Depth Estimation | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | ACRV Baseline | OMQ | 0.44 | | Unverified
2 | Team VGAI (TCS Research) | OMQ | 0.37 | | Unverified
3 | Demo_semantic_SLAM | OMQ | 0.11 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CPN(ResNet-101) | Mean IoU | 46.3 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ACRV Baseline | OMQ | 0.35 | | Unverified