SOTAVerified

Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.

Papers

Showing 401–450 of 1723 papers

Title | Status | Hype
Relation-aware Instance Refinement for Weakly Supervised Visual Grounding | Code | 1
Comprehensive Visual Question Answering on Point Clouds through Compositional Scene Manipulation | Code | 1
GOV-NeSF: Generalizable Open-Vocabulary Neural Semantic Fields | Code | 1
ReorientBot: Learning Object Reorientation for Specific-Posed Placement | Code | 1
RescueNet: A High Resolution UAV Semantic Segmentation Benchmark Dataset for Natural Disaster Damage Assessment | Code | 1
RfD-Net: Point Scene Understanding by Semantic Instance Reconstruction | Code | 1
Grounded Situation Recognition with Transformers | Code | 1
Class-Incremental Domain Adaptation with Smoothing and Calibration for Surgical Report Generation | Code | 1
ROOT: VLM based System for Indoor Scene Understanding and Beyond | Code | 1
Distilled Semantics for Comprehensive Scene Understanding from Videos | Code | 1
Generating Visual Spatial Description via Holistic 3D Scene Understanding | Code | 1
RSTeller: Scaling Up Visual Language Modeling in Remote Sensing with Rich Linguistic Semantics from Openly Available Data and Large Language Models | Code | 1
DI-V2X: Learning Domain-Invariant Representation for Vehicle-Infrastructure Collaborative 3D Object Detection | Code | 1
SaccadeNet: A Fast and Accurate Object Detector | Code | 1
General Geometry-aware Weakly Supervised 3D Object Detection | Code | 1
Boosting Omnidirectional Stereo Matching with a Pre-trained Depth Foundation Model | Code | 1
Bootstraping Clustering of Gaussians for View-consistent 3D Scene Understanding | Code | 1
Explainable Object-induced Action Decision for Autonomous Vehicles | Code | 1
GFF: Gated Fully Fusion for Semantic Segmentation | Code | 1
SceneGraphFusion: Incremental 3D Scene Graph Prediction from RGB-D Sequences | Code | 1
SeasonDepth: Cross-Season Monocular Depth Prediction Dataset and Benchmark under Multiple Environments | Code | 1
SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation | Code | 1
F-ViTA: Foundation Model Guided Visible to Thermal Translation | Code | 1
DPF: Learning Dense Prediction Fields with Weak Supervision | Code | 1
Boundary-induced and scene-aggregated network for monocular depth prediction | Code | 1
Semantic Abstraction: Open-World 3D Scene Understanding from 2D Vision-Language Models | Code | 1
From Multi-View to Hollow-3D: Hallucinated Hollow-3D R-CNN for 3D Object Detection | Code | 1
Semantic Segmentation-Assisted Instance Feature Fusion for Multi-Level 3D Part Instance Segmentation | Code | 1
From General to Specific: Informative Scene Graph Generation via Balance Adjustment | Code | 1
SemSegDepth: A Combined Model for Semantic Segmentation and Depth Completion | Code | 1
Global Aggregation then Local Distribution in Fully Convolutional Networks | Code | 1
Extending Large Vision-Language Model for Diverse Interactive Tasks in Autonomous Driving | Code | 1
FloodNet: A High Resolution Aerial Imagery Dataset for Post Flood Scene Understanding | Code | 1
DTCLMapper: Dual Temporal Consistent Learning for Vectorized HD Map Construction | Code | 1
Cityscapes-Panoptic-Parts and PASCAL-Panoptic-Parts datasets for Scene Understanding | Code | 1
Dual-Hybrid Attention Network for Specular Highlight Removal | Code | 1
FocusFlow: Boosting Key-Points Optical Flow Estimation for Autonomous Driving | Code | 1
Spatio-temporal Self-Supervised Representation Learning for 3D Point Clouds | Code | 1
Expressive Scene Graph Generation Using Commonsense Knowledge Infusion for Visual Understanding and Reasoning | Code | 1
Dynamic Graph Message Passing Networks | Code | 1
Dynamic Graph Message Passing Networks for Visual Recognition | Code | 1
Bridging the Domain Gap: Self-Supervised 3D Scene Understanding with Foundation Models | Code | 1
A2-FPN for Semantic Segmentation of Fine-Resolution Remotely Sensed Images | Code | 1
Few-Shot Object Detection and Viewpoint Estimation for Objects in the Wild | Code | 1
Stealing Stable Diffusion Prior for Robust Monocular Depth Estimation | Code | 1
FPS-Net: A Convolutional Fusion Network for Large-Scale LiDAR Point Cloud Segmentation | Code | 1
ARKitScenes: A Diverse Real-World Dataset For 3D Indoor Scene Understanding Using Mobile RGB-D Data | Code | 1
3UR-LLM: An End-to-End Multimodal Large Language Model for 3D Scene Understanding | Code | 1
Channel-Wise Attention-Based Network for Self-Supervised Monocular Depth Estimation | Code | 1
FreDSNet: Joint Monocular Depth and Semantic Segmentation with Fast Fourier Convolutions | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | ACRV Baseline | OMQ | 0.44 | | Unverified
2 | Team VGAI (TCS Research) | OMQ | 0.37 | | Unverified
3 | Demo_semantic_SLAM | OMQ | 0.11 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CPN(ResNet-101) | Mean IoU | 46.3 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ACRV Baseline | OMQ | 0.35 | | Unverified