SOTAVerified

Scene Understanding

Scene understanding involves interpreting the visual content of a scene: the objects it contains, their spatial relationships, and its overall layout. It goes beyond object recognition alone by taking context into account, including how objects relate to one another and to their environment.
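To make "spatial relationships" concrete, here is a toy sketch that turns 2D bounding boxes into (subject, relation, object) triples, a minimal scene-graph-like structure. It is an illustrative assumption, not the method of any paper listed here: the `Box` class, the `spatial_relation` and `build_scene_graph` helpers, and the four coarse relations are all hypothetical, and real systems typically learn relations rather than hard-coding box geometry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned 2D detection (image coords: y grows downward)."""
    label: str
    x0: float  # left edge
    y0: float  # top edge
    x1: float  # right edge
    y1: float  # bottom edge


def spatial_relation(a: Box, b: Box) -> str:
    """Coarse relation of box `a` with respect to box `b` (toy heuristic)."""
    if a.x1 <= b.x0:
        return "left of"
    if a.x0 >= b.x1:
        return "right of"
    if a.y1 <= b.y0:
        return "above"
    if a.y0 >= b.y1:
        return "below"
    return "overlapping"


def build_scene_graph(boxes):
    """Return (subject, relation, object) triples for every ordered pair of boxes."""
    return [
        (a.label, spatial_relation(a, b), b.label)
        for a in boxes
        for b in boxes
        if a is not b
    ]


if __name__ == "__main__":
    scene = [Box("cup", 10, 10, 20, 20), Box("table", 0, 20, 100, 60)]
    for triple in build_scene_graph(scene):
        print(triple)
```

With the two boxes above, the cup sits directly on top of the table's upper edge, so the sketch yields ("cup", "above", "table") and ("table", "below", "cup").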

Papers

Showing 451-500 of 1723 papers

Title | Status | Hype
General Geometry-aware Weakly Supervised 3D Object Detection | Code | 1
Generating Visual Spatial Description via Holistic 3D Scene Understanding | Code | 1
Towards Holistic Surgical Scene Understanding | Code | 1
Towards In-context Scene Understanding | Code | 1
Efficient Multi-Task RGB-D Scene Analysis for Indoor Environments | Code | 1
CAKES: Channel-wise Automatic KErnel Shrinking for Efficient 3D Networks | Code | 1
Global Aggregation then Local Distribution in Fully Convolutional Networks | Code | 1
Towards Scene Understanding for Autonomous Operations on Airport Aprons | Code | 1
ARKitScenes: A Diverse Real-World Dataset For 3D Indoor Scene Understanding Using Mobile RGB-D Data | Code | 1
3UR-LLM: An End-to-End Multimodal Large Language Model for 3D Scene Understanding | Code | 1
CamContextI2V: Context-aware Controllable Video Generation | Code | 1
Traffic Scene Parsing through the TSP6K Dataset | Code | 1
Channel-Wise Attention-Based Network for Self-Supervised Monocular Depth Estimation | Code | 1
Explainable Object-induced Action Decision for Autonomous Vehicles | Code | 1
Transformers in Self-Supervised Monocular Depth Estimation with Unknown Camera Intrinsics | Code | 1
Global-Reasoned Multi-Task Learning Model for Surgical Scene Understanding | Code | 1
TSP-Transformer: Task-Specific Prompts Boosted Transformer for Holistic Scene Understanding | Code | 1
Uncertainty-aware Panoptic Segmentation | Code | 1
Campus3D: A Photogrammetry Point Cloud Benchmark for Hierarchical Understanding of Outdoor Scene | Code | 1
Understanding Bird's-Eye View of Road Semantics using an Onboard Camera | Code | 1
Holistic 3D Scene Understanding from a Single Image with Implicit Representation | Code | 1
Knowledge Distillation from 3D to Bird's-Eye-View for LiDAR Semantic Segmentation | Code | 1
UniM-OV3D: Uni-Modality Open-Vocabulary 3D Scene Understanding with Fine-Grained Feature Representation | Code | 1
Unleash the Potential of Image Branch for Cross-modal 3D Object Detection | Code | 1
EndoChat: Grounded Multimodal Large Language Model for Endoscopic Surgery | Code | 1
UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios | Code | 1
MonteBoxFinder: Detecting and Filtering Primitives to Fit a Noisy Point Cloud | Code | 1
VideoNavQA: Bridging the Gap between Visual and Embodied Question Answering | Code | 1
PyTorchGeoNodes: Enabling Differentiable Shape Programs for 3D Shape Reconstruction | Code | 1
Challenges for Monocular 6D Object Pose Estimation in Robotics | - | 0
ArK: Augmented Reality with Knowledge Interactive Emergent Ability | - | 0
Argus: Leveraging Multiview Images for Improved 3-D Scene Understanding With Large Language Models | - | 0
Adversarial Attacks on Monocular Depth Estimation | - | 0
Advancing the Understanding of Fine-Grained 3D Forest Structures using Digital Cousins and Simulation-to-Reality: Methods and Datasets | - | 0
3D Vision-Language Gaussian Splatting | - | 0
Category-Level and Open-Set Object Pose Estimation for Robotics | - | 0
Evaluation of Multimodal Semantic Segmentation using RGB-D Data | - | 0
Catch Me if You Can: A Novel Task for Detection of Covert Geo-Locations (CGL) | - | 0
A Review on Visual-SLAM: Advancements from Geometric Modelling to Learning-based Semantic Scene Understanding | - | 0
GaussianBeV: 3D Gaussian Representation meets Perception Models for BeV Segmentation | - | 0
Evaluating the Impact of Point Cloud Colorization on Semantic Segmentation Accuracy | - | 0
Evaluating Multimodal Language Models as Visual Assistants for Visually Impaired Users | - | 0
Cataract-1K: Cataract Surgery Dataset for Scene Segmentation, Phase Recognition, and Irregularity Detection | - | 0
CASPNet++: Joint Multi-Agent Motion Prediction | - | 0
GameVLM: A Decision-making Framework for Robotic Task Planning Based on Visual Language Models and Zero-sum Games | - | 0
Estimating Depth from Monocular Images as Classification Using Deep Fully Convolutional Residual Networks | - | 0
Case-based Reasoning Augmented Large Language Model Framework for Decision Making in Realistic Safety-Critical Driving Scenarios | - | 0
Event fields: Capturing light fields at high speed, resolution, and dynamic range | - | 0
Every SAM Drop Counts: Embracing Semantic Priors for Multi-Modality Image Fusion and Beyond | - | 0
ESGNN: Towards Equivariant Scene Graph Neural Network for 3D Scene Understanding | - | 0
Page 10 of 35

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | ACRV Baseline | OMQ | 0.44 | - | Unverified
2 | Team VGAI (TCS Research) | OMQ | 0.37 | - | Unverified
3 | Demo_semantic_SLAM | OMQ | 0.11 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CPN(ResNet-101) | Mean IoU | 46.3 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ACRV Baseline | OMQ | 0.35 | - | Unverified
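Mean IoU, the metric reported for CPN(ResNet-101) above, averages the per-class intersection over union IoU_c = TP_c / (TP_c + FP_c + FN_c) across classes. The sketch below is a minimal, dependency-free illustration over flattened label sequences. Its assumptions: classes absent from both prediction and ground truth are skipped, there is no ignore label, and the result is a fraction in [0, 1], whereas leaderboards such as this one usually print a percentage. The exact evaluation protocol behind the 46.3 figure is not stated on this page, and `mean_iou` is a hypothetical helper, not a specific library's API.

```python
from collections import Counter


def mean_iou(pred, target, num_classes):
    """Mean intersection over union for flattened segmentation label sequences.

    pred, target: equal-length sequences of integer class ids in [0, num_classes).
    Classes that appear in neither sequence are excluded from the average.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    # Count (ground-truth, predicted) label pairs, i.e. a sparse confusion matrix.
    pairs = Counter(zip(target, pred))
    ious = []
    for c in range(num_classes):
        tp = pairs[(c, c)]
        fp = sum(n for (t, p), n in pairs.items() if p == c and t != c)
        fn = sum(n for (t, p), n in pairs.items() if t == c and p != c)
        union = tp + fp + fn
        if union > 0:  # skip classes absent from both prediction and target
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Class 0: IoU = 1/2, class 1: IoU = 2/3, so mean IoU = 7/12.
    print(mean_iou(pred=[0, 1, 1, 1], target=[0, 0, 1, 1], num_classes=2))
```

For whole datasets, benchmarks commonly accumulate one confusion matrix over all images before computing IoU, rather than averaging per-image scores; the sketch handles that case if all labels are concatenated into a single sequence first.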