SOTAVerified

Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.

Papers

Showing 476-500 of 1723 papers

| Title | Status | Hype |
| --- | --- | --- |
| UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios | Code | 1 |
| MonteBoxFinder: Detecting and Filtering Primitives to Fit a Noisy Point Cloud | Code | 1 |
| VideoNavQA: Bridging the Gap between Visual and Embodied Question Answering | Code | 1 |
| PyTorchGeoNodes: Enabling Differentiable Shape Programs for 3D Shape Reconstruction | Code | 1 |
| Challenges for Monocular 6D Object Pose Estimation in Robotics | | 0 |
| ArK: Augmented Reality with Knowledge Interactive Emergent Ability | | 0 |
| Argus: Leveraging Multiview Images for Improved 3-D Scene Understanding With Large Language Models | | 0 |
| Adversarial Attacks on Monocular Depth Estimation | | 0 |
| Advancing the Understanding of Fine-Grained 3D Forest Structures using Digital Cousins and Simulation-to-Reality: Methods and Datasets | | 0 |
| 3D Vision-Language Gaussian Splatting | | 0 |
| Category-Level and Open-Set Object Pose Estimation for Robotics | | 0 |
| Evaluation of Multimodal Semantic Segmentation using RGB-D Data | | 0 |
| Catch Me if You Can: A Novel Task for Detection of Covert Geo-Locations (CGL) | | 0 |
| A Review on Visual-SLAM: Advancements from Geometric Modelling to Learning-based Semantic Scene Understanding | | 0 |
| GaussianBeV: 3D Gaussian Representation meets Perception Models for BeV Segmentation | | 0 |
| Evaluating the Impact of Point Cloud Colorization on Semantic Segmentation Accuracy | | 0 |
| Evaluating Multimodal Language Models as Visual Assistants for Visually Impaired Users | | 0 |
| Cataract-1K: Cataract Surgery Dataset for Scene Segmentation, Phase Recognition, and Irregularity Detection | | 0 |
| CASPNet++: Joint Multi-Agent Motion Prediction | | 0 |
| GameVLM: A Decision-making Framework for Robotic Task Planning Based on Visual Language Models and Zero-sum Games | | 0 |
| Estimating Depth from Monocular Images as Classification Using Deep Fully Convolutional Residual Networks | | 0 |
| Case-based Reasoning Augmented Large Language Model Framework for Decision Making in Realistic Safety-Critical Driving Scenarios | | 0 |
| Event fields: Capturing light fields at high speed, resolution, and dynamic range | | 0 |
| Every SAM Drop Counts: Embracing Semantic Priors for Multi-Modality Image Fusion and Beyond | | 0 |
| ESGNN: Towards Equivariant Scene Graph Neural Network for 3D Scene Understanding | | 0 |
Page 20 of 69

Benchmark Results

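| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | ACRV Baseline | OMQ | 0.44 | | Unverified |
| 2 | Team VGAI (TCS Research) | OMQ | 0.37 | | Unverified |
| 3 | Demo_semantic_SLAM | OMQ | 0.11 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | CPN(ResNet-101) | Mean IoU | 46.3 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | ACRV Baseline | OMQ | 0.35 | | Unverified |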