SOTAVerified

Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.

Papers

Showing 501–525 of 1723 papers

| Title | Status | Hype |
| --- | --- | --- |
| CASPNet++: Joint Multi-Agent Motion Prediction | | 0 |
| ExCap3D: Expressive 3D Scene Understanding via Object Captioning with Varying Detail | | 0 |
| Going Beyond Multi-Task Dense Prediction with Synergy Embedding Models | | 0 |
| Expanding Frozen Vision-Language Models without Retraining: Towards Improved Robot Perception | | 0 |
| Estimating Depth from Monocular Images as Classification Using Deep Fully Convolutional Residual Networks | | 0 |
| Case-based Reasoning Augmented Large Language Model Framework for Decision Making in Realistic Safety-Critical Driving Scenarios | | 0 |
| Explicit3D: Graph Network with Spatial Inference for Single Image 3D Object Detection | | 0 |
| ESGNN: Towards Equivariant Scene Graph Neural Network for 3D Scene Understanding | | 0 |
| Cascaded Classification Models: Combining Models for Holistic Scene Understanding | | 0 |
| Exploiting Temporal Coherence for Multi-modal Video Categorization | | 0 |
| Advancing Complex Wide-Area Scene Understanding with Hierarchical Coresets Selection | | 0 |
| ePointDA: An End-to-End Simulation-to-Real Domain Adaptation Framework for LiDAR Point Cloud Segmentation | | 0 |
| Car Segmentation and Pose Estimation using 3D Object Models | | 0 |
| Enhancing Single Image to 3D Generation using Gaussian Splatting and Hybrid Diffusion Priors | | 0 |
| A Review and A Robust Framework of Data-Efficient 3D Scene Parsing with Traditional/Learned 3D Descriptors | | 0 |
| A Reinforcement Learning Framework for Natural Question Generation using Bi-discriminators | | 0 |
| Enhancing image captioning with depth information using a Transformer-based framework | | 0 |
| Enhancing Human-Centered Dynamic Scene Understanding via Multiple LLMs Collaborated Reasoning | | 0 |
| Can you text what is happening? Integrating pre-trained language encoders into trajectory prediction models for autonomous driving | | 0 |
| Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding | | 0 |
| Multilateral Cascading Network for Semantic Segmentation of Large-Scale Outdoor Point Clouds | | 0 |
| Can MLLMs Guide Me Home? A Benchmark Study on Fine-Grained Visual Reasoning from Transit Maps | | 0 |
| 3D-VirtFusion: Synthetic 3D Data Augmentation through Generative Diffusion Models and Controllable Editing | | 0 |
| GP-NeRF: Generalized Perception NeRF for Context-Aware 3D Scene Understanding | | 0 |
| GPT-4V Takes the Wheel: Promises and Challenges for Pedestrian Behavior Prediction | | 0 |

Benchmark Results

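| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | ACRV Baseline | OMQ | 0.44 | | Unverified |
| 2 | Team VGAI (TCS Research) | OMQ | 0.37 | | Unverified |
| 3 | Demo_semantic_SLAM | OMQ | 0.11 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | CPN(ResNet-101) | Mean IoU | 46.3 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | ACRV Baseline | OMQ | 0.35 | | Unverified |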