SOTAVerified

Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.

Papers

Showing 751-800 of 1723 papers

Title | Status | Hype
3D Vision-Language Gaussian Splatting | | 0
Evaluating the Impact of Point Cloud Colorization on Semantic Segmentation Accuracy | | 0
Evaluating Multimodal Language Models as Visual Assistants for Visually Impaired Users | | 0
Cataract-1K: Cataract Surgery Dataset for Scene Segmentation, Phase Recognition, and Irregularity Detection | | 0
CASPNet++: Joint Multi-Agent Motion Prediction | | 0
Estimating Depth from Monocular Images as Classification Using Deep Fully Convolutional Residual Networks | | 0
ESGNN: Towards Equivariant Scene Graph Neural Network for 3D Scene Understanding | | 0
Case-based Reasoning Augmented Large Language Model Framework for Decision Making in Realistic Safety-Critical Driving Scenarios | | 0
Cascaded Classification Models: Combining Models for Holistic Scene Understanding | | 0
ePointDA: An End-to-End Simulation-to-Real Domain Adaptation Framework for LiDAR Point Cloud Segmentation | | 0
Car Segmentation and Pose Estimation using 3D Object Models | | 0
Enhancing Single Image to 3D Generation using Gaussian Splatting and Hybrid Diffusion Priors | | 0
A Review and A Robust Framework of Data-Efficient 3D Scene Parsing with Traditional/Learned 3D Descriptors | | 0
Enhancing image captioning with depth information using a Transformer-based framework | | 0
Enhancing Human-Centered Dynamic Scene Understanding via Multiple LLMs Collaborated Reasoning | | 0
Can you text what is happening? Integrating pre-trained language encoders into trajectory prediction models for autonomous driving | | 0
Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding | | 0
Multilateral Cascading Network for Semantic Segmentation of Large-Scale Outdoor Point Clouds | | 0
Can MLLMs Guide Me Home? A Benchmark Study on Fine-Grained Visual Reasoning from Transit Maps | | 0
A Reinforcement Learning Framework for Natural Question Generation using Bi-discriminators | | 0
Advancing Complex Wide-Area Scene Understanding with Hierarchical Coresets Selection | | 0
3D-VirtFusion: Synthetic 3D Data Augmentation through Generative Diffusion Models and Controllable Editing | | 0
End-to-End Race Driving with Deep Reinforcement Learning | | 0
End-to-end Autonomous Driving using Deep Learning: A Systematic Review | | 0
Can LVLMs Obtain a Driver's License? A Benchmark Towards Reliable AGI for Autonomous Driving | | 0
Endo-TTAP: Robust Endoscopic Tissue Tracking via Multi-Facet Guided Attention and Hybrid Flow-point Supervision | | 0
Can Large Multimodal Models Understand Agricultural Scenes? Benchmarking with AgroMind | | 0
A Reinforcement Learning Approach to Target Tracking in a Camera Network | | 0
Empowering Large Language Models with 3D Situation Awareness | | 0
Empowering cyberphysical systems of systems with intelligence | | 0
Can Foundation Models Perform Zero-Shot Task Specification For Robot Manipulation? | | 0
EML-NET: An Expandable Multi-Layer NETwork for Saliency Prediction | | 0
Can DeepSeek Reason Like a Surgeon? An Empirical Evaluation for Vision-Language Understanding in Robotic-Assisted Surgery | | 0
A Reflectance Based Method For Shadow Detection and Removal | | 0
A diffusion and clustering-based approach for finding coherent motions and understanding crowd scenes | | 0
Embracing Diffraction: A Paradigm Shift in Wireless Sensing and Communication | | 0
EmbRACE-3K: Embodied Reasoning and Action in Complex Environments | | 0
Embodied Visual Active Learning for Semantic Segmentation | | 0
Embodied VideoAgent: Persistent Memory from Egocentric Videos and Embodied Sensors Enables Dynamic Scene Understanding | | 0
Camera-Radar Perception for Autonomous Vehicles and ADAS: Concepts, Datasets and Metrics | | 0
Are Cars Just 3D Boxes? - Jointly Estimating the 3D Shape of Multiple Objects | | 0
Embodied Scene Understanding for Vision Language Models via MetaVQA | | 0
Camera-Only Bird's Eye View Perception: A Neural Approach to LiDAR-Free Environmental Mapping for Autonomous Vehicles | | 0
Camera Control at the Edge with Language Models for Scene Understanding | | 0
Addressing the Sim2Real Gap in Robotic 3D Object Classification | | 0
3D Shape Augmentation with Content-Aware Shape Resizing | | 0
Elastic Interaction Energy-Informed Real-Time Traffic Scene Perception | | 0
EgoSplat: Open-Vocabulary Egocentric Scene Understanding with Language Embedded 3D Gaussian Splatting | | 0
EgoGaussian: Dynamic Scene Understanding from Egocentric Video with 3D Gaussian Splatting | | 0
Calibrated and Efficient Sampling-Free Confidence Estimation for LiDAR Scene Semantic Segmentation | | 0
Page 16 of 35

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | ACRV Baseline | OMQ | 0.44 | | Unverified
2 | Team VGAI (TCS Research) | OMQ | 0.37 | | Unverified
3 | Demo_semantic_SLAM | OMQ | 0.11 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CPN(ResNet-101) | Mean IoU | 46.3 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ACRV Baseline | OMQ | 0.35 | | Unverified