SOTAVerified

Scene Understanding

Scene understanding involves interpreting the visual information in a scene, including the objects present, their spatial relationships, and the overall layout. It goes beyond simple object recognition by taking context into account: how objects relate to one another and to their environment.
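This is a minimal sketch of one ingredient named above: spatial relationships between objects. It assumes objects have already been detected as labeled 2D bounding boxes, derives a coarse pairwise relation from their box centers, and collects the results as a tiny scene graph of (subject, relation, object) triples. DetectedObject, spatial_relation and build_scene_graph are hypothetical names for illustration. The four-way left/right/above/below rule is a deliberate simplification; practical systems often rely on depth, 3D geometry or learned relation classifiers instead.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DetectedObject:
    """Hypothetical detection record: a class label and a pixel box (x1, y1, x2, y2)."""
    label: str
    box: tuple[float, float, float, float]

    @property
    def center(self) -> tuple[float, float]:
        x1, y1, x2, y2 = self.box
        return ((x1 + x2) / 2, (y1 + y2) / 2)


def spatial_relation(a: DetectedObject, b: DetectedObject) -> str:
    """Coarse 2D relation of `a` relative to `b`, judged from box centers.

    Image coordinates are assumed, so y grows downward (smaller y means "above").
    The axis with the larger absolute offset decides the relation.
    """
    ax, ay = a.center
    bx, by = b.center
    dx, dy = ax - bx, ay - by
    if dx == 0 and dy == 0:
        return "centered on"
    if abs(dx) >= abs(dy):
        return "left of" if dx < 0 else "right of"
    return "above" if dy < 0 else "below"


def build_scene_graph(objects: list[DetectedObject]) -> list[tuple[str, str, str]]:
    """Return a (subject, relation, object) triple for every ordered pair of objects."""
    return [
        (a.label, spatial_relation(a, b), b.label)
        for i, a in enumerate(objects)
        for j, b in enumerate(objects)
        if i != j
    ]


if __name__ == "__main__":
    cup = DetectedObject("cup", (100, 50, 140, 90))
    table = DetectedObject("table", (40, 80, 300, 200))
    print(build_scene_graph([cup, table]))
    # → [('cup', 'above', 'table'), ('table', 'below', 'cup')]
```

Even this toy version shows why context matters: the same "cup" detection yields different relations depending on which other objects share the scene.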

Papers

Showing 801-850 of 1723 papers

Title | Status | Hype
Deep ensembles based on Stochastic Activation Selection for Polyp Segmentation | | 0
A Multiple-View Geometric Model for Specularity Prediction on General Curved Surfaces | | 0
Learning to Detect Human-Object Interactions With Knowledge | | 0
Learning to Exploit Stability for 3D Scene Parsing | | 0
Learning to Interpret and Describe Abstract Scenes | | 0
Multimodal 3D Object Detection on Unseen Domains | | 0
IMENet: Joint 3D Semantic Scene Completion and 2D Semantic Segmentation through Iterative Mutual Enhancement | | 0
Image-to-Height Domain Translation for Synthetic Aperture Sonar | | 0
Deep cross-domain building extraction for selective depth estimation from oblique aerial imagery | | 0
Leverage Cross-Attention for End-to-End Open-Vocabulary Panoptic Reconstruction | | 0
Discovery of Shared Semantic Spaces for Multi-Scene Video Query and Summarization | | 0
An Exemplar-based CRF for Multi-instance Object Segmentation | | 0
Leveraging Auxiliary Text for Deep Recognition of Unseen Visual Relationships | | 0
Image Segmentation with Large Language Models: A Survey with Perspectives for Intelligent Transportation Systems | | 0
Leveraging Large-Scale Pretrained Vision Foundation Models for Label-Efficient 3D Point Cloud Segmentation | | 0
A Comprehensive Review of Modern Object Segmentation Approaches | | 0
Image Parsing with Stochastic Scene Grammar | | 0
Deep Contextual Attention for Human-Object Interaction Detection | | 0
Lifting GIS Maps into Strong Geometric Context for Scene Understanding | | 0
DeepContext: Context-Encoding Neural Pathways for 3D Holistic Scene Understanding | | 0
Image-Graph-Image Translation via Auto-Encoding | | 0
A model of saliency-based visual attention for rapid scene analysis | | 0
Multimodal 3D Reasoning Segmentation with Complex Scenes | | 0
LiveHPS: LiDAR-based Scene-level Human Pose and Shape Estimation in Free Environment | | 0
Living in a Material World: Learning Material Properties from Full-Waveform Flash Lidar Data for Semantic Segmentation | | 0
Image Captioning with Integrated Bottom-Up and Multi-level Residual Top-Down Attention for Game Scene Understanding | | 0
Deep Bayesian Image Set Classification: A Defence Approach against Adversarial Attacks | | 0
LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness | | 0
LLaVA-4D: Embedding SpatioTemporal Prompt into LMMs for 4D Scene Understanding | | 0
IM2CAD | | 0
A Vision-Language Framework for Multispectral Scene Representation Using Language-Grounded Features | | 0
MTANet: Multitask-Aware Network With Hierarchical Multimodal Fusion for RGB-T Urban Scene Understanding | | 0
AVD2: Accident Video Diffusion for Accident Video Description | | 0
Moving Object Proposals with Deep Learned Optical Flow for Video Object Segmentation | | 0
Identifying First-person Camera Wearers in Third-person Videos | | 0
AmodalSynthDrive: A Synthetic Amodal Perception Dataset for Autonomous Driving | | 0
CorNav: Autonomous Agent with Self-Corrected Planning for Zero-Shot Vision-and-Language Navigation | | 0
Long Range Pooling for 3D Large-Scale Scene Understanding | | 0
DAWN: Vehicle Detection in Adverse Weather Nature Dataset | | 0
MOSE: Monocular Semantic Reconstruction Using NeRF-Lifted Noisy Priors | | 0
Does Your 3D Encoder Really Work? When Pretrain-SFT from 2D VLMs Meets 3D VLMs | | 0
Both Style and Distortion Matter: Dual-Path Unsupervised Domain Adaptation for Panoramic Semantic Segmentation | | 0
DORAEMON: Decentralized Ontology-aware Reliable Agent with Enhanced Memory Oriented Navigation | | 0
Lowis3D: Language-Driven Open-World Instance-Level 3D Scene Understanding | | 0
Data-Driven Scene Understanding with Adaptively Retrieved Exemplars | | 0
LVLM-empowered Multi-modal Representation Learning for Visual Place Recognition | | 0
A Variational Observation Model of 3D Object for Probabilistic Semantic SLAM | | 0
Movies2Scenes: Using Movie Metadata to Learn Scene Representation | | 0
HyKo: A Spectral Dataset for Scene Understanding | | 0
A Comparative Evaluation of Approximate Probabilistic Simulation and Deep Neural Networks as Accounts of Human Physical Scene Understanding | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | ACRV Baseline | OMQ | 0.44 | | Unverified
2 | Team VGAI (TCS Research) | OMQ | 0.37 | | Unverified
3 | Demo_semantic_SLAM | OMQ | 0.11 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CPN(ResNet-101) | Mean IoU | 46.3 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ACRV Baseline | OMQ | 0.35 | | Unverified
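One of the leaderboards above reports Mean IoU (mean intersection-over-union), a standard semantic segmentation metric. As a hedged illustration of how such a number is typically computed, here is a pure-Python sketch. It assumes flattened integer label maps, a single confusion matrix accumulated over all pixels, and an ignore label of 255 (a common convention, not something stated here). The function names are hypothetical, and benchmarks differ in details such as how classes absent from an image are handled, so this is not the evaluation code behind the listed result.

```python
from __future__ import annotations


def confusion_matrix(gt: list[int], pred: list[int], num_classes: int,
                     ignore_index: int = 255) -> list[list[int]]:
    """conf[g][p] counts pixels whose ground-truth class is g and predicted class is p."""
    conf = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt, pred):
        if g == ignore_index:
            continue  # unlabeled pixels are excluded from every class
        conf[g][p] += 1
    return conf


def per_class_iou(conf: list[list[int]]) -> list[float | None]:
    """IoU_c = TP_c / (TP_c + FP_c + FN_c); None when class c appears in neither gt nor pred."""
    n = len(conf)
    ious: list[float | None] = []
    for c in range(n):
        tp = conf[c][c]
        fn = sum(conf[c]) - tp                       # row c: truly c, predicted as another class
        fp = sum(conf[r][c] for r in range(n)) - tp  # column c: predicted c, truly another class
        denom = tp + fp + fn
        ious.append(tp / denom if denom else None)
    return ious


def mean_iou(conf: list[list[int]]) -> float:
    """Average IoU over classes that occur in gt or pred (absent classes are skipped)."""
    ious = [v for v in per_class_iou(conf) if v is not None]
    return sum(ious) / len(ious) if ious else 0.0
```

For example, with gt = [0, 0, 1, 1, 2, 2] and pred = [0, 1, 1, 1, 2, 0] over three classes, the per-class IoUs are 1/3, 2/3 and 1/2, so the mean is 0.5. A leaderboard figure such as 46.3 appears to be the same kind of quantity expressed on a 0 to 100 scale.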