SOTAVerified

Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.

Papers

Showing 851-900 of 1723 papers

Title | Status | Hype
Dynamic Scene Understanding from Vision-Language Representations | - | 0
MAGIC: Mastering Physical Adversarial Generation in Context through Collaborative LLM Agents | - | 0
Making Large Language Models Better Planners with Reasoning-Decision Alignment | - | 0
Manhattan Scene Understanding via XSlit Imaging | - | 0
Dynamic Interaction-Aware Scene Understanding for Reinforcement Learning in Autonomous Driving | - | 0
Mapping High-level Semantic Regions in Indoor Environments without Object Recognition | - | 0
MapVision: CVPR 2024 Autonomous Grand Challenge Mapless Driving Tech Report | - | 0
Mask3D: Pre-training 2D Vision Transformers by Learning Masked 3D Priors | - | 0
Dynamic Clustering Transformer Network for Point Cloud Segmentation | - | 0
MaskAttn-UNet: A Mask Attention-Driven Framework for Universal Low-Resolution Image Segmentation | - | 0
Masked Point-Entity Contrast for Open-Vocabulary 3D Scene Understanding | - | 0
DublinCity: Annotated LiDAR Point Cloud and its Applications | - | 0
DSNet: An Efficient CNN for Road Scene Segmentation | - | 0
Underwater Diffusion Attention Network with Contrastive Language-Image Joint Learning for Underwater Image Enhancement | - | 0
Adapting to Length Shift: FlexiLength Network for Trajectory Prediction | - | 0
DSM: Building A Diverse Semantic Map for 3D Visual Grounding | - | 0
Memory-Augmented Multimodal LLMs for Surgical VQA via Self-Contained Inquiry | - | 0
Meta Learning with Differentiable Closed-form Solver for Fast Video Object Segmentation | - | 0
MetaMorphosis: Task-oriented Privacy Cognizant Feature Generation for Multi-task Learning | - | 0
Active Scene Understanding via Online Semantic Reconstruction | - | 0
A Continuous Occlusion Model for Road Scene Understanding | - | 0
Dr. Splat: Directly Referring 3D Gaussian Splatting via Direct Language Embedding Registration | - | 0
Unified Perception: Efficient Depth-Aware Video Panoptic Segmentation with Minimal Annotation Costs | - | 0
DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving | - | 0
Minimal Adversarial Examples for Deep Learning on 3D Point Clouds | - | 0
Mining Conditional Part Semantics with Occluded Extrapolation for Human-Object Interaction Detection | - | 0
Content Adaptive Front End For Audio Classification | - | 0
DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models | - | 0
MLLM-For3D: Adapting Multimodal Large Language Model for 3D Reasoning Segmentation | - | 0
Unified Representation Space for 3D Visual Grounding | - | 0
Unified Scene Representation and Reconstruction for 3D Large Language Models | - | 0
DriveGuard: Robustification of Automated Driving Systems with Deep Spatio-Temporal Convolutional Autoencoder | - | 0
MM3DGS SLAM: Multi-modal 3D Gaussian Splatting for SLAM Using Vision, Depth, and Inertial Measurements | - | 0
MM-3DScene: 3D Scene Understanding by Customizing Masked Modeling with Informative-Preserved Reconstruction and Self-Distilled Consistency | - | 0
MNEW: Multi-domain Neighborhood Embedding and Weighting for Sparse Point Clouds Segmentation | - | 0
DriveGenVLM: Real-world Video Generation for Vision Language Model based Autonomous Driving | - | 0
Model Adaptation with Synthetic and Real Data for Semantic Dense Foggy Scene Understanding | - | 0
DreamAnywhere: Object-Centric Panoramic 3D Scene Generation | - | 0
Uni-Fusion: Universal Continuous Mapping | - | 0
UniGaussian: Driving Scene Reconstruction from Multiple Camera Models via Unified Gaussian Representations | - | 0
Modeling human intuitions about liquid flow with particle-based simulation | - | 0
Modeling Uncertainty in 3D Gaussian Splatting through Continuous Semantic Splatting | - | 0
DORSal: Diffusion for Object-centric Representations of Scenes et al | - | 0
DORAEMON: Decentralized Ontology-aware Reliable Agent with Enhanced Memory Oriented Navigation | - | 0
A Comprehensive Review of Modern Object Segmentation Approaches | - | 0
Monocular BEV Perception of Road Scenes via Front-to-Top View Projection | - | 0
Does Your 3D Encoder Really Work? When Pretrain-SFT from 2D VLMs Meets 3D VLMs | - | 0
Monocular Depth Estimation with Sharp Boundary | - | 0
Does CLIP perceive art the same way we do? | - | 0
MonoGRNet: A General Framework for Monocular 3D Object Detection | - | 0
Page 18 of 35

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | ACRV Baseline | OMQ | 0.44 | - | Unverified
2 | Team VGAI (TCS Research) | OMQ | 0.37 | - | Unverified
3 | Demo_semantic_SLAM | OMQ | 0.11 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CPN(ResNet-101) | Mean IoU | 46.3 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ACRV Baseline | OMQ | 0.35 | - | Unverified