SOTAVerified

Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.

Papers

Showing 751–800 of 1723 papers

Title | Status | Hype
Deep Learned Full-3D Object Completion from Single View | | 0
Improving Online Lane Graph Extraction by Object-Lane Clustering | | 0
Dense Supervision Propagation for Weakly Supervised Semantic Segmentation on 3D Point Clouds | | 0
Label-Driven Reconstruction for Domain Adaptation in Semantic Segmentation | | 0
A Weakly-Supervised Depth Estimation Network Using Attention Mechanism | | 0
Improving Human-Object Interaction Detection via Phrase Learning and Label Composition | | 0
LangSurf: Language-Embedded Surface Gaussians for 3D Scene Understanding | | 0
Improving Building Segmentation for Off-Nadir Satellite Imagery | | 0
Improving 6D Object Pose Estimation of metallic Household and Industry Objects | | 0
Deep ensembles based on Stochastic Activation Selection for Polyp Segmentation | | 0
A Multiple-View Geometric Model for Specularity Prediction on General Curved Surfaces | | 0
Memory-Augmented Multimodal LLMs for Surgical VQA via Self-Contained Inquiry | | 0
Language-EXtended Indoor SLAM (LEXIS): A Versatile System for Real-time Visual Scene Understanding | | 0
LargeAD: Large-Scale Cross-Sensor Data Pretraining for Autonomous Driving | | 0
Large Language Models for Autonomous Driving (LLM4AD): Concept, Benchmark, Experiments, and Challenges | | 0
Large Language Models (LLMs) as Traffic Control Systems at Urban Intersections: A New Paradigm | | 0
IMENet: Joint 3D Semantic Scene Completion and 2D Semantic Segmentation through Iterative Mutual Enhancement | | 0
Image-to-Height Domain Translation for Synthetic Aperture Sonar | | 0
LCrowdV: Generating Labeled Videos for Simulation-based Crowd Behavior Learning | | 0
Leader360V: The Large-scale, Real-world 360 Video Dataset for Multi-task Learning in Diverse Environment | | 0
Deep cross-domain building extraction for selective depth estimation from oblique aerial imagery | | 0
Learning 3D Robotics Perception using Inductive Priors | | 0
Image Segmentation with Large Language Models: A Survey with Perspectives for Intelligent Transportation Systems | | 0
Learning 3D Semantic Scene Graphs from 3D Indoor Reconstructions | | 0
A Comprehensive Review of Modern Object Segmentation Approaches | | 0
Image Parsing with Stochastic Scene Grammar | | 0
Learning-based 3D Reconstruction in Autonomous Driving: A Comprehensive Survey | | 0
Learning-based Relational Object Matching Across Views | | 0
Learning Category- and Instance-Aware Pixel Embedding for Fast Panoptic Segmentation | | 0
Learning Densities in Feature Space for Reliable Segmentation of Indoor Scenes | | 0
Learning Depth from Single Images with Deep Neural Network Embedding Focal Length | | 0
Learning Direct Optimization for Scene Understanding | | 0
Deep Contextual Attention for Human-Object Interaction Detection | | 0
DeepContext: Context-Encoding Neural Pathways for 3D Holistic Scene Understanding | | 0
Image-Graph-Image Translation via Auto-Encoding | | 0
Learning Geometric-Aware Properties in 2D Representation Using Lightweight CAD Models, or Zero Real 3D Pairs | | 0
A model of saliency-based visual attention for rapid scene analysis | | 0
MaskAttn-UNet: A Mask Attention-Driven Framework for Universal Low-Resolution Image Segmentation | | 0
Learning in Audio-visual Context: A Review, Analysis, and New Perspective | | 0
DiffusionMTL: Learning Multi-Task Denoising Diffusion Model from Partially Annotated Data | | 0
Masked Point-Entity Contrast for Open-Vocabulary 3D Scene Understanding | | 0
Meta Learning with Differentiable Closed-form Solver for Fast Video Object Segmentation | | 0
Image Captioning with Integrated Bottom-Up and Multi-level Residual Top-Down Attention for Game Scene Understanding | | 0
Deep Bayesian Image Set Classification: A Defence Approach against Adversarial Attacks | | 0
Mapping High-level Semantic Regions in Indoor Environments without Object Recognition | | 0
IM2CAD | | 0
Audio-Visual Collaborative Representation Learning for Dynamic Saliency Prediction | | 0
3D Question Answering for City Scene Understanding | | 0
A Vision-Language Framework for Multispectral Scene Representation Using Language-Grounded Features | | 0
MapVision: CVPR 2024 Autonomous Grand Challenge Mapless Driving Tech Report | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | ACRV Baseline | OMQ | 0.44 | | Unverified
2 | Team VGAI (TCS Research) | OMQ | 0.37 | | Unverified
3 | Demo_semantic_SLAM | OMQ | 0.11 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CPN(ResNet-101) | Mean IoU | 46.3 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ACRV Baseline | OMQ | 0.35 | | Unverified