SOTAVerified

Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.

Papers

Showing 951-1000 of 1723 papers

Title | Status | Hype
Robust Multi-Modal Image Stitching for Improved Scene Understanding | - | 0
Cloud-Device Collaborative Learning for Multimodal Large Language Models | - | 0
BridgeNet: Comprehensive and Effective Feature Interactions via Bridge Feature for Multi-task Dense Predictions | - | 0
Object Attribute Matters in Visual Question Answering | Code | 0
AccidentGPT: Accident Analysis and Prevention from V2X Environmental Perception with Multi-modal Large Model | - | 0
Language-Assisted 3D Scene Understanding | - | 0
Weakly-Supervised 3D Visual Grounding based on Visual Linguistic Alignment | - | 0
Dietary Assessment with Multimodal ChatGPT: A Systematic Analysis | - | 0
Zoom in on the Plant: Fine-grained Analysis of Leaf, Stem and Vein Instances | Code | 0
VMT-Adapter: Parameter-Efficient Transfer Learning for Multi-Task Dense Scene Understanding | - | 0
X4D-SceneFormer: Enhanced Scene Understanding on 4D Point Cloud Videos through Cross-modal Knowledge Transfer | Code | 0
Spatiotemporal Event Graphs for Dynamic Scene Understanding | - | 0
Cataract-1K: Cataract Surgery Dataset for Scene Segmentation, Phase Recognition, and Irregularity Detection | - | 0
SkyScenes: A Synthetic Dataset for Aerial Scene Understanding | - | 0
Prospective Role of Foundation Models in Advancing Autonomous Vehicles | - | 0
IGFNet: Illumination-Guided Fusion Network for Semantic Scene Understanding using RGB-Thermal Images | Code | 0
A Review and A Robust Framework of Data-Efficient 3D Scene Parsing with Traditional/Learned 3D Descriptors | - | 0
Segment Any 3D Gaussians | - | 0
HAtt-Flow: Hierarchical Attention-Flow Mechanism for Group Activity Scene Graph Generation in Videos | - | 0
Scene Summarization: Clustering Scene Videos into Spatially Diverse Frames | - | 0
REACT: Recognize Every Action Everywhere All At Once | - | 0
FALCON: Fairness Learning via Contrastive Attention Approach to Continual Semantic Scene Understanding | - | 0
Multi-task Planar Reconstruction with Feature Warping Guidance | Code | 0
GPT-4V Takes the Wheel: Promises and Challenges for Pedestrian Behavior Prediction | - | 0
GP-NeRF: Generalized Perception NeRF for Context-Aware 3D Scene Understanding | - | 0
SeaDSC: A video-based unsupervised method for dynamic scene change detection in unmanned surface vehicles | - | 0
Two Stream Scene Understanding on Graph Embedding | - | 0
Continual Learning of Unsupervised Monocular Depth from Videos | Code | 0
Leveraging Large-Scale Pretrained Vision Foundation Models for Label-Efficient 3D Point Cloud Segmentation | - | 0
Single-view 3D Scene Reconstruction with High-fidelity Shape and Texture | - | 0
Recent Advances in Multi-modal 3D Scene Understanding: A Comprehensive Survey and Evaluation | - | 0
P2AT: Pyramid Pooling Axial Transformer for Real-time Semantic Segmentation | Code | 0
Panoptic Out-of-Distribution Segmentation | - | 0
S4C: Self-Supervised Semantic Scene Completion with Neural Fields | - | 0
Zero-Shot Open-Vocabulary Tracking with Large Pre-Trained Models | - | 0
TextPSG: Panoptic Scene Graph Generation from Textual Descriptions | - | 0
DualMLP: a two-stream fusion model for 3D point cloud classification | Code | 0
Adaptive Visual Scene Understanding: Incremental Scene Graph Generation | Code | 0
Elastic Interaction Energy-Informed Real-Time Traffic Scene Perception | - | 0
Logical Bias Learning for Object Relation Prediction | - | 0
SGRec3D: Self-Supervised 3D Scene Graph Learning via Object-Level Scene Reconstruction | - | 0
Language-EXtended Indoor SLAM (LEXIS): A Versatile System for Real-time Visual Scene Understanding | - | 0
SANPO: A Scene Understanding, Accessibility and Human Navigation Dataset | - | 0
LLMR: Real-time Prompting of Interactive Worlds using Large Language Models | - | 0
Survey of Action Recognition, Spotting and Spatio-Temporal Localization in Soccer -- Current Trends and Research Perspectives | - | 0
Shape Anchor Guided Holistic Indoor Scene Understanding | Code | 0
PanoMixSwap Panorama Mixing via Structural Swapping for Indoor Scene Understanding | - | 0
So you think you can track? | - | 0
Rank2Tell: A Multimodal Driving Dataset for Joint Importance Ranking and Reasoning | - | 0
AmodalSynthDrive: A Synthetic Amodal Perception Dataset for Autonomous Driving | - | 0
Page 20 of 35

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | ACRV Baseline | OMQ | 0.44 | - | Unverified
2 | Team VGAI (TCS Research) | OMQ | 0.37 | - | Unverified
3 | Demo_semantic_SLAM | OMQ | 0.11 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CPN(ResNet-101) | Mean IoU | 46.3 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ACRV Baseline | OMQ | 0.35 | - | Unverified