SOTAVerified

Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.

Papers

Showing 1501–1525 of 1723 papers

Title | Status | Hype
Taskology: Utilizing Task Relations at Scale | | 0
Application of Vision-Language Model to Pedestrians Behavior and Scene Understanding in Autonomous Driving | | 0
TB-HSU: Hierarchical 3D Scene Understanding with Contextual Affordances | | 0
Technical Report for ICRA 2025 GOOSE 2D Semantic Segmentation Challenge: Leveraging Color Shift Correction, RoPE-Swin Backbone, and Quantile-based Label Denoising Strategy for Robust Outdoor Scene Understanding | | 0
Temporal DINO: A Self-supervised Video Strategy to Enhance Action Prediction | | 0
3DArticCyclists: Generating Synthetic Articulated 8D Pose-Controllable Cyclist Data for Computer Vision Applications | | 0
Temporal Propagation of Asymmetric Feature Pyramid for Surgical Scene Segmentation | | 0
Zero-Shot Open-Vocabulary Tracking with Large Pre-Trained Models | | 0
Application of Multimodal Large Language Models in Autonomous Driving | | 0
Test-Time Adaptation for Nighttime Color-Thermal Semantic Segmentation | | 0
Test-Time Intensity Consistency Adaptation for Shadow Detection | | 0
TextPSG: Panoptic Scene Graph Generation from Textual Descriptions | | 0
A pooling based scene text proposal technique for scene text reading in the wild | | 0
APARATE: Adaptive Adversarial Patch for CNN-based Monocular Depth Estimation for Autonomous Navigation | | 0
Text-to-Image GAN with Pretrained Representations | | 0
Anticipating Object State Changes in Long Procedural Videos | | 0
Texture Underfitting for Domain Adaptation | | 0
TGOSPA Metric Parameters Selection and Evaluation for Visual Multi-object Tracking | | 0
TGP: Two-modal occupancy prediction with 3D Gaussian and sparse points for 3D Environment Awareness | | 0
What Can I Do Around Here? Deep Functional Scene Understanding for Cognitive Robots | | 0
Answering Visual What-If Questions: From Actions to Predicted Scene Descriptions | | 0
Answerability Fields: Answerable Location Estimation via Diffusion Models | | 0
The Devil is in the Boundary: Exploiting Boundary Representation for Basis-based Instance Segmentation | | 0
The H3D Dataset for Full-Surround 3D Multi-Object Detection and Tracking in Crowded Urban Scenes | | 0
An Intelligent Safety System for Human-Centered Semi-Autonomous Vehicles | | 0
Page 61 of 69

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | ACRV Baseline | OMQ | 0.44 | | Unverified
2 | Team VGAI (TCS Research) | OMQ | 0.37 | | Unverified
3 | Demo_semantic_SLAM | OMQ | 0.11 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CPN(ResNet-101) | Mean IoU | 46.3 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ACRV Baseline | OMQ | 0.35 | | Unverified