SOTAVerified

Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.

Papers

Showing 426–450 of 1723 papers

Title | Status | Hype
MonoDistill: Learning Spatial Features for Monocular 3D Object Detection | Code | 1
PanoOcc: Unified Occupancy Representation for Camera-based 3D Panoptic Segmentation | Code | 1
MSeg: A Composite Dataset for Multi-domain Semantic Segmentation | Code | 1
Explainable Object-induced Action Decision for Autonomous Vehicles | Code | 1
TextSLAM: Visual SLAM with Planar Text Features | Code | 1
OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge | Code | 1
MuKEA: Multimodal Knowledge Extraction and Accumulation for Knowledge-based Visual Question Answering | Code | 1
Multi3DRefer: Grounding Text Description to Multiple 3D Objects | Code | 1
DTCLMapper: Dual Temporal Consistent Learning for Vectorized HD Map Construction | Code | 1
Panoptic 3D Scene Reconstruction From a Single RGB Image | Code | 1
Dual-Hybrid Attention Network for Specular Highlight Removal | Code | 1
Multimodal Dataset for Localization, Mapping and Crop Monitoring in Citrus Tree Farms | Code | 1
Egocentric Scene Understanding via Multimodal Spatial Rectifier | Code | 1
Cityscapes-Panoptic-Parts and PASCAL-Panoptic-Parts datasets for Scene Understanding | Code | 1
Efficient Multi-Task RGB-D Scene Analysis for Indoor Environments | Code | 1
Dynamic Graph Message Passing Networks for Visual Recognition | Code | 1
Bridging the Domain Gap: Self-Supervised 3D Scene Understanding with Foundation Models | Code | 1
Exploiting Edge-Oriented Reasoning for 3D Point-based Scene Graph Analysis | Code | 1
Dynamic Scene Understanding through Object-Centric Voxelization and Neural Rendering | Code | 1
Multi-Scale Attention for Audio Question Answering | Code | 1
Multi-stage Factorized Spatio-Temporal Representation for RGB-D Action and Gesture Recognition | Code | 1
P2T: Pyramid Pooling Transformer for Scene Understanding | Code | 1
ARKitScenes: A Diverse Real-World Dataset For 3D Indoor Scene Understanding Using Mobile RGB-D Data | Code | 1
ECLAIR: A High-Fidelity Aerial LiDAR Dataset for Semantic Segmentation | Code | 1
3UR-LLM: An End-to-End Multimodal Large Language Model for 3D Scene Understanding | Code | 1
Page 18 of 69

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | ACRV Baseline | OMQ | 0.44 | - | Unverified
2 | Team VGAI (TCS Research) | OMQ | 0.37 | - | Unverified
3 | Demo_semantic_SLAM | OMQ | 0.11 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CPN(ResNet-101) | Mean IoU | 46.3 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ACRV Baseline | OMQ | 0.35 | - | Unverified