
Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.

Papers

Showing 951-1000 of 1723 papers

Title | Status | Hype
Semantic Segmentation-Assisted Instance Feature Fusion for Multi-Level 3D Part Instance Segmentation | Code | 1
TAG: Boosting Text-VQA via Text-aware Visual Question-answer Generation | Code | 1
AutoLaparo: A New Dataset of Integrated Multi-tasks for Image-guided Surgical Automation in Laparoscopic Hysterectomy | - | 0
Safety-Enhanced Autonomous Driving Using Interpretable Sensor Fusion Transformer | Code | 2
MonteBoxFinder: Detecting and Filtering Primitives to Fit a Noisy Point Cloud | Code | 1
CENet: Toward Concise and Efficient LiDAR Semantic Segmentation for Autonomous Driving | Code | 1
CompNVS: Novel View Synthesis with Scene Completion | - | 0
Semantic Abstraction: Open-World 3D Scene Understanding from 2D Vision-Language Models | Code | 1
Panoptic Scene Graph Generation | Code | 2
Divide and Conquer: 3D Point Cloud Instance Segmentation With Point-Wise Binarization | Code | 1
Neural Groundplans: Persistent Neural Scene Representations from a Single Image | - | 0
SeasoNet: A Seasonal Scene Classification, segmentation and Retrieval dataset for satellite Imagery over Germany | - | 0
Egocentric Scene Understanding via Multimodal Spatial Rectifier | Code | 1
Adversarial Attacks on Monocular Pose Estimation | Code | 0
Efficient Multi-Task RGB-D Scene Analysis for Indoor Environments | Code | 1
BlindSpotNet: Seeing Where We Cannot See | - | 0
MCTS with Refinement for Proposals Selection Games in Scene Understanding | Code | 1
Toward Explainable and Fine-Grained 3D Grounding through Referring Textual Phrases | - | 0
Distance Matters in Human-Object Interaction Detection | Code | 0
Scene-Aware Prompt for Multi-modal Dialogue Understanding and Generation | - | 0
Uncertainty-aware Panoptic Segmentation | Code | 1
MGNet: Monocular Geometric Scene Understanding for Autonomous Driving | Code | 1
IBISCape: A Simulated Benchmark for multi-modal SLAM Systems Evaluation in Large-scale Dynamic Environments | Code | 1
Placental Vessel Segmentation and Registration in Fetoscopy: Literature Review and MICCAI FetReg2021 Challenge Findings | Code | 0
Panoramic Panoptic Segmentation: Insights Into Surrounding Parsing for Mobile Agents via Unsupervised Contrastive Learning | Code | 1
SCIM: Simultaneous Clustering, Inference, and Mapping for Open-World Semantic Scene Understanding | Code | 0
A Dynamic Data Driven Approach for Explainable Scene Understanding | - | 0
On Efficient Real-Time Semantic Segmentation: A Survey | - | 0
Waymo Open Dataset: Panoramic Video Panoptic Segmentation | - | 0
A Multi-purpose Realistic Haze Benchmark with Quantifiable Haze Levels and Ground Truth | - | 0
Unsupervised Foggy Scene Understanding via Self Spatial-Temporal Label Diffusion | Code | 0
Beyond RGB: Scene-Property Synthesis with Neural Radiance Fields | - | 0
Extracting Zero-shot Common Sense from Large Language Models for Robot 3D Scene Understanding | - | 0
Scan2Part: Fine-grained and Hierarchical Part-level Understanding of Real-World 3D Scans | - | 0
A Memory System of a Robot Cognitive Architecture and its Implementation in ArmarX | - | 0
Towards Improving the Generation Quality of Autoregressive Slot VAEs | Code | 0
SAMPLE-HD: Simultaneous Action and Motion Planning Learning Environment | - | 0
Expressive Scene Graph Generation Using Commonsense Knowledge Infusion for Visual Understanding and Reasoning | Code | 1
Facing the Void: Overcoming Missing Data in Multi-View Imagery | Code | 0
Review on Panoramic Imaging and Its Applications in Scene Understanding | - | 0
Unsupervised Discovery and Composition of Object Light Fields | - | 0
Neural Rendering in a Room: Amodal 3D Understanding and Free-Viewpoint Rendering for the Closed Scene Composed of Pre-Captured Objects | - | 0
RangeSeg: Range-Aware Real Time Segmentation of 3D LiDAR Point Clouds | - | 0
BBBD: Bounding Box Based Detector for Occlusion Detection and Order Recovery | - | 0
SceneTrilogy: On Human Scene-Sketch and its Complementarity with Photo and Text | - | 0
Graph-DETR3D: Rethinking Overlapping Regions for Multi-View 3D Object Detection | - | 0
Can Foundation Models Perform Zero-Shot Task Specification For Robot Manipulation? | - | 0
Spatiality-guided Transformer for 3D Dense Captioning on Point Clouds | Code | 1
SELMA: SEmantic Large-scale Multimodal Acquisitions in Variable Weather, Daytime and Viewpoints | - | 0
Attention Mechanism based Cognition-level Scene Understanding | - | 0
Page 20 of 35

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | ACRV Baseline | OMQ | 0.44 | - | Unverified
2 | Team VGAI (TCS Research) | OMQ | 0.37 | - | Unverified
3 | Demo_semantic_SLAM | OMQ | 0.11 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CPN(ResNet-101) | Mean IoU | 46.3 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ACRV Baseline | OMQ | 0.35 | - | Unverified