Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.
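As a toy illustration of "objects plus their spatial relationships", the sketch below turns already-detected 2D bounding boxes into a small scene graph of (subject, predicate, object) triples. It is not a method from any paper listed below. The `Box` type, the predicate names (`overlaps`, `left_of`, `above`), and the strict box-edge tests are all hypothetical choices made only for this example.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Box:
    """Hypothetical detected object: a label plus an axis-aligned box in pixels.

    (x0, y0) is the top-left corner and (x1, y1) the bottom-right; y grows downward.
    """
    label: str
    x0: float
    y0: float
    x1: float
    y1: float


def overlaps(a: Box, b: Box) -> bool:
    # True when the two boxes share a region of positive area.
    return a.x0 < b.x1 and b.x0 < a.x1 and a.y0 < b.y1 and b.y0 < a.y1


def relations(a: Box, b: Box) -> list:
    """Coarse, hypothetical spatial predicates describing box a relative to box b."""
    rels = []
    if overlaps(a, b):
        rels.append("overlaps")
    if a.x1 <= b.x0:  # a ends before b starts horizontally
        rels.append("left_of")
    if a.y1 <= b.y0:  # a ends before b starts vertically (image y points down)
        rels.append("above")
    return rels


def scene_graph(objects: list) -> list:
    """Return (subject, predicate, object) triples for every ordered pair of objects."""
    triples = []
    for a, b in permutations(objects, 2):
        for rel in relations(a, b):
            triples.append((a.label, rel, b.label))
    return triples


if __name__ == "__main__":
    scene = [
        Box("lamp", 10, 0, 30, 20),
        Box("table", 0, 40, 100, 80),
        Box("cup", 50, 30, 60, 45),
    ]
    for triple in scene_graph(scene):
        print(triple)
```

Real systems usually reason in 3D and learn the predicates rather than hard-coding them; the fixed rules here only make the graph structure concrete.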

Papers

Showing 701-750 of 1723 papers

Title | Status | Hype
ASI-Seg: Audio-Driven Surgical Instrument Segmentation with Surgeon Intention Understanding | Code | 0
LoST? Appearance-Invariant Place Recognition for Opposite Viewpoints using Visual Semantics | Code | 0
Lightweight integration of 3D features to improve 2D image segmentation | Code | 0
Placental Vessel Segmentation and Registration in Fetoscopy: Literature Review and MICCAI FetReg2021 Challenge Findings | Code | 0
Leveraging Acoustic Images for Effective Self-Supervised Audio Representation Learning | Code | 0
Leveraging Automatic CAD Annotations for Supervised Learning in 3D Scene Understanding | Code | 0
Fast Scene Understanding for Autonomous Driving | Code | 0
Artificial Color Constancy via GoogLeNet with Angular Loss Function | Code | 0
Learning Rigidity in Dynamic Scenes with a Moving Camera for 3D Motion Field Estimation | Code | 0
Learning Panoptic Segmentation from Instance Contours | Code | 0
CLAIR-A: Leveraging Large Language Models to Judge Audio Captions | Code | 0
False Negative Reduction in Video Instance Segmentation using Uncertainty Estimates | Code | 0
Implicit Background Estimation for Semantic Segmentation | Code | 0
Learning Regional Purity for Instance Segmentation on 3D Point Clouds | Code | 0
Learning Monocular Depth by Distilling Cross-domain Stereo Networks | Code | 0
Facing the Void: Overcoming Missing Data in Multi-View Imagery | Code | 0
Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors | Code | 0
Extremely Fine-Grained Visual Classification over Resembling Glyphs in the Wild | Code | 0
Adversarial Attacks on Monocular Pose Estimation | Code | 0
Language-based Colorization of Scene Sketches | Code | 0
Exploring Scene Affinity for Semi-Supervised LiDAR Semantic Segmentation | Code | 0
Knowledge-Guided Object Discovery with Acquired Deep Impressions | Code | 0
Label-Attention Transformer with Geometrically Coherent Objects for Image Captioning | Code | 0
LoCATe-GAT: Modeling Multi-Scale Local Context and Action Relationships for Zero-Shot Action Recognition | Code | 0
Monocular 3D Object Detection with Pseudo-LiDAR Point Cloud | Code | 0
Deep Reinforcement Learning on a Budget: 3D Control and Reasoning Without a Supercomputer | Code | 0
Benchmarking Feature Upsampling Methods for Vision Foundation Models using Interactive Segmentation | Code | 0
Interpretable Visual Understanding with Cognitive Attention Network | Code | 0
P2AT: Pyramid Pooling Axial Transformer for Real-time Semantic Segmentation | Code | 0
Single Image 3D Object Estimation with Primitive Graph Networks | Code | 0
Exploiting Temporal Coherence for Multi-modal Video Categorization | | 0
Exploiting High Level Scene Cues in Stereo Reconstruction | | 0
Explicit3D: Graph Network with Spatial Inference for Single Image 3D Object Detection | | 0
Challenges for Monocular 6D Object Pose Estimation in Robotics | | 0
ArK: Augmented Reality with Knowledge Interactive Emergent Ability | | 0
Explainable Scene Understanding with Qualitative Representations and Graph Neural Networks | | 0
Expanding Frozen Vision-Language Models without Retraining: Towards Improved Robot Perception | | 0
Exosense: A Vision-Based Scene Understanding System For Exoskeletons | | 0
Argus: Leveraging Multiview Images for Improved 3-D Scene Understanding With Large Language Models | | 0
Adversarial Attacks on Monocular Depth Estimation | | 0
ExCap3D: Expressive 3D Scene Understanding via Object Captioning with Varying Detail | | 0
EvSegSNN: Neuromorphic Semantic Segmentation for Event Data | | 0
EvidMTL: Evidential Multi-Task Learning for Uncertainty-Aware Semantic Surface Mapping from Monocular RGB Images | | 0
Every SAM Drop Counts: Embracing Semantic Priors for Multi-Modality Image Fusion and Beyond | | 0
Event fields: Capturing light fields at high speed, resolution, and dynamic range | | 0
Category-Level and Open-Set Object Pose Estimation for Robotics | | 0
Evaluation of Multimodal Semantic Segmentation using RGB-D Data | | 0
Catch Me if You Can: A Novel Task for Detection of Covert Geo-Locations (CGL) | | 0
A Review on Visual-SLAM: Advancements from Geometric Modelling to Learning-based Semantic Scene Understanding | | 0
Advancing the Understanding of Fine-Grained 3D Forest Structures using Digital Cousins and Simulation-to-Reality: Methods and Datasets | | 0
Page 15 of 35

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | ACRV Baseline | OMQ | 0.44 | | Unverified
2 | Team VGAI (TCS Research) | OMQ | 0.37 | | Unverified
3 | Demo_semantic_SLAM | OMQ | 0.11 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CPN (ResNet-101) | Mean IoU | 46.3 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ACRV Baseline | OMQ | 0.35 | | Unverified
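Mean IoU, the metric in the second table above, is conventionally computed per class as TP / (TP + FP + FN) over all pixels and then averaged across classes. The NumPy sketch below assumes integer label maps of equal shape, treats any ground-truth value outside [0, num_classes) as an "ignore" label, and skips classes that appear in neither map. Benchmarks differ on these conventions, so this is an illustration of the formula, not the evaluation code behind the 46.3 result.

```python
import numpy as np


def confusion_matrix(gt: np.ndarray, pred: np.ndarray, num_classes: int) -> np.ndarray:
    """Pixel counts with rows indexed by ground-truth class and columns by predicted class."""
    gt = gt.ravel().astype(np.int64)
    pred = pred.ravel().astype(np.int64)
    # Assumption: ground-truth values outside [0, num_classes) mark ignored pixels.
    keep = (gt >= 0) & (gt < num_classes)
    # Assumption: predictions are always valid class ids in [0, num_classes).
    idx = gt[keep] * num_classes + pred[keep]
    counts = np.bincount(idx, minlength=num_classes * num_classes)
    return counts.reshape(num_classes, num_classes)


def mean_iou(gt: np.ndarray, pred: np.ndarray, num_classes: int) -> float:
    cm = confusion_matrix(gt, pred, num_classes)
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp  # predicted as class c, labeled as something else
    fn = cm.sum(axis=1) - tp  # labeled as class c, predicted as something else
    denom = tp + fp + fn
    present = denom > 0       # skip classes absent from both maps
    return float(np.mean(tp[present] / denom[present]))


if __name__ == "__main__":
    gt = np.array([[0, 0, 1], [1, 2, 2]])
    pred = np.array([[0, 1, 1], [1, 2, 0]])
    print(mean_iou(gt, pred, num_classes=3))
```

For the small example, the per-class IoUs are 1/3, 2/3 and 1/2, so the mean is 0.5. Building a confusion matrix first makes it cheap to accumulate counts over a whole dataset before dividing, which is how dataset-level mean IoU is usually reported.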