SOTAVerified

Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.

Papers

Showing 101-150 of 1723 papers

Title | Status | Hype
Towards Generating Realistic 3D Semantic Training Data for Autonomous Driving | Code | 2
Towards Open Vocabulary Learning: A Survey | Code | 2
Hypersim: A Photorealistic Synthetic Dataset for Holistic Indoor Scene Understanding | Code | 2
VideoLifter: Lifting Videos to 3D with Fast Hierarchical Stereo Alignment | Code | 2
Volumetric Environment Representation for Vision-Language Navigation | Code | 2
AutoTrust: Benchmarking Trustworthiness in Large Vision Language Models for Autonomous Driving | Code | 2
HAKE: A Knowledge Engine Foundation for Human Activity Understanding | Code | 2
Grounded 3D-LLM with Referent Tokens | Code | 2
ARKit LabelMaker: A New Scale for Indoor 3D Scene Understanding | Code | 2
Feed-Forward SceneDINO for Unsupervised Semantic Scene Completion | Code | 2
GroupViT: Semantic Segmentation Emerges from Text Supervision | Code | 2
Inst3D-LMM: Instance-Aware 3D Scene Understanding with Multi-modal Instruction Tuning | Code | 2
NavRAG: Generating User Demand Instructions for Embodied Navigation through Retrieval-Augmented LLM | Code | 2
RelationField: Relate Anything in Radiance Fields | Code | 2
Swin3D: A Pretrained Transformer Backbone for 3D Indoor Scene Understanding | Code | 2
Hier-SLAM: Scaling-up Semantics in SLAM with a Hierarchically Categorical Gaussian Splatting | Code | 2
Deep Learning for Event-based Vision: A Comprehensive Survey and Benchmarks | Code | 1
Generating Visual Spatial Description via Holistic 3D Scene Understanding | Code | 1
GFF: Gated Fully Fusion for Semantic Segmentation | Code | 1
3DMIT: 3D Multi-modal Instruction Tuning for Scene Understanding | Code | 1
A Review of Panoptic Segmentation for Mobile Mapping Point Clouds | Code | 1
Advances in Deep Concealed Scene Understanding | Code | 1
DeepPanoContext: Panoramic 3D Scene Understanding with Holistic Scene Context Graph and Relation-based Optimization | Code | 1
General Geometry-aware Weakly Supervised 3D Object Detection | Code | 1
Global Aggregation then Local Distribution in Fully Convolutional Networks | Code | 1
Deep learning for radar data exploitation of autonomous vehicle | Code | 1
DC-SAM: In-Context Segment Anything in Images and Videos via Dual Consistency | Code | 1
F-ViTA: Foundation Model Guided Visible to Thermal Translation | Code | 1
DAF-Net: A Dual-Branch Feature Decomposition Fusion Network with Domain Adaptive for Infrared and Visible Image Fusion | Code | 1
Arabic Scene Text Recognition in the Deep Learning Era: Analysis on A Novel Dataset | Code | 1
From Multi-View to Hollow-3D: Hallucinated Hollow-3D R-CNN for 3D Object Detection | Code | 1
CSFNet: A Cosine Similarity Fusion Network for Real-Time RGB-X Semantic Segmentation of Driving Scenes | Code | 1
FreDSNet: Joint Monocular Depth and Semantic Segmentation with Fast Fourier Convolutions | Code | 1
DD-PPO: Learning Near-Perfect PointGoal Navigators from 2.5 Billion Frames | Code | 1
Curriculum Model Adaptation with Synthetic and Real Data for Semantic Foggy Scene Understanding | Code | 1
DeepScores -- A Dataset for Segmentation, Detection and Classification of Tiny Objects | Code | 1
From General to Specific: Informative Scene Graph Generation via Balance Adjustment | Code | 1
Global-Reasoned Multi-Task Learning Model for Surgical Scene Understanding | Code | 1
CoPeD-Advancing Multi-Robot Collaborative Perception: A Comprehensive Dataset in Real-World Environments | Code | 1
Few-Shot Object Detection and Viewpoint Estimation for Objects in the Wild | Code | 1
CPCM: Contextual Point Cloud Modeling for Weakly-supervised Point Cloud Semantic Segmentation | Code | 1
OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge | Code | 1
A2-FPN for Semantic Segmentation of Fine-Resolution Remotely Sensed Images | Code | 1
FloodNet: A High Resolution Aerial Imagery Dataset for Post Flood Scene Understanding | Code | 1
Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts | Code | 1
A Data-Centric Revisit of Pre-Trained Vision Models for Robot Learning | Code | 1
Expressive Scene Graph Generation Using Commonsense Knowledge Infusion for Visual Understanding and Reasoning | Code | 1
Context Prior for Scene Segmentation | Code | 1
3DRM: Pair-wise relation module for 3D object detection | Code | 1
Extending Large Vision-Language Model for Diverse Interactive Tasks in Autonomous Driving | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | ACRV Baseline | OMQ | 0.44 | | Unverified
2 | Team VGAI (TCS Research) | OMQ | 0.37 | | Unverified
3 | Demo_semantic_SLAM | OMQ | 0.11 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CPN(ResNet-101) | Mean IoU | 46.3 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ACRV Baseline | OMQ | 0.35 | | Unverified