SOTAVerified

Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.

Papers

Showing 251-300 of 1723 papers

Title | Status | Hype
ALFWorld: Aligning Text and Embodied Environments for Interactive Learning | Code | 1
Cerberus Transformer: Joint Semantic, Affordance and Attribute Parsing | Code | 1
Mitigating Trade-off: Stream and Query-guided Aggregation for Efficient and Effective 3D Occupancy Prediction | Code | 1
MLRSNet: A Multi-label High Spatial Resolution Remote Sensing Dataset for Semantic Scene Understanding | Code | 1
MonteBoxFinder: Detecting and Filtering Primitives to Fit a Noisy Point Cloud | Code | 1
Dual-Hybrid Attention Network for Specular Highlight Removal | Code | 1
Masked Scene Modeling: Narrowing the Gap Between Supervised and Self-Supervised Learning in 3D Scene Understanding | Code | 1
DTCLMapper: Dual Temporal Consistent Learning for Vectorized HD Map Construction | Code | 1
DPF: Learning Dense Prediction Fields with Weak Supervision | Code | 1
Mask4D: End-to-End Mask-Based 4D Panoptic Segmentation for LiDAR Sequences | Code | 1
MassMIND: Massachusetts Maritime INfrared Dataset | Code | 1
A Two-Stage Masked Autoencoder Based Network for Indoor Depth Completion | Code | 1
AirObject: A Temporally Evolving Graph Embedding for Object Identification | Code | 1
Dynamic Graph Message Passing Networks | Code | 1
A Hybrid Sparse-Dense Monocular SLAM System for Autonomous Driving | Code | 1
M3D-RPN: Monocular 3D Region Proposal Network for Object Detection | Code | 1
MCTS with Refinement for Proposals Selection Games in Scene Understanding | Code | 1
LoLI-Street: Benchmarking Low-Light Image Enhancement and Beyond | Code | 1
Constructing Metric-Semantic Maps using Floor Plan Priors for Long-Term Indoor Localization | Code | 1
LLM-Empowered Embodied Agent for Memory-Augmented Task Planning in Household Robotics | Code | 1
3D Neural Embedding Likelihood: Probabilistic Inverse Graphics for Robust 6D Pose Estimation | Code | 1
Digging Into Self-Supervised Monocular Depth Estimation | Code | 1
Logic-RAG: Augmenting Large Multimodal Models with Visual-Spatial Knowledge for Road Scene Understanding | Code | 1
LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation | Code | 1
Affordance Transfer Learning for Human-Object Interaction Detection | Code | 1
Living Scenes: Multi-object Relocalization and Reconstruction in Changing 3D Environments | Code | 1
Divide and Conquer: 3D Point Cloud Instance Segmentation With Point-Wise Binarization | Code | 1
Dynamic Graph Message Passing Networks for Visual Recognition | Code | 1
LLaVA-SpaceSGG: Visual Instruct Tuning for Open-vocabulary Scene Graph Generation with Enhanced Spatial Relations | Code | 1
LWSIS: LiDAR-guided Weakly Supervised Instance Segmentation for Autonomous Driving | Code | 1
MGNet: Monocular Geometric Scene Understanding for Autonomous Driving | Code | 1
Deep learning for radar data exploitation of autonomous vehicle | Code | 1
A Survey on Deep Learning Technique for Video Segmentation | Code | 1
LED: Light Enhanced Depth Estimation at Night | Code | 1
4D Panoptic LiDAR Segmentation | Code | 1
DeepPanoContext: Panoramic 3D Scene Understanding with Holistic Scene Context Graph and Relation-based Optimization | Code | 1
Leveraging Large (Visual) Language Models for Robot 3D Scene Understanding | Code | 1
CAT-ViL: Co-Attention Gated Vision-Language Embedding for Visual Question Localized-Answering in Robotic Surgery | Code | 1
A Survey on Deep Learning for Localization and Mapping: Towards the Age of Spatial Machine Intelligence | Code | 1
Collaborative Transformers for Grounded Situation Recognition | Code | 1
Deep Learning for Event-based Vision: A Comprehensive Survey and Benchmarks | Code | 1
Dense Audio-Visual Event Localization under Cross-Modal Consistency and Multi-Temporal Granularity Collaboration | Code | 1
Complementary Random Masking for RGB-Thermal Semantic Segmentation | Code | 1
Detecting Human-Object Interaction via Fabricated Compositional Learning | Code | 1
A Survey of World Models for Autonomous Driving | Code | 1
Diffusion-SS3D: Diffusion Model for Semi-supervised 3D Object Detection | Code | 1
DIP: Unsupervised Dense In-Context Post-training of Visual Representations | Code | 1
DI-V2X: Learning Domain-Invariant Representation for Vehicle-Infrastructure Collaborative 3D Object Detection | Code | 1
Distilled Semantics for Comprehensive Scene Understanding from Videos | Code | 1
Affect2MM: Affective Analysis of Multimedia Content Using Emotion Causality | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | ACRV Baseline | OMQ | 0.44 | - | Unverified
2 | Team VGAI (TCS Research) | OMQ | 0.37 | - | Unverified
3 | Demo_semantic_SLAM | OMQ | 0.11 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CPN(ResNet-101) | Mean IoU | 46.3 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ACRV Baseline | OMQ | 0.35 | - | Unverified