SOTAVerified

Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.
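To make this definition concrete, here is a minimal sketch of one common representation, a scene graph. Objects are nodes, each with a label and an axis-aligned 3D box, and spatial relationships are (subject, predicate, object) triples. Everything below is hypothetical and only for illustration: the class names `SceneObject` and `SceneGraph`, the two predicates `left_of` and `on`, the fixed z-up world frame with x pointing right, and the 5 cm contact tolerance. Real systems usually predict relations with learned models rather than hand-written geometric rules.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class SceneObject:
    name: str
    label: str
    center: tuple[float, float, float]  # (x, y, z) in metres; assumed z-up world frame
    size: tuple[float, float, float]    # (width, depth, height) of an axis-aligned box


@dataclass
class SceneGraph:
    objects: dict[str, SceneObject] = field(default_factory=dict)
    # Each relation is a (subject, predicate, object) triple of object names.
    relations: list[tuple[str, str, str]] = field(default_factory=list)

    def add(self, obj: SceneObject) -> None:
        self.objects[obj.name] = obj

    def infer_relations(self, tol: float = 0.05) -> None:
        """Derive simple geometric predicates for every ordered pair of objects."""
        self.relations.clear()
        for a in self.objects.values():
            for b in self.objects.values():
                if a is b:
                    continue
                # Hypothetical rule: "left_of" compares x coordinates in the world frame.
                if a.center[0] < b.center[0]:
                    self.relations.append((a.name, "left_of", b.name))
                # Hypothetical rule: a is "on" b if a's bottom face meets b's top face
                # (within tol) and their footprints overlap in x and y.
                a_bottom = a.center[2] - a.size[2] / 2
                b_top = b.center[2] + b.size[2] / 2
                overlap_x = abs(a.center[0] - b.center[0]) < (a.size[0] + b.size[0]) / 2
                overlap_y = abs(a.center[1] - b.center[1]) < (a.size[1] + b.size[1]) / 2
                if abs(a_bottom - b_top) <= tol and overlap_x and overlap_y:
                    self.relations.append((a.name, "on", b.name))


graph = SceneGraph()
graph.add(SceneObject("table", "table", (0.0, 0.0, 0.375), (1.2, 0.8, 0.75)))
graph.add(SceneObject("mug", "cup", (0.2, 0.1, 0.8), (0.08, 0.08, 0.1)))
graph.add(SceneObject("chair", "chair", (-1.0, 0.0, 0.45), (0.5, 0.5, 0.9)))
graph.infer_relations()
print(graph.relations)
# [('table', 'left_of', 'mug'), ('mug', 'on', 'table'),
#  ('chair', 'left_of', 'table'), ('chair', 'left_of', 'mug')]
```

The point of the structure is that the same three objects yield relations ("the mug is on the table") that per-object recognition alone does not capture.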

Papers

Showing 251–300 of 1723 papers

Title | Status | Hype
--- | --- | ---
ViPLO: Vision Transformer based Pose-Conditioned Self-Loop Graph for Human-Object Interaction Detection | Code | 1
Learning How To Robustly Estimate Camera Pose in Endoscopic Videos | Code | 1
RS2G: Data-Driven Scene-Graph Extraction and Embedding for Robust Autonomous Perception and Scenario Understanding | Code | 1
STRAP: Structured Object Affordance Segmentation with Point Supervision | Code | 1
Complementary Random Masking for RGB-Thermal Semantic Segmentation | Code | 1
DPF: Learning Dense Prediction Fields with Weak Supervision | Code | 1
HiLo: Exploiting High Low Frequency Relations for Unbiased Panoptic Scene Graph Generation | Code | 1
Real-Time Semantic Segmentation using Hyperspectral Images for Mapping Unstructured and Unknown Environments | Code | 1
You Only Need One Thing One Click: Self-Training for Weakly Supervised 3D Scene Understanding | Code | 1
Viewpoint Equivariance for Multi-View 3D Object Detection | Code | 1
Self-distillation for surgical action recognition | Code | 1
Constructing Metric-Semantic Maps using Floor Plan Priors for Long-Term Indoor Localization | Code | 1
PiMAE: Point Cloud and Image Interactive Masked Autoencoders for 3D Object Detection | Code | 1
Traffic Scene Parsing through the TSP6K Dataset | Code | 1
CEKD: Cross-Modal Edge-Privileged Knowledge Distillation for Semantic Scene Understanding Using Only Thermal Images | Code | 1
Deep Learning for Event-based Vision: A Comprehensive Survey and Benchmarks | Code | 1
3D Neural Embedding Likelihood: Probabilistic Inverse Graphics for Robust 6D Pose Estimation | Code | 1
OvarNet: Towards Open-vocabulary Object Attribute Recognition | Code | 1
Unleash the Potential of Image Branch for Cross-modal 3D Object Detection | Code | 1
CLIP2Scene: Towards Label-efficient 3D Scene Understanding by CLIP | Code | 1
Uni-3D: A Universal Model for Panoptic 3D Scene Reconstruction | Code | 1
PeakConv: Learning Peak Receptive Field for Radar Semantic Segmentation | Code | 1
PointVST: Self-Supervised Pre-training for 3D Point Clouds via View-Specific Point-to-Image Translation | Code | 1
Learning Object-level Point Augmentor for Semi-supervised 3D Object Detection | Code | 1
Towards Holistic Surgical Scene Understanding | Code | 1
LWSIS: LiDAR-guided Weakly Supervised Instance Segmentation for Autonomous Driving | Code | 1
Towards Scene Understanding for Autonomous Operations on Airport Aprons | Code | 1
Language-Assisted 3D Feature Learning for Semantic Scene Understanding | Code | 1
BEVDistill: Cross-Modal BEV Distillation for Multi-View 3D Object Detection | Code | 1
RGB-T Semantic Segmentation with Location, Activation, and Sharpening | Code | 1
Sim-to-Real via Sim-to-Seg: End-to-end Off-road Autonomous Driving Without Real Data | Code | 1
Pareto Manifold Learning: Tackling multiple tasks via ensembles of single-task models | Code | 1
SQA3D: Situated Question Answering in 3D Scenes | Code | 1
Image Masking for Robust Self-Supervised Monocular Depth Estimation | Code | 1
FreDSNet: Joint Monocular Depth and Semantic Segmentation with Fast Fourier Convolutions | Code | 1
Uncertainty-Driven Active Vision for Implicit Scene Reconstruction | Code | 1
Dynamic Graph Message Passing Networks for Visual Recognition | Code | 1
Segmenting Known Objects and Unseen Unknowns without Prior Knowledge | Code | 1
Leveraging Large (Visual) Language Models for Robot 3D Scene Understanding | Code | 1
MassMIND: Massachusetts Maritime INfrared Dataset | Code | 1
SemSegDepth: A Combined Model for Semantic Segmentation and Depth Completion | Code | 1
Semantic Segmentation-Assisted Instance Feature Fusion for Multi-Level 3D Part Instance Segmentation | Code | 1
TAG: Boosting Text-VQA via Text-aware Visual Question-answer Generation | Code | 1
MonteBoxFinder: Detecting and Filtering Primitives to Fit a Noisy Point Cloud | Code | 1
CENet: Toward Concise and Efficient LiDAR Semantic Segmentation for Autonomous Driving | Code | 1
Semantic Abstraction: Open-World 3D Scene Understanding from 2D Vision-Language Models | Code | 1
Divide and Conquer: 3D Point Cloud Instance Segmentation With Point-Wise Binarization | Code | 1
Egocentric Scene Understanding via Multimodal Spatial Rectifier | Code | 1
Efficient Multi-Task RGB-D Scene Analysis for Indoor Environments | Code | 1
MCTS with Refinement for Proposals Selection Games in Scene Understanding | Code | 1

Benchmark Results

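# | Model | Metric | Claimed | Verified | Status
--- | --- | --- | --- | --- | ---
1 | ACRV Baseline | OMQ | 0.44 | | Unverified
2 | Team VGAI (TCS Research) | OMQ | 0.37 | | Unverified
3 | Demo_semantic_SLAM | OMQ | 0.11 | | Unverified

# | Model | Metric | Claimed | Verified | Status
--- | --- | --- | --- | --- | ---
1 | CPN(ResNet-101) | Mean IoU | 46.3 | | Unverified

# | Model | Metric | Claimed | Verified | Status
--- | --- | --- | --- | --- | ---
1 | ACRV Baseline | OMQ | 0.35 | | Unverified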
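The CPN(ResNet-101) entry is scored by Mean IoU, the usual semantic segmentation metric: for each class c, IoU_c = TP_c / (TP_c + FP_c + FN_c), and the per-class values are averaged. The following is a minimal pure-Python sketch, not the evaluation code of any particular benchmark. The function name `mean_iou`, the ignore label 255 (a common convention, assumed here), and the choice to skip classes absent from both prediction and ground truth are all assumptions; benchmarks differ on these details, and published numbers such as 46.3 are usually percentages (the fraction times 100).

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int],
             num_classes: int, ignore_index: int = 255) -> float:
    """Mean intersection-over-union over classes present in pred or target.

    Hypothetical helper: pred/target are flat per-pixel class ids, and
    predictions are assumed to lie in range(num_classes).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    # confusion[t][p] counts pixels with ground-truth class t predicted as p.
    confusion = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:  # unlabeled pixels do not count (assumed convention)
            continue
        confusion[t][p] += 1
    ious = []
    for c in range(num_classes):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                  # class-c pixels predicted as another class
        fp = sum(row[c] for row in confusion) - tp   # other pixels predicted as class c
        union = tp + fp + fn
        if union > 0:  # skip classes absent from both prediction and ground truth
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


print(mean_iou([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], num_classes=3))
```

In the toy call the per-class IoUs are 1/3, 2/3 and 1/2, so the mean IoU is 0.5 (up to floating-point rounding). Passing all pixels of a dataset at once accumulates one confusion matrix over the whole set, which is the common practice, rather than averaging per-image scores.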