SOTAVerified

Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.
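As a rough illustration of what "how objects relate to each other" can mean in practice, the sketch below derives coarse pairwise spatial relations from labeled 2D bounding boxes. It is not taken from any of the papers listed here; the `Box` class, the `relation` and `scene_relations` functions, the relation names, and the example coordinates are all hypothetical, and real systems typically reason in 3D with learned models.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned 2D box in image coordinates (y grows downward)."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def relation(a: Box, b: Box) -> str:
    """Return a coarse spatial relation of `a` with respect to `b`.

    Horizontal separation is checked before vertical separation, so a box
    that is both to the left of and above another is reported as "left_of".
    """
    if a.x_max <= b.x_min:
        return "left_of"
    if a.x_min >= b.x_max:
        return "right_of"
    if a.y_max <= b.y_min:
        return "above"
    if a.y_min >= b.y_max:
        return "below"
    return "overlaps"


def scene_relations(boxes):
    """List (subject, relation, object) triples for every ordered pair of boxes."""
    return [(a.label, relation(a, b), b.label) for a, b in permutations(boxes, 2)]


# Made-up detections for a small indoor scene.
boxes = [
    Box("lamp", 40, 10, 60, 50),
    Box("table", 30, 50, 120, 90),
    Box("chair", 130, 40, 170, 95),
]
triples = scene_relations(boxes)
for subject, rel, obj in triples:
    print(f"{subject} {rel} {obj}")
```

The resulting triples form a very simple scene graph, which is the kind of structure that goes beyond per-object recognition.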

Papers

Showing 251-300 of 1723 papers

| Title | Status | Hype |
|---|---|---|
| CENet: Toward Concise and Efficient LiDAR Semantic Segmentation for Autonomous Driving | Code | 1 |
| Cerberus Transformer: Joint Semantic, Affordance and Attribute Parsing | Code | 1 |
| Pareto Manifold Learning: Tackling multiple tasks via ensembles of single-task models | Code | 1 |
| Knowledge Distillation from 3D to Bird's-Eye-View for LiDAR Semantic Segmentation | Code | 1 |
| All-Day Multi-Camera Multi-Target Tracking | Code | 1 |
| IDA-3D: Instance-Depth-Aware 3D Object Detection From Stereo Vision for Autonomous Driving | Code | 1 |
| Auto-Panoptic: Cooperative Multi-Component Architecture Search for Panoptic Segmentation | Code | 1 |
| IBISCape: A Simulated Benchmark for multi-modal SLAM Systems Evaluation in Large-scale Dynamic Environments | Code | 1 |
| Automatic Extrinsic Calibration Method for LiDAR and Camera Sensor Setups | Code | 1 |
| Holistic 3D Scene Understanding from a Single Image with Implicit Representation | Code | 1 |
| Segmenting Known Objects and Unseen Unknowns without Prior Knowledge | Code | 1 |
| AutoInst: Automatic Instance-Based Segmentation of LiDAR 3D Scans | Code | 1 |
| ALFWorld: Aligning Text and Embodied Environments for Interactive Learning | Code | 1 |
| HOC-Search: Efficient CAD Model and Pose Retrieval from RGB-D Scans | Code | 1 |
| Human-centric Scene Understanding for 3D Large-scale Scenarios | Code | 1 |
| Image Masking for Robust Self-Supervised Monocular Depth Estimation | Code | 1 |
| Lane Graph Estimation for Scene Understanding in Urban Driving | Code | 1 |
| Group Contextual Encoding for 3D Point Clouds | Code | 1 |
| Grounding Consistency: Distilling Spatial Common Sense for Precise Visual Relationship Detection | Code | 1 |
| GroupContrast: Semantic-aware Self-supervised Representation Learning for 3D Understanding | Code | 1 |
| Hearing and Seeing Through CLIP: A Framework for Self-Supervised Sound Source Localization | Code | 1 |
| A Two-Stage Masked Autoencoder Based Network for Indoor Depth Completion | Code | 1 |
| AirObject: A Temporally Evolving Graph Embedding for Object Identification | Code | 1 |
| A Hybrid Sparse-Dense Monocular SLAM System for Autonomous Driving | Code | 1 |
| Grounded Situation Recognition with Transformers | Code | 1 |
| GFF: Gated Fully Fusion for Semantic Segmentation | Code | 1 |
| Generating Visual Spatial Description via Holistic 3D Scene Understanding | Code | 1 |
| Global Aggregation then Local Distribution in Fully Convolutional Networks | Code | 1 |
| General Geometry-aware Weakly Supervised 3D Object Detection | Code | 1 |
| Context Prior for Scene Segmentation | Code | 1 |
| 3D Neural Embedding Likelihood: Probabilistic Inverse Graphics for Robust 6D Pose Estimation | Code | 1 |
| Global-Reasoned Multi-Task Learning Model for Surgical Scene Understanding | Code | 1 |
| Complementary Random Masking for RGB-Thermal Semantic Segmentation | Code | 1 |
| From General to Specific: Informative Scene Graph Generation via Balance Adjustment | Code | 1 |
| F-ViTA: Foundation Model Guided Visible to Thermal Translation | Code | 1 |
| Collaborative Transformers for Grounded Situation Recognition | Code | 1 |
| FocusFlow: Boosting Key-Points Optical Flow Estimation for Autonomous Driving | Code | 1 |
| Few-Shot Object Detection and Viewpoint Estimation for Objects in the Wild | Code | 1 |
| FloodNet: A High Resolution Aerial Imagery Dataset for Post Flood Scene Understanding | Code | 1 |
| FPS-Net: A Convolutional Fusion Network for Large-Scale LiDAR Point Cloud Segmentation | Code | 1 |
| Affordance Transfer Learning for Human-Object Interaction Detection | Code | 1 |
| FreDSNet: Joint Monocular Depth and Semantic Segmentation with Fast Fourier Convolutions | Code | 1 |
| GOV-NeSF: Generalizable Open-Vocabulary Neural Semantic Fields | Code | 1 |
| From Multi-View to Hollow-3D: Hallucinated Hollow-3D R-CNN for 3D Object Detection | Code | 1 |
| HiLo: Exploiting High Low Frequency Relations for Unbiased Panoptic Scene Graph Generation | Code | 1 |
| Exploiting Edge-Oriented Reasoning for 3D Point-based Scene Graph Analysis | Code | 1 |
| A Survey on Deep Learning Technique for Video Segmentation | Code | 1 |
| Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts | Code | 1 |
| 4D Panoptic LiDAR Segmentation | Code | 1 |
| Explainable Object-induced Action Decision for Autonomous Vehicles | Code | 1 |

Benchmark Results

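| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | ACRV Baseline | OMQ | 0.44 | | Unverified |
| 2 | Team VGAI (TCS Research) | OMQ | 0.37 | | Unverified |
| 3 | Demo_semantic_SLAM | OMQ | 0.11 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | CPN(ResNet-101) | Mean IoU | 46.3 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | ACRV Baseline | OMQ | 0.35 | | Unverified |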