SOTAVerified

Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.

Papers

Showing 1551–1600 of 1723 papers

Title | Status | Hype
Neighbor-Vote: Improving Monocular 3D Object Detection through Neighbor Distance Voting | Code | 0
Multi-task Planar Reconstruction with Feature Warping Guidance | Code | 0
Multi-task Geometric Estimation of Depth and Surface Normal from Monocular 360° Images | Code | 0
Holistic 3D Scene Parsing and Reconstruction from a Single RGB Image | Code | 0
Multi-Resolution Multi-Modal Sensor Fusion For Remote Sensing Data With Label Uncertainty | Code | 0
ShelfNet for Fast Semantic Segmentation | Code | 0
Multimodal Scale Consistency and Awareness for Monocular Self-Supervised Depth Estimation | Code | 0
Deep Depth from Defocus: how can defocus blur improve 3D estimation using dense neural networks? | Code | 0
BACS: Background Aware Continual Semantic Segmentation | Code | 0
RESSCAL3D++: Joint Acquisition and Semantic Segmentation of 3D Point Clouds | Code | 0
ResUNet-a: a deep learning framework for semantic segmentation of remotely sensed data | Code | 0
Hierarchical Superpixel Segmentation via Structural Information Theory | Code | 0
Hierarchical Spatial Proximity Reasoning for Vision-and-Language Navigation | Code | 0
Veritatem Dies Aperit- Temporally Consistent Depth Prediction Enabled by a Multi-Task Geometric and Semantic Scene Understanding Approach | Code | 0
Veritatem Dies Aperit - Temporally Consistent Depth Prediction Enabled by a Multi-Task Geometric and Semantic Scene Understanding Approach | Code | 0
Hierarchical Context Transformer for Multi-level Semantic Scene Understanding | Code | 0
Grid-augmented vision: A simple yet effective approach for enhanced spatial understanding in multi-modal agents | Code | 0
Revisiting Distillation for Continual Learning on Visual Question Localized-Answering in Robotic Surgery | Code | 0
MultiDepth: Single-Image Depth Estimation via Multi-Task Regression and Classification | Code | 0
MovSAM: A Single-image Moving Object Segmentation Framework Based on Deep Thinking | Code | 0
MonoGRNet: A Geometric Reasoning Network for Monocular 3D Object Localization | Code | 0
Monocular 3D Object Detection with Pseudo-LiDAR Point Cloud | Code | 0
DC-Scene: Data-Centric Learning for 3D Scene Understanding | Code | 0
RIO: 3D Object Instance Re-Localization in Changing Indoor Environments | Code | 0
Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding | Code | 0
Good at captioning, bad at counting: Benchmarking GPT-4V on Earth observation data | Code | 0
Modeling Expectation Violation in Intuitive Physics with Coarse Probabilistic Object Representations | Code | 0
General-Purpose Deep Point Cloud Feature Extractor | Code | 0
Generalizing Surgical Instruments Segmentation to Unseen Domains with One-to-Many Synthesis | Code | 0
APCoTTA: Continual Test-Time Adaptation for Semantic Segmentation of Airborne LiDAR Point Clouds | Code | 0
Gated Driver Attention Predictor | Code | 0
A Critical Assessment of Visual Sound Source Localization Models Including Negative Audio | Code | 0
Model-based inexact graph matching on top of CNNs for semantic scene understanding | Code | 0
Gated2Depth: Real-time Dense Lidar from Gated Images | Code | 0
GaIA: Graphical Information Gain based Attention Network for Weakly Supervised Point Cloud Semantic Segmentation | Code | 0
MLM: A Benchmark Dataset for Multitask Learning with Multiple Languages and Modalities | Code | 0
FunnyNet-W: Multimodal Learning of Funny Moments in Videos in the Wild | Code | 0
Rotation Invariant Convolutions for 3D Point Clouds Deep Learning | Code | 0
MLLM-SUL: Multimodal Large Language Model for Semantic Scene Understanding and Localization in Traffic Scenarios | Code | 0
Mitigating Object Dependencies: Improving Point Cloud Self-Supervised Learning through Object Exchange | Code | 0
DA-RNN: Semantic Mapping with Data Associated Recurrent Neural Networks | Code | 0
MGNiceNet: Unified Monocular Geometric Scene Understanding | Code | 0
MetricGold: Leveraging Text-To-Image Latent Diffusion Models for Metric Depth Estimation | Code | 0
Collaborative Propagation on Multiple Instance Graphs for 3D Instance Segmentation with Single-point Supervision | Code | 0
Improving Social Awareness Through DANTE: A Deep Affinity Network for Clustering Conversational Interactants | Code | 0
DADA: Driver Attention Prediction in Driving Accident Scenarios | Code | 0
Structure-Aware Residual Pyramid Network for Monocular Depth Estimation | Code | 0
METEOR Guided Divergence for Video Captioning | Code | 0
MC-PanDA: Mask Confidence for Panoptic Domain Adaptation | Code | 0
Cross-Modality Time-Variant Relation Learning for Generating Dynamic Scene Graphs | Code | 0
Page 32 of 35

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | ACRV Baseline | OMQ | 0.44 | | Unverified
2 | Team VGAI (TCS Research) | OMQ | 0.37 | | Unverified
3 | Demo_semantic_SLAM | OMQ | 0.11 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CPN(ResNet-101) | Mean IoU | 46.3 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ACRV Baseline | OMQ | 0.35 | | Unverified