SOTAVerified

Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.

Papers

Showing 451-475 of 1723 papers

Title | Status | Hype
DynaVol: Unsupervised Learning for Dynamic Scenes through Object-Centric Voxelization | Code | 1
ODAM: Object Detection, Association, and Mapping using Posed RGB Video | Code | 1
Object Pose Estimation via the Aggregation of Diffusion Features | Code | 1
Occlusion-Aware Depth Estimation with Adaptive Normal Constraints | Code | 1
PlaneRecNet: Multi-Task Learning with Cross-Task Consistency for Piece-Wise Plane Detection and Reconstruction from a Single RGB Image | Code | 1
PointGroup: Dual-Set Point Grouping for 3D Instance Segmentation | Code | 1
Estimating and Exploiting the Aleatoric Uncertainty in Surface Normal Estimation | Code | 1
Cityscapes-Panoptic-Parts and PASCAL-Panoptic-Parts datasets for Scene Understanding | Code | 1
Egocentric Scene Understanding via Multimodal Spatial Rectifier | Code | 1
One-Shot Object Affordance Detection in the Wild | Code | 1
CamContextI2V: Context-aware Controllable Video Generation | Code | 1
Online 3D reconstruction and dense tracking in endoscopic videos | Code | 1
Human-centric Scene Understanding for 3D Large-scale Scenarios | Code | 1
Enhancing Scene Graph Generation with Hierarchical Relationships and Commonsense Knowledge | Code | 1
Estimating Generic 3D Room Structures from 2D Annotations | Code | 1
Living Scenes: Multi-object Relocalization and Reconstruction in Changing 3D Environments | Code | 1
ARKitScenes: A Diverse Real-World Dataset For 3D Indoor Scene Understanding Using Mobile RGB-D Data | Code | 1
3UR-LLM: An End-to-End Multimodal Large Language Model for 3D Scene Understanding | Code | 1
Channel-Wise Attention-Based Network for Self-Supervised Monocular Depth Estimation | Code | 1
OpenScan: A Benchmark for Generalized Open-Vocabulary 3D Scene Understanding | Code | 1
EndoChat: Grounded Multimodal Large Language Model for Endoscopic Surgery | Code | 1
PhysGaia: A Physics-Aware Dataset of Multi-Body Interactions for Dynamic Novel View Synthesis | Code | 1
Point Scene Understanding via Disentangled Instance Mesh Reconstruction | Code | 1
OvarNet: Towards Open-vocabulary Object Attribute Recognition | Code | 1
ROOT: VLM based System for Indoor Scene Understanding and Beyond | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | ACRV Baseline | OMQ | 0.44 | - | Unverified
2 | Team VGAI (TCS Research) | OMQ | 0.37 | - | Unverified
3 | Demo_semantic_SLAM | OMQ | 0.11 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CPN(ResNet-101) | Mean IoU | 46.3 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ACRV Baseline | OMQ | 0.35 | - | Unverified