SOTAVerified

Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.

Papers

Showing 201-250 of 1723 papers

Title | Status | Hype
Few-Shot Object Detection and Viewpoint Estimation for Objects in the Wild | Code | 1
IDA-3D: Instance-Depth-Aware 3D Object Detection From Stereo Vision for Autonomous Driving | Code | 1
Image Segmentation Using Deep Learning: A Survey | Code | 1
Improving Visual Recognition with Hyperbolical Visual Hierarchy Mapping | Code | 1
BoMuDANet: Unsupervised Adaptation for Visual Scene Understanding in Unstructured Driving Environments | Code | 1
Channel-Wise Attention-Based Network for Self-Supervised Monocular Depth Estimation | Code | 1
FloodNet: A High Resolution Aerial Imagery Dataset for Post Flood Scene Understanding | Code | 1
Bootstraping Clustering of Gaussians for View-consistent 3D Scene Understanding | Code | 1
AVSegFormer: Audio-Visual Segmentation with Transformer | Code | 1
FocusFlow: Boosting Key-Points Optical Flow Estimation for Autonomous Driving | Code | 1
A Data-Centric Revisit of Pre-Trained Vision Models for Robot Learning | Code | 1
Boundary-induced and scene-aggregated network for monocular depth prediction | Code | 1
KITTI-360: A Novel Dataset and Benchmarks for Urban Scene Understanding in 2D and 3D | Code | 1
Knowledge Distillation from 3D to Bird's-Eye-View for LiDAR Semantic Segmentation | Code | 1
OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge | Code | 1
Language Embedded 3D Gaussians for Open-Vocabulary Scene Understanding | Code | 1
Expressive Scene Graph Generation Using Commonsense Knowledge Infusion for Visual Understanding and Reasoning | Code | 1
Learning and Reasoning with the Graph Structure Representation in Robotic Surgery | Code | 1
Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts | Code | 1
Bridging the Domain Gap: Self-Supervised 3D Scene Understanding with Foundation Models | Code | 1
Learning Object-Centric Representations of Multi-Object Scenes from Multiple Views | Code | 1
Learning Object-level Point Augmentor for Semi-supervised 3D Object Detection | Code | 1
LED: Light Enhanced Depth Estimation at Night | Code | 1
Leveraging Large (Visual) Language Models for Robot 3D Scene Understanding | Code | 1
Extending Large Vision-Language Model for Diverse Interactive Tasks in Autonomous Driving | Code | 1
Light Field Networks: Neural Scene Representations with Single-Evaluation Rendering | Code | 1
CAKES: Channel-wise Automatic KErnel Shrinking for Efficient 3D Networks | Code | 1
Explainable Object-induced Action Decision for Autonomous Vehicles | Code | 1
LLM-Empowered Embodied Agent for Memory-Augmented Task Planning in Household Robotics | Code | 1
CamContextI2V: Context-aware Controllable Video Generation | Code | 1
Exploiting Edge-Oriented Reasoning for 3D Point-based Scene Graph Analysis | Code | 1
A2-FPN for Semantic Segmentation of Fine-Resolution Remotely Sensed Images | Code | 1
FPS-Net: A Convolutional Fusion Network for Large-Scale LiDAR Point Cloud Segmentation | Code | 1
Campus3D: A Photogrammetry Point Cloud Benchmark for Hierarchical Understanding of Outdoor Scene | Code | 1
Masked Scene Modeling: Narrowing the Gap Between Supervised and Self-Supervised Learning in 3D Scene Understanding | Code | 1
MassMIND: Massachusetts Maritime INfrared Dataset | Code | 1
MGNet: Monocular Geometric Scene Understanding for Autonomous Driving | Code | 1
Microsoft COCO: Common Objects in Context | Code | 1
Advances in Deep Concealed Scene Understanding | Code | 1
CAT-ViL: Co-Attention Gated Vision-Language Embedding for Visual Question Localized-Answering in Robotic Surgery | Code | 1
All-Day Multi-Camera Multi-Target Tracking | Code | 1
Monocular Depth Estimation via Listwise Ranking using the Plackett-Luce Model | Code | 1
Estimating Generic 3D Room Structures from 2D Annotations | Code | 1
Monte Carlo Scene Search for 3D Scene Understanding | Code | 1
3DMIT: 3D Multi-modal Instruction Tuning for Scene Understanding | Code | 1
MTMamba: Enhancing Multi-Task Dense Scene Understanding by Mamba-Based Decoders | Code | 1
Auto-Panoptic: Cooperative Multi-Component Architecture Search for Panoptic Segmentation | Code | 1
Estimating and Exploiting the Aleatoric Uncertainty in Surface Normal Estimation | Code | 1
Event-aided Semantic Scene Completion | Code | 1
Automatic Extrinsic Calibration Method for LiDAR and Camera Sensor Setups | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | ACRV Baseline | OMQ | 0.44 | | Unverified
2 | Team VGAI (TCS Research) | OMQ | 0.37 | | Unverified
3 | Demo_semantic_SLAM | OMQ | 0.11 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CPN(ResNet-101) | Mean IoU | 46.3 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ACRV Baseline | OMQ | 0.35 | | Unverified