SOTAVerified

Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.

Papers

Showing 651-700 of 1723 papers

Title | Status | Hype
Confidence-Aware Paced-Curriculum Learning by Label Smoothing for Surgical Scene Understanding | Code | 0
MultiDepth: Single-Image Depth Estimation via Multi-Task Regression and Classification | Code | 0
Computational Imaging for Machine Perception: Transferring Semantic Segmentation beyond Aberrations | Code | 0
General-Purpose Deep Point Cloud Feature Extractor | Code | 0
Attend, Infer, Repeat: Fast Scene Understanding with Generative Models | Code | 0
MonoGRNet: A Geometric Reasoning Network for Monocular 3D Object Localization | Code | 0
Generalizing Surgical Instruments Segmentation to Unseen Domains with One-to-Many Synthesis | Code | 0
Monocular 3D Object Detection with Pseudo-LiDAR Point Cloud | Code | 0
MovSAM: A Single-image Moving Object Segmentation Framework Based on Deep Thinking | Code | 0
Multimodal Scale Consistency and Awareness for Monocular Self-Supervised Depth Estimation | Code | 0
Model-based inexact graph matching on top of CNNs for semantic scene understanding | Code | 0
Modeling Expectation Violation in Intuitive Physics with Coarse Probabilistic Object Representations | Code | 0
Gated Driver Attention Predictor | Code | 0
Gated2Depth: Real-time Dense Lidar from Gated Images | Code | 0
MLM: A Benchmark Dataset for Multitask Learning with Multiple Languages and Modalities | Code | 0
MLLM-SUL: Multimodal Large Language Model for Semantic Scene Understanding and Localization in Traffic Scenarios | Code | 0
GaIA: Graphical Information Gain based Attention Network for Weakly Supervised Point Cloud Semantic Segmentation | Code | 0
MGNiceNet: Unified Monocular Geometric Scene Understanding | Code | 0
Mitigating Object Dependencies: Improving Point Cloud Self-Supervised Learning through Object Exchange | Code | 0
METEOR Guided Divergence for Video Captioning | Code | 0
Cognitive Visual Commonsense Reasoning Using Dynamic Working Memory | Code | 0
MetricGold: Leveraging Text-To-Image Latent Diffusion Models for Metric Depth Estimation | Code | 0
On the Structures of Representation for the Robustness of Semantic Segmentation to Input Corruption | Code | 0
FunnyNet-W: Multimodal Learning of Funny Moments in Videos in the Wild | Code | 0
m2caiSeg: Semantic Segmentation of Laparoscopic Images using Convolutional Neural Networks | Code | 0
COCO-Text: Dataset and Benchmark for Text Detection and Recognition in Natural Images | Code | 0
From Node to Graph: Joint Reasoning on Visual-Semantic Relational Graph for Zero-Shot Detection | Code | 0
Loss Distillation via Gradient Matching for Point Cloud Completion with Weighted Chamfer Distance | Code | 0
From Feature Importance to Natural Language Explanations Using LLMs with RAG | Code | 0
CNN-based Lidar Point Cloud De-Noising in Adverse Weather | Code | 0
Loss Switching Fusion with Similarity Search for Video Classification | Code | 0
LoCATe-GAT: Modeling Multi-Scale Local Context and Action Relationships for Zero-Shot Action Recognition | Code | 0
LoST? Appearance-Invariant Place Recognition for Opposite Viewpoints using Visual Semantics | Code | 0
FREDOM: Fairness Domain Adaptation Approach to Semantic Scene Understanding | Code | 0
Lightweight integration of 3D features to improve 2D image segmentation | Code | 0
Physics-as-Inverse-Graphics: Unsupervised Physical Parameter Estimation from Video | Code | 0
Leveraging Acoustic Images for Effective Self-Supervised Audio Representation Learning | Code | 0
Leveraging Automatic CAD Annotations for Supervised Learning in 3D Scene Understanding | Code | 0
FlowGrad: Using Motion for Visual Sound Source Localization | Code | 0
Flow-based GAN for 3D Point Cloud Generation from a Single Image | Code | 0
Aerial Scene Understanding in The Wild: Multi-Scene Recognition via Prototype-based Memory Networks | Code | 0
Learning Rigidity in Dynamic Scenes with a Moving Camera for 3D Motion Field Estimation | Code | 0
Fine-Grained is Too Coarse: A Novel Data-Centric Approach for Efficient Scene Graph Generation | Code | 0
ASI-Seg: Audio-Driven Surgical Instrument Segmentation with Surgeon Intention Understanding | Code | 0
Matterport3D: Learning from RGB-D Data in Indoor Environments | Code | 0
Placental Vessel Segmentation and Registration in Fetoscopy: Literature Review and MICCAI FetReg2021 Challenge Findings | Code | 0
Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors | Code | 0
Learning Monocular Depth by Distilling Cross-domain Stereo Networks | Code | 0
Learning Panoptic Segmentation from Instance Contours | Code | 0
Language-based Colorization of Scene Sketches | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | ACRV Baseline | OMQ | 0.44 | - | Unverified
2 | Team VGAI (TCS Research) | OMQ | 0.37 | - | Unverified
3 | Demo_semantic_SLAM | OMQ | 0.11 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CPN (ResNet-101) | Mean IoU | 46.3 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ACRV Baseline | OMQ | 0.35 | - | Unverified
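The Mean IoU reported for CPN (ResNet-101) is a standard semantic-segmentation metric: for each class c, IoU = TP / (TP + FP + FN), and the mean is taken over classes. The sketch below shows one common way to compute it from a confusion matrix. It is not the evaluation code of this benchmark. The function names `confusion_matrix` and `mean_iou`, the `ignore_index=255` convention for unlabeled pixels, and the choice to skip classes that appear in neither prediction nor label are all assumptions for illustration. The 46.3 in the table is presumably this value scaled to a percentage.

```python
import numpy as np

def confusion_matrix(pred, target, num_classes, ignore_index=255):
    """Build a num_classes x num_classes matrix: rows = ground truth, columns = prediction.

    Assumes every predicted label lies in [0, num_classes); ignore_index is an
    assumed convention for unlabeled pixels in the ground truth.
    """
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    valid = target != ignore_index  # drop unlabeled pixels
    idx = num_classes * target[valid].astype(np.int64) + pred[valid].astype(np.int64)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def mean_iou(conf):
    """Mean of per-class IoU = TP / (TP + FP + FN) over classes that occur at all."""
    tp = np.diag(conf).astype(np.float64)
    fp = conf.sum(axis=0) - tp  # predicted as class c but labeled as another class
    fn = conf.sum(axis=1) - tp  # labeled as class c but predicted as another class
    union = tp + fp + fn
    present = union > 0  # skip classes absent from both prediction and label
    return float((tp[present] / union[present]).mean())

# Tiny 2x3 "image" with three classes.
target = np.array([[0, 0, 1], [1, 2, 2]])
pred = np.array([[0, 1, 1], [1, 2, 0]])
conf = confusion_matrix(pred, target, num_classes=3)
print(conf)
print(mean_iou(conf))
```

In this toy example the per-class IoUs are 1/3, 2/3 and 1/2, so the mean IoU is 0.5 (up to floating-point rounding). Building a confusion matrix first lets scores be accumulated across many images before averaging. This dataset-level averaging is a common choice, but individual benchmarks may define it differently.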