SOTAVerified

Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.

Papers

Showing 1651–1700 of 1723 papers

Title | Status | Hype
SDOF-Tracker: Fast and Accurate Multiple Human Tracking by Skipped-Detection and Optical-Flow | Code | 0
Exploring Scene Affinity for Semi-Supervised LiDAR Semantic Segmentation | Code | 0
CLAIR-A: Leveraging Large Language Models to Judge Audio Captions | Code | 0
Learning Rigidity in Dynamic Scenes with a Moving Camera for 3D Motion Field Estimation | Code | 0
Category-level Neural Field for Reconstruction of Partially Observed Objects in Indoor Environment | Code | 0
Target-Aware Spatio-Temporal Reasoning via Answering Questions in Dynamics Audio-Visual Scenarios | Code | 0
Aerial Scene Understanding in The Wild: Multi-Scene Recognition via Prototype-based Memory Networks | Code | 0
Task-Aware Asynchronous Multi-Task Model with Class Incremental Contrastive Learning for Surgical Scene Understanding | Code | 0
Evaluating Compositional Scene Understanding in Multimodal Generative Models | Code | 0
VTQA: Visual Text Question Answering via Entity Alignment and Cross-Media Reasoning | Code | 0
ERFNet: Efficient Residual Factorized ConvNet for Real-time Semantic Segmentation | Code | 0
ASI-Seg: Audio-Driven Surgical Instrument Segmentation with Surgeon Intention Understanding | Code | 0
SeGAN: Segmenting and Generating the Invisible | Code | 0
Artificial Color Constancy via GoogLeNet with Angular Loss Function | Code | 0
Adaptive Visual Scene Understanding: Incremental Scene Graph Generation | Code | 0
Temporally Consistent Horizon Lines | Code | 0
CARL-D: A vision benchmark suite and large scale dataset for vehicle detection and scene segmentation | Code | 0
Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions | Code | 0
Efficient ConvNet for Real-time Semantic Segmentation | Code | 0
Bridging Stereo Matching and Optical Flow via Spatiotemporal Correspondence | Code | 0
Segmenting the Future | Code | 0
Learning Regional Purity for Instance Segmentation on 3D Point Clouds | Code | 0
SeG-SR: Integrating Semantic Knowledge into Remote Sensing Image Super-Resolution via Vision-Language Model | Code | 0
Learning Panoptic Segmentation from Instance Contours | Code | 0
Box for Mask and Mask for Box: weak losses for multi-task partially supervised learning | Code | 0
Are Vision LLMs Road-Ready? A Comprehensive Benchmark for Safety-Critical Driving Video Understanding | Code | 0
Efficient Computation Sharing for Multi-Task Visual Scene Understanding | Code | 0
DualMLP: a two-stream fusion model for 3D point cloud classification | Code | 0
Road Scene Understanding by Occupancy Grid Learning from Sparse Radar Clusters using Semantic Segmentation | Code | 0
Self-Supervised Partial Cycle-Consistency for Multi-View Matching | Code | 0
Learning Monocular Depth by Distilling Cross-domain Stereo Networks | Code | 0
Boundary-Seeking Generative Adversarial Networks | Code | 0
Dual-Glance Model for Deciphering Social Relationships | Code | 0
Self-Supervised Road Layout Parsing with Graph Auto-Encoding | Code | 0
Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors | Code | 0
Self-supervised Vision Transformers for 3D Pose Estimation of Novel Objects | Code | 0
Zoom in on the Plant: Fine-grained Analysis of Leaf, Stem and Vein Instances | Code | 0
Language-based Colorization of Scene Sketches | Code | 0
Label-Attention Transformer with Geometrically Coherent Objects for Image Captioning | Code | 0
Adversarial Attacks on Monocular Pose Estimation | Code | 0
Visually Grounded VQA by Lattice-based Retrieval | Code | 0
The ADUULM-360 Dataset -- A Multi-Modal Dataset for Depth Estimation in Adverse Weather | Code | 0
DRRNet: Macro-Micro Feature Fusion and Dual Reverse Refinement for Camouflaged Object Detection | Code | 0
Doubly Contrastive End-to-End Semantic Segmentation for Autonomous Driving under Adverse Weather | Code | 0
A Review on Deep Learning Techniques Applied to Semantic Segmentation | Code | 0
Semantic Foreground Inpainting from Weak Supervision | Code | 0
BOLD5000: A public fMRI dataset of 5000 images | Code | 0
DOCTR: Disentangled Object-Centric Transformer for Point Scene Understanding | Code | 0
UniNet: A Unified Scene Understanding Network and Exploring Multi-Task Relationships through the Lens of Adversarial Attacks | Code | 0
Knowledge-Guided Object Discovery with Acquired Deep Impressions | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | ACRV Baseline | OMQ | 0.44 | - | Unverified
2 | Team VGAI (TCS Research) | OMQ | 0.37 | - | Unverified
3 | Demo_semantic_SLAM | OMQ | 0.11 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CPN(ResNet-101) | Mean IoU | 46.3 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ACRV Baseline | OMQ | 0.35 | - | Unverified