Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.

Papers

Showing 726–750 of 1723 papers

Title	Date	Tasks	Status
LoCATe-GAT: Modeling Multi-Scale Local Context and Action Relationships for Zero-Shot Action Recognition	Nov 27, 2024	Action Recognition, Graph Attention	Code Available
Grid-augmented vision: A simple yet effective approach for enhanced spatial understanding in multi-modal agents	Nov 27, 2024	Autonomous Navigation, Object Recognition	Code Available
Reconstructing Animals and the Wild	Nov 27, 2024	3D Reconstruction, Scene Understanding	Unverified
Box for Mask and Mask for Box: weak losses for multi-task partially supervised learning	Nov 26, 2024	Object, object-detection	Code Available
HSI-Drive v2.0: More Data for New Challenges in Scene Understanding for Autonomous Driving	Nov 26, 2024	Autonomous Driving, Image Segmentation	Unverified
RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics	Nov 25, 2024	Robot Manipulation, Scene Understanding	Unverified
Open-Vocabulary Octree-Graph for 3D Scene Understanding	Nov 25, 2024	Object, Scene Understanding	Unverified
UniGaussian: Driving Scene Reconstruction from Multiple Camera Models via Unified Gaussian Representations	Nov 22, 2024	Autonomous Driving, Scene Understanding	Unverified
Multimodal 3D Reasoning Segmentation with Complex Scenes	Nov 21, 2024	Reasoning Segmentation, Scene Understanding	Unverified
Classification of Geographical Land Structure Using Convolution Neural Network and Transfer Learning	Nov 19, 2024	Scene Understanding, Transfer Learning	Unverified
Reducing Label Dependency for Underwater Scene Understanding: A Survey of Datasets, Techniques and Applications	Nov 18, 2024	Scene Segmentation, Scene Understanding	Unverified
Calibrated and Efficient Sampling-Free Confidence Estimation for LiDAR Scene Semantic Segmentation	Nov 18, 2024	Autonomous Driving, LIDAR Semantic Segmentation	Unverified
MGNiceNet: Unified Monocular Geometric Scene Understanding	Nov 18, 2024	Autonomous Driving, Autonomous Vehicles	Code Available
The ADUULM-360 Dataset -- A Multi-Modal Dataset for Depth Estimation in Adverse Weather	Nov 18, 2024	Autonomous Driving, Depth Estimation	Code Available
Memory-Augmented Multimodal LLMs for Surgical VQA via Self-Contained Inquiry	Nov 17, 2024	Question Answering, Scene Understanding	Unverified
Large Language Models (LLMs) as Traffic Control Systems at Urban Intersections: A New Paradigm	Nov 16, 2024	Autonomous Vehicles, Decision Making	Unverified
MetricGold: Leveraging Text-To-Image Latent Diffusion Models for Metric Depth Estimation	Nov 16, 2024	Depth Estimation, Monocular Depth Estimation	Code Available
Content-Aware Preserving Image Generation	Nov 15, 2024	Image Generation, Scene Understanding	Unverified
SE(3) Equivariant Ray Embeddings for Implicit Multi-View Depth Estimation	Nov 11, 2024	Data Augmentation, Decoder	Unverified
Graph-Based Multi-Modal Sensor Fusion for Autonomous Driving	Nov 6, 2024	Autonomous Driving, Multi-Object Tracking	Unverified
Modeling Uncertainty in 3D Gaussian Splatting through Continuous Semantic Splatting	Nov 4, 2024	Scene Understanding, Uncertainty Quantification	Unverified
Multi-task Geometric Estimation of Depth and Surface Normal from Monocular 360° Images	Nov 4, 2024	Multi-Task Learning, Scene Understanding	Code Available
UniRiT: Towards Few-Shot Non-Rigid Point Cloud Registration	Oct 30, 2024	Point Cloud Registration, Representation Learning	Unverified
Symbolic Graph Inference for Compound Scene Understanding	Oct 30, 2024	Question Answering, Scene Understanding	Unverified
Towards Robust Algorithms for Surgical Phase Recognition via Digital Twin-based Scene Representation	Oct 26, 2024	Informativeness, Scene Understanding	Unverified

Benchmark Results

Semantic Scene Understanding Challenge (passive actuation & ground-truth localisation)

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.44	—	Unverified
2	Team VGAI (TCS Research)	OMQ	0.37	—	Unverified
3	Demo_semantic_SLAM	OMQ	0.11	—	Unverified

ADE20K val

#	Model	Metric	Claimed	Verified	Status
1	CPN(ResNet-101)	Mean IoU	46.3	—	Unverified

Semantic Scene Understanding Challenge (active actuation & ground-truth localisation)

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.35	—	Unverified
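
One of the tables above reports Mean IoU, the usual segmentation metric. For readers unfamiliar with it, the following is a minimal sketch of how mean intersection-over-union can be computed for a single pair of label maps. The function name `mean_iou` and the toy labels are illustrative assumptions, not the benchmark's evaluation code; real protocols typically accumulate counts over a whole dataset, handle ignore labels, and report the result as a percentage.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes that appear in pred or target.

    pred and target are flat sequences of integer class labels, one per pixel.
    Illustrative helper (hypothetical name), not a benchmark implementation.
    """
    ious = []
    for c in range(num_classes):
        # Pixels where both maps say class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where at least one map says class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union == 0:
            continue  # class absent from both maps: leave it out of the mean
        ious.append(inter / union)
    return sum(ious) / len(ious) if ious else float("nan")


# Toy 2x3 label maps, flattened row by row (made-up values).
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(f"mean IoU: {score:.3f}")
```

Here the per-class IoUs are 1/3, 2/3 and 1/2, so the mean is 0.5. Skipping classes with an empty union is one common convention; others count such classes as 0 or 1, which changes the average.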