Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.

Papers

Showing 701–750 of 1723 papers

Title	Date	Tasks	Status
ObjVariantEnsemble: Advancing Point Cloud LLM Evaluation in Challenging Scenes with Subtly Distinguished Objects	Dec 19, 2024	Scene Understanding	Unverified
GAGS: Granularity-Aware Feature Distillation for Language Gaussian Splatting	Dec 18, 2024	Scene Understanding, Semantic Segmentation	Unverified
Multi-View Pedestrian Occupancy Prediction with a Novel Synthetic Dataset	Dec 18, 2024	Pedestrian Detection, Scene Understanding	Unverified
An Enhanced Classification Method Based on Adaptive Multi-Scale Fusion for Long-tailed Multispectral Point Clouds	Dec 16, 2024	Classification, Scene Understanding	Unverified
SuperGSeg: Open-Vocabulary 3D Segmentation with Structured Super-Gaussians	Dec 13, 2024	GPU, Object Localization	Unverified
SLGaussian: Fast Language Gaussian Splatting in Sparse Views	Dec 11, 2024	3DGS, Autonomous Navigation	Unverified
MAGIC: Mastering Physical Adversarial Generation in Context through Collaborative LLM Agents	Dec 11, 2024	object-detection, Object Detection	Unverified
TGOSPA Metric Parameters Selection and Evaluation for Visual Multi-object Tracking	Dec 11, 2024	Multi-Object Tracking, Object Tracking	Unverified
Event fields: Capturing light fields at high speed, resolution, and dynamic range	Dec 9, 2024	Depth Estimation, Scene Understanding	Unverified
Visual Lexicon: Rich Image Features in Language Space	Dec 9, 2024	Image Generation, Image Reconstruction	Unverified
TB-HSU: Hierarchical 3D Scene Understanding with Contextual Affordances	Dec 7, 2024	Multi-Task Learning, Object	Unverified
Designing DNNs for a trade-off between robustness and processing performance in embedded devices	Dec 4, 2024	Autonomous Driving, Quantization	Unverified
Assessing the performance of CT image denoisers using Laguerre-Gauss Channelized Hotelling Observer for lesion detection	Dec 4, 2024	Deep Learning, Denoising	Unverified
BYE: Build Your Encoder with One Sequence of Exploration Data for Long-Term Dynamic Scene Understanding	Dec 3, 2024	Motion Estimation, Object	Unverified
SparseLGS: Sparse View Language Embedded Gaussian Splatting	Dec 3, 2024	Scene Understanding	Unverified
Occam's LGS: A Simple Approach for Language Gaussian Splatting	Dec 2, 2024	3DGS, 3D Reconstruction	Unverified
Holistic Understanding of 3D Scenes as Universal Scene Description	Dec 2, 2024	Instance Segmentation, Mixed Reality	Unverified
A Semantic Communication System for Real-time 3D Reconstruction Tasks	Dec 2, 2024	3D Reconstruction, Scene Understanding	Unverified
ChatSplat: 3D Conversational Gaussian Splatting	Dec 1, 2024	Large Language Model, Scene Understanding	Unverified
Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding	Nov 30, 2024	3D Question Answering (3D-QA), Position	Code Available
SIMS: Simulating Stylized Human-Scene Interactions with Retrieval-Augmented Script Generation	Nov 29, 2024	Motion Planning, RAG	Unverified
Quantifying the synthetic and real domain gap in aerial scene understanding	Nov 29, 2024	Domain Adaptation, Scene Understanding	Unverified
SceneTAP: Scene-Coherent Typographic Adversarial Planner against Vision-Language Models in Real-World Environments	Nov 28, 2024	Adversarial Text, Scene Understanding	Unverified
InstanceGaussian: Appearance-Semantic Joint Gaussian Representation for 3D Instance-Level Perception	Nov 28, 2024	3DGS, Autonomous Driving	Unverified
On-chip Hyperspectral Image Segmentation with Fully Convolutional Networks for Scene Understanding in Autonomous Driving	Nov 28, 2024	Autonomous Driving, Hyperspectral Image Segmentation	Unverified
LoCATe-GAT: Modeling Multi-Scale Local Context and Action Relationships for Zero-Shot Action Recognition	Nov 27, 2024	Action Recognition, Graph Attention	Code Available
Grid-augmented vision: A simple yet effective approach for enhanced spatial understanding in multi-modal agents	Nov 27, 2024	Autonomous Navigation, Object Recognition	Code Available
Reconstructing Animals and the Wild	Nov 27, 2024	3D Reconstruction, Scene Understanding	Unverified
Box for Mask and Mask for Box: weak losses for multi-task partially supervised learning	Nov 26, 2024	Object, object-detection	Code Available
HSI-Drive v2.0: More Data for New Challenges in Scene Understanding for Autonomous Driving	Nov 26, 2024	Autonomous Driving, Image Segmentation	Unverified
RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics	Nov 25, 2024	Robot Manipulation, Scene Understanding	Unverified
Open-Vocabulary Octree-Graph for 3D Scene Understanding	Nov 25, 2024	Object, Scene Understanding	Unverified
UniGaussian: Driving Scene Reconstruction from Multiple Camera Models via Unified Gaussian Representations	Nov 22, 2024	Autonomous Driving, Scene Understanding	Unverified
Multimodal 3D Reasoning Segmentation with Complex Scenes	Nov 21, 2024	Reasoning Segmentation, Scene Understanding	Unverified
Classification of Geographical Land Structure Using Convolution Neural Network and Transfer Learning	Nov 19, 2024	Scene Understanding, Transfer Learning	Unverified
Reducing Label Dependency for Underwater Scene Understanding: A Survey of Datasets, Techniques and Applications	Nov 18, 2024	Scene Segmentation, Scene Understanding	Unverified
Calibrated and Efficient Sampling-Free Confidence Estimation for LiDAR Scene Semantic Segmentation	Nov 18, 2024	Autonomous Driving, LIDAR Semantic Segmentation	Unverified
MGNiceNet: Unified Monocular Geometric Scene Understanding	Nov 18, 2024	Autonomous Driving, Autonomous Vehicles	Code Available
The ADUULM-360 Dataset -- A Multi-Modal Dataset for Depth Estimation in Adverse Weather	Nov 18, 2024	Autonomous Driving, Depth Estimation	Code Available
Memory-Augmented Multimodal LLMs for Surgical VQA via Self-Contained Inquiry	Nov 17, 2024	Question Answering, Scene Understanding	Unverified
Large Language Models (LLMs) as Traffic Control Systems at Urban Intersections: A New Paradigm	Nov 16, 2024	Autonomous Vehicles, Decision Making	Unverified
MetricGold: Leveraging Text-To-Image Latent Diffusion Models for Metric Depth Estimation	Nov 16, 2024	Depth Estimation, Monocular Depth Estimation	Code Available
Content-Aware Preserving Image Generation	Nov 15, 2024	Image Generation, Scene Understanding	Unverified
SE(3) Equivariant Ray Embeddings for Implicit Multi-View Depth Estimation	Nov 11, 2024	Data Augmentation, Decoder	Unverified
Graph-Based Multi-Modal Sensor Fusion for Autonomous Driving	Nov 6, 2024	Autonomous Driving, Multi-Object Tracking	Unverified
Modeling Uncertainty in 3D Gaussian Splatting through Continuous Semantic Splatting	Nov 4, 2024	Scene Understanding, Uncertainty Quantification	Unverified
Multi-task Geometric Estimation of Depth and Surface Normal from Monocular 360° Images	Nov 4, 2024	Multi-Task Learning, Scene Understanding	Code Available
UniRiT: Towards Few-Shot Non-Rigid Point Cloud Registration	Oct 30, 2024	Point Cloud Registration, Representation Learning	Unverified
Symbolic Graph Inference for Compound Scene Understanding	Oct 30, 2024	Question Answering, Scene Understanding	Unverified
Towards Robust Algorithms for Surgical Phase Recognition via Digital Twin-based Scene Representation	Oct 26, 2024	Informativeness, Scene Understanding	Unverified

Datasets: Semantic Scene Understanding Challenge (passive actuation & ground-truth localisation); ADE20K val; Semantic Scene Understanding Challenge (active actuation & ground-truth localisation)

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.44	—	Unverified
2	Team VGAI (TCS Research)	OMQ	0.37	—	Unverified
3	Demo_semantic_SLAM	OMQ	0.11	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	CPN(ResNet-101)	Mean IoU	46.3	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.35	—	Unverified