Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.
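As a rough illustration of going from recognized objects to the relationships between them, the following is a minimal sketch only. It assumes objects have already been detected as 2D bounding boxes in image coordinates (y grows downward). The names `Box`, `relation`, and `scene_graph` are hypothetical and are not taken from any paper listed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """A detected object with an axis-aligned 2D bounding box (hypothetical type)."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    @property
    def center(self):
        return ((self.x_min + self.x_max) / 2, (self.y_min + self.y_max) / 2)


def relation(a: Box, b: Box) -> str:
    """Coarse relation of `a` with respect to `b`, based on box centers only."""
    ax, ay = a.center
    bx, by = b.center
    dx, dy = ax - bx, ay - by
    # Pick the dominant axis of displacement between the two centers.
    if abs(dx) >= abs(dy):
        return "left of" if dx < 0 else "right of"
    # Image coordinates: a smaller y value is higher up in the image.
    return "above" if dy < 0 else "below"


def scene_graph(boxes):
    """Return (subject, relation, object) triples for every ordered pair of boxes."""
    return [
        (a.label, relation(a, b), b.label)
        for i, a in enumerate(boxes)
        for j, b in enumerate(boxes)
        if i != j
    ]


cup = Box("cup", 4, 1, 6, 3)
table = Box("table", 0, 4, 10, 8)
print(scene_graph([cup, table]))
# [('cup', 'above', 'table'), ('table', 'below', 'cup')]
```

Real systems typically reason in 3D and learn relations from data; this heuristic only shows the kind of output (objects plus relations) that distinguishes scene understanding from plain object recognition.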

Papers


Showing 551–600 of 1723 papers

Title	Date	Tasks	Status	Hype
Semantic Is Enough: Only Semantic Information For NeRF Reconstruction	Mar 24, 2024	NeRF, object-detection	Unverified	0
AutoInst: Automatic Instance-Based Segmentation of LiDAR 3D Scans	Mar 24, 2024	3D Instance Segmentation, Instance Segmentation	Code Available	1
Multi-Task Learning with Multi-Task Optimization	Mar 24, 2024	Automated Theorem Proving, image-classification	Unverified	0
Semantic Gaussians: Open-Vocabulary Scene Understanding with 3D Gaussian Splatting	Mar 22, 2024	Instance Segmentation, Object Localization	Unverified	0
DiffusionMTL: Learning Multi-Task Denoising Diffusion Model from Partially Annotated Data	Mar 22, 2024	Denoising, Scene Understanding	Unverified	0
Exosense: A Vision-Based Scene Understanding System For Exoskeletons	Mar 21, 2024	Language Modelling, Motion Planning	Unverified	0
SurroundSDF: Implicit 3D Scene Understanding Based on Signed Distance Field	Mar 21, 2024	3D Scene Reconstruction, Autonomous Driving	Unverified	0
3D Object Detection from Point Cloud via Voting Step Diffusion	Mar 21, 2024	3D Object Detection, Object	Code Available	0
Volumetric Environment Representation for Vision-Language Navigation	Mar 21, 2024	3D geometry, Multi-Task Learning	Code Available	2
What if...?: Thinking Counterfactual Keywords Helps to Mitigate Hallucination in Large Multi-modal Models	Mar 20, 2024	counterfactual, Hallucination	Code Available	1
Instance-Warp: Saliency Guided Image Warping for Unsupervised Domain Adaptation	Mar 19, 2024	Domain Adaptation, Object	Code Available	0
Geometric Constraints in Deep Learning Frameworks: A Survey	Mar 19, 2024	Deep Learning, Depth Estimation	Unverified	0
HUGS: Holistic Urban 3D Scene Understanding via Gaussian Splatting	Mar 19, 2024	Novel View Synthesis, Scene Understanding	Unverified	0
M2DA: Multi-Modal Fusion Transformer Incorporating Driver Attention for Autonomous Driving	Mar 19, 2024	Autonomous Driving, Autonomous Vehicles	Unverified	0
R3DS: Reality-linked 3D Scenes for Panoramic Scene Understanding	Mar 18, 2024	Object, Relation Prediction	Unverified	0
OpenOcc: Open Vocabulary 3D Scene Reconstruction via Occupancy Representation	Mar 18, 2024	3D Reconstruction, 3D Scene Reconstruction	Code Available	0
Hierarchical Spatial Proximity Reasoning for Vision-and-Language Navigation	Mar 18, 2024	Common Sense Reasoning, Efficient Exploration	Code Available	0
Agent3D-Zero: An Agent for Zero-shot 3D Understanding	Mar 18, 2024	Language Modelling, Scene Understanding	Unverified	0
Urban Scene Diffusion through Semantic Occupancy Map	Mar 18, 2024	Image Generation, Scene Understanding	Unverified	0
Omni-Recon: Harnessing Image-based Rendering for General-Purpose Neural Radiance Fields	Mar 17, 2024	3D Reconstruction, NeRF	Code Available	0
N2F2: Hierarchical Scene Understanding with Nested Neural Feature Fields	Mar 16, 2024	Scene Understanding	Unverified	0
Segment Any Object Model (SAOM): Real-to-Simulation Fine-Tuning Strategy for Multi-Class Multi-Instance Segmentation	Mar 16, 2024	Instance Segmentation, Object	Unverified	0
Enhancing Human-Centered Dynamic Scene Understanding via Multiple LLMs Collaborated Reasoning	Mar 15, 2024	Autonomous Driving, Human-Object Interaction Detection	Unverified	0
GroupContrast: Semantic-aware Self-supervised Representation Learning for 3D Understanding	Mar 14, 2024	Contrastive Learning, Representation Learning	Code Available	1
MoAI: Mixture of All Intelligence for Large Language and Vision Models	Mar 12, 2024	All, Mixture-of-Experts	Code Available	3
Mapping High-level Semantic Regions in Indoor Environments without Object Recognition	Mar 11, 2024	Graph Generation, Language Modeling	Unverified	0
Optimizing Latent Graph Representations of Surgical Scenes for Zero-Shot Domain Transfer	Mar 11, 2024	Anatomy, Disentanglement	Code Available	1
Stealing Stable Diffusion Prior for Robust Monocular Depth Estimation	Mar 8, 2024	Depth Estimation, Monocular Depth Estimation	Code Available	1
Embodied Understanding of Driving Scenarios	Mar 7, 2024	Autonomous Driving, Language Modeling	Code Available	3
Out of the Room: Generalizing Event-Based Dynamic Motion Segmentation for Complex Scenes	Mar 7, 2024	Motion Segmentation, Optical Flow Estimation	Unverified	0
GSNeRF: Generalizable Semantic Neural Radiance Fields with Enhanced 3D Scene Understanding	Mar 6, 2024	NeRF, Scene Understanding	Unverified	0
HUNTER: Unsupervised Human-centric 3D Detection via Transferring Knowledge from Synthetic Instances to Real Scenes	Mar 5, 2024	Scene Understanding	Unverified	0
FusionVision: A comprehensive approach of 3D object reconstruction and segmentation from RGB-D cameras using YOLO and fast segment anything	Feb 29, 2024	3D Object Reconstruction, Instance Segmentation	Code Available	2
WHU-Synthetic: A Synthetic Perception Dataset for 3-D Multitask Model Research	Feb 29, 2024	3D Reconstruction, Attribute	Code Available	1
One model to use them all: Training a segmentation model with complementary datasets	Feb 29, 2024	All, Anatomy	Code Available	0
PCDepth: Pattern-based Complementary Learning for Monocular Depth Estimation by Best of Both Worlds	Feb 29, 2024	Depth Estimation, Depth Prediction	Unverified	0
LiveHPS: LiDAR-based Scene-level Human Pose and Shape Estimation in Free Environment	Feb 27, 2024	Scene Understanding	Unverified	0
AVS-Net: Point Sampling with Adaptive Voxel Size for 3D Scene Understanding	Feb 27, 2024	3D Object Detection, 3D Part Segmentation	Code Available	0
OpenSUN3D: 1st Workshop Challenge on Open-Vocabulary 3D Scene Understanding	Feb 23, 2024	Scene Understanding	Unverified	0
Swin3D++: Effective Multi-Source Pretraining for 3D Indoor Scene Understanding	Feb 22, 2024	Diversity, Scene Understanding	Code Available	3
DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models	Feb 19, 2024	Autonomous Driving, Scene Understanding	Unverified	0
Semantically-aware Neural Radiance Fields for Visual Scene Understanding: A Comprehensive Review	Feb 17, 2024	Panoptic Segmentation, Scene Segmentation	Code Available	1
Moving Object Proposals with Deep Learned Optical Flow for Video Object Segmentation	Feb 14, 2024	Decoder, Object	Unverified	0
Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models	Feb 12, 2024	Hallucination, Object Localization	Code Available	4
InCoRo: In-Context Learning for Robotics Control with Feedback Loops	Feb 7, 2024	In-Context Learning, Scene Understanding	Unverified	0
Delving into Multi-modal Multi-task Foundation Models for Road Scene Understanding: From Learning Paradigm Perspectives	Feb 5, 2024	Continual Learning, Multi-Task Learning	Code Available	2
SGS-SLAM: Semantic Gaussian Splatting For Neural Dense SLAM	Feb 5, 2024	3D Semantic Segmentation, Camera Pose Estimation	Code Available	3
Neural Language of Thought Models	Feb 2, 2024	Image Generation, Object	Unverified	0
Good at captioning, bad at counting: Benchmarking GPT-4V on Earth observation data	Jan 31, 2024	Benchmarking, Change Detection	Code Available	0
Non-central panorama indoor dataset	Jan 30, 2024	Scene Understanding	Code Available	0


Datasets: Semantic Scene Understanding Challenge (passive actuation & ground-truth localisation); ADE20K val; Semantic Scene Understanding Challenge (active actuation & ground-truth localisation)

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.44	—	Unverified
2	Team VGAI (TCS Research)	OMQ	0.37	—	Unverified
3	Demo_semantic_SLAM	OMQ	0.11	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	CPN(ResNet-101)	Mean IoU	46.3	—	Unverified
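The Mean IoU metric in the table above is commonly computed as the per-class intersection-over-union averaged across classes. The snippet below is a minimal sketch of that common definition, assuming flat lists of integer class labels per pixel. The function name `mean_iou` is hypothetical, and actual benchmark protocols may differ (for example, by ignoring certain labels or accumulating counts over a whole dataset before averaging).

```python
def mean_iou(pred, target, num_classes):
    """Average per-class IoU over classes that appear in pred or target.

    pred, target: equal-length sequences of integer class labels (one per pixel).
    Returns a fraction in [0, 1]; multiply by 100 for a percentage.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth are class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both (an assumption of this sketch)
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


pred = [0, 0, 1, 1]
target = [0, 1, 1, 1]
# Class 0: intersection 1, union 2. Class 1: intersection 2, union 3.
print(round(mean_iou(pred, target, num_classes=2), 4))  # 0.5833
```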

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.35	—	Unverified