Scene Understanding

Scene understanding involves interpreting the visual information of a scene, including objects, their spatial relationships, and the overall layout. It goes beyond simple object recognition by considering the context and how objects relate to each other and the environment.

Papers

Showing 451–500 of 1723 papers

Title	Date	Tasks	Status	Hype
General Geometry-aware Weakly Supervised 3D Object Detection	Jul 18, 2024	3D Object Detection, Object	Code Available	1
Dual-Hybrid Attention Network for Specular Highlight Removal	Jul 17, 2024	highlight removal, Object Recognition	Code Available	1
InfoNorm: Mutual Information Shaping of Normals for Sparse-View Reconstruction	Jul 17, 2024	Scene Understanding, Surface Reconstruction	Code Available	0
Benchmarking Vision Language Models for Cultural Understanding	Jul 15, 2024	Benchmarking, Question Answering	Unverified	0
No Train, all Gain: Self-Supervised Gradients Improve Deep Frozen Representations	Jul 15, 2024	All, Image Retrieval	Code Available	1
Shape2Scene: 3D Scene Representation Learning Through Pre-training on Shape Data	Jul 14, 2024	3D Object Detection, 3D Semantic Segmentation	Code Available	0
Dense Multimodal Alignment for Open-Vocabulary 3D Scene Understanding	Jul 13, 2024	Scene Understanding, Zero-Shot Learning	Unverified	0
BLOS-BEV: Navigation Map Enhanced Lane Segmentation Network, Beyond Line of Sight	Jul 11, 2024	Autonomous Driving, BEV Segmentation	Unverified	0
Pareto Low-Rank Adapters: Efficient Multi-Task Learning with Preferences	Jul 10, 2024	Multi-Task Learning, Scene Understanding	Unverified	0
Swiss DINO: Efficient and Versatile Vision Framework for On-device Personal Object Search	Jul 10, 2024	Few-Shot Learning, GPU	Code Available	0
LVLM-empowered Multi-modal Representation Learning for Visual Place Recognition	Jul 9, 2024	Instruction Following, Representation Learning	Unverified	0
Joint prototype and coefficient prediction for 3D instance segmentation	Jul 9, 2024	3D Instance Segmentation, Instance Segmentation	Unverified	0
Self-supervised Learning via Cluster Distance Prediction for Operating Room Context Awareness	Jul 7, 2024	Activity Recognition, Scene Understanding	Unverified	0
Hybrid Primal Sketch: Combining Analogy, Qualitative Representations, and Computer Vision for Scene Understanding	Jul 5, 2024	Scene Understanding	Unverified	0
A Unified Framework for 3D Scene Understanding	Jul 3, 2024	Contrastive Learning, Knowledge Distillation	Code Available	2
MTMamba: Enhancing Multi-Task Dense Scene Understanding by Mamba-Based Decoders	Jul 2, 2024	Boundary Detection, Human Parsing	Code Available	1
Uni-DVPS: Unified Model for Depth-Aware Video Panoptic Segmentation	Jul 1, 2024	Autonomous Driving, Decoder	Code Available	1
PanopticRecon: Leverage Open-vocabulary Instance Segmentation for Zero-shot Panoptic Reconstruction	Jul 1, 2024	3D Panoptic Segmentation, Instance Segmentation	Unverified	0
CSFNet: A Cosine Similarity Fusion Network for Real-Time RGB-X Semantic Segmentation of Driving Scenes	Jul 1, 2024	Autonomous Vehicles, Image Segmentation	Code Available	1
ESGNN: Towards Equivariant Scene Graph Neural Network for 3D Scene Understanding	Jun 30, 2024	Graph Generation, Graph Neural Network	Unverified	0
EgoGaussian: Dynamic Scene Understanding from Egocentric Video with 3D Gaussian Splatting	Jun 28, 2024	Human-Object Interaction Detection, Object	Unverified	0
PPTFormer: Pseudo Multi-Perspective Transformer for UAV Segmentation	Jun 28, 2024	Decoder, Image Segmentation	Unverified	0
3D-MVP: 3D Multiview Pretraining for Robotic Manipulation	Jun 26, 2024	Decoder, Robot Manipulation	Unverified	0
GPT-4V Explorations: Mining Autonomous Driving	Jun 24, 2024	Autonomous Driving, Decision Making	Unverified	0
AudioBench: A Universal Benchmark for Audio Large Language Models	Jun 23, 2024	Audio Scene Understanding, Instruction Following	Code Available	3
EvSegSNN: Neuromorphic Semantic Segmentation for Event Data	Jun 20, 2024	Autonomous Vehicles, Decoder	Unverified	0
StableSemantics: A Synthetic Language-Vision Dataset of Semantic Representations in Naturalistic Images	Jun 19, 2024	Object Recognition, Scene Understanding	Code Available	2
DistillNeRF: Perceiving 3D Scenes from Single-Glance Images by Distilling Neural Fields and Foundation Model Features	Jun 17, 2024	3D geometry, 3D Semantic Occupancy Prediction	Unverified	0
Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding	Jun 17, 2024	3D Object Detection, 3D Semantic Segmentation	Unverified	0
MapVision: CVPR 2024 Autonomous Grand Challenge Mapless Driving Tech Report	Jun 14, 2024	Autonomous Driving, Scene Understanding	Unverified	0
A Two-Stage Masked Autoencoder Based Network for Indoor Depth Completion	Jun 14, 2024	3D Reconstruction, Autonomous Driving	Code Available	1
MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding	Jun 13, 2024	Multiple-choice, Scene Understanding	Code Available	1
Category-level Neural Field for Reconstruction of Partially Observed Objects in Indoor Environment	Jun 12, 2024	3D Reconstruction, Scene Understanding	Code Available	0
RS-Agent: Automating Remote Sensing Tasks through Intelligent Agent	Jun 11, 2024	AI Agent, Descriptive	Code Available	2
FastLGS: Speeding up Language Embedded Gaussians with Feature Grid Mapping	Jun 4, 2024	3DGS, Scene Understanding	Unverified	0
EAGLE: Efficient Adaptive Geometry-based Learning in Cross-view Understanding	Jun 3, 2024	Domain Adaptation, Open Vocabulary Semantic Segmentation	Unverified	0
Object Aware Egocentric Online Action Detection	Jun 3, 2024	Action Detection, Object	Unverified	0
CYCLO: Cyclic Graph Transformer Approach to Multi-Object Relationship Modeling in Aerial Videos	Jun 3, 2024	Graph Generation, Scene Graph Generation	Unverified	0
Semi-supervised Video Semantic Segmentation Using Unreliable Pseudo Labels for PVUW2024	Jun 2, 2024	Scene Parsing, Scene Understanding	Unverified	0
SAM-E: Leveraging Visual Foundation Model with Sequence Imitation for Embodied Manipulation	May 30, 2024	Instruction Following, parameter-efficient fine-tuning	Unverified	0
Learning 3D Robotics Perception using Inductive Priors	May 30, 2024	3D Reconstruction, Image Generation	Unverified	0
Kestrel: Point Grounding Multimodal LLM for Part-Aware 3D Vision-Language Understanding	May 29, 2024	Scene Understanding, Segmentation	Unverified	0
GOI: Find 3D Gaussians of Interest with an Optimizable Open-vocabulary Semantic-space Hyperplane	May 27, 2024	3DGS, feature selection	Unverified	0
Open-Vocabulary SAM3D: Towards Training-free Open-Vocabulary 3D Scene Understanding	May 24, 2024	Scene Understanding, Zero Shot Segmentation	Unverified	0
Generative Camera Dolly: Extreme Monocular Dynamic Novel View Synthesis	May 23, 2024	Novel View Synthesis, Scene Understanding	Unverified	0
Transformers for Image-Goal Navigation	May 23, 2024	Navigate, Scene Understanding	Unverified	0
CoPeD-Advancing Multi-Robot Collaborative Perception: A Comprehensive Dataset in Real-World Environments	May 23, 2024	Pose Estimation, Scene Understanding	Code Available	1
TS40K: a 3D Point Cloud Dataset of Rural Terrain and Electrical Transmission System	May 22, 2024	3D Object Detection, 3D Semantic Segmentation	Unverified	0
GameVLM: A Decision-making Framework for Robotic Task Planning Based on Visual Language Models and Zero-sum Games	May 22, 2024	Code Generation, Decision Making	Unverified	0
Anticipating Object State Changes in Long Procedural Videos	May 21, 2024	Object, Object State Change Classification	Unverified	0

Benchmark Results

Semantic Scene Understanding Challenge (passive actuation & ground-truth localisation)

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.44	-	Unverified
2	Team VGAI (TCS Research)	OMQ	0.37	-	Unverified
3	Demo_semantic_SLAM	OMQ	0.11	-	Unverified

ADE20K val

#	Model	Metric	Claimed	Verified	Status
1	CPN (ResNet-101)	Mean IoU	46.3	-	Unverified

Semantic Scene Understanding Challenge (active actuation & ground-truth localisation)

#	Model	Metric	Claimed	Verified	Status
1	ACRV Baseline	OMQ	0.35	-	Unverified