Label-Efficient LiDAR Panoptic Segmentation; Mar 4, 2025; Instance Segmentation, Panoptic Segmentation; Unverified 0
vS-Graphs: Integrating Visual SLAM and Situational Graphs through Multi-level Scene Understanding; Mar 3, 2025; Scene Understanding, Simultaneous Localization and Mapping; Unverified 0
Every SAM Drop Counts: Embracing Semantic Priors for Multi-Modality Image Fusion and Beyond; Mar 3, 2025; Infrared And Visible Image Fusion, Scene Understanding; Unverified 0
OpenGS-SLAM: Open-Set Dense Semantic SLAM with 3D Gaussian Splatting for Object-Level Scene Understanding; Mar 3, 2025; Scene Understanding, Semantic SLAM; Unverified 0
Inst3D-LMM: Instance-Aware 3D Scene Understanding with Multi-modal Instruction Tuning; Mar 1, 2025; Scene Understanding; Code Available 2
Floorplan-SLAM: A Real-Time, High-Accuracy, and Long-Term Multi-Session Point-Plane SLAM for Efficient Floorplan Reconstruction; Mar 1, 2025; GPU, Pose Estimation; Unverified 0
Distill Any Depth: Distillation Creates a Stronger Monocular Depth Estimator; Feb 26, 2025; Depth Estimation, Diversity; Code Available 4
VLM-E2E: Enhancing End-to-End Autonomous Driving with Multimodal Driver Attention Fusion; Feb 25, 2025; Autonomous Driving, Navigate; Unverified 0
AAD-LLM: Neural Attention-Driven Auditory Scene Understanding; Feb 24, 2025; Question Answering, Response Generation; Unverified 0
Dr. Splat: Directly Referring 3D Gaussian Splatting via Direct Language Embedding Registration; Feb 23, 2025; 3DGS, 3D Semantic Segmentation; Unverified 0
Hierarchical Context Transformer for Multi-level Semantic Scene Understanding; Feb 21, 2025; Contrastive Learning, Representation Learning; Code Available 0
CrossOver: 3D Scene Cross-Modal Alignment; Feb 20, 2025; cross-modal alignment, Object; Code Available 3
AVD2: Accident Video Diffusion for Accident Video Description; Feb 20, 2025; Autonomous Driving, Scene Understanding; Unverified 0
Sce2DriveX: A Generalized MLLM Framework for Scene-to-Drive Learning; Feb 19, 2025; Autonomous Driving, Bench2Drive; Unverified 0
Understanding and Evaluating Hallucinations in 3D Visual Language Models; Feb 18, 2025; Diversity, Scene Understanding; Unverified 0
Surgical Scene Understanding in the Era of Foundation AI Models: A Comprehensive Review; Feb 16, 2025; Scene Understanding; Unverified 0
NavRAG: Generating User Demand Instructions for Embodied Navigation through Retrieval-Augmented LLM; Feb 16, 2025; Navigate, RAG; Code Available 2
Occlusion-aware Non-Rigid Point Cloud Registration via Unsupervised Neural Deformation Correntropy; Feb 15, 2025; Point Cloud Registration, Scene Understanding; Code Available 1
FLARES: Fast and Accurate LiDAR Multi-Range Semantic Segmentation; Feb 13, 2025; Autonomous Driving, LIDAR Semantic Segmentation; Unverified 0
3D-Grounded Vision-Language Framework for Robotic Task Planning: Automated Prompt Synthesis and Supervised Reasoning; Feb 13, 2025; Code Generation, Scene Understanding; Unverified 0
sshELF: Single-Shot Hierarchical Extrapolation of Latent Features for 3D Reconstruction from Sparse-Views; Feb 6, 2025; 3D Reconstruction, 3D Scene Reconstruction; Unverified 0
Mosaic3D: Foundation Dataset and Model for Open-Vocabulary 3D Segmentation; Feb 4, 2025; Contrastive Learning, Decoder; Unverified 0
Event-aided Semantic Scene Completion; Feb 4, 2025; Autonomous Driving, Scene Understanding; Code Available 1
AquaticCLIP: A Vision-Language Foundation Model for Underwater Scene Analysis; Feb 3, 2025; Object Counting, Scene Understanding; Unverified 0
Integrating LMM Planners and 3D Skill Policies for Generalizable Manipulation; Jan 30, 2025; Memorization, Scene Understanding; Unverified 0
Efficient Interactive 3D Multi-Object Removal; Jan 29, 2025; Object, Scene Understanding; Unverified 0
Contextual Self-paced Learning for Weakly Supervised Spatio-Temporal Video Grounding; Jan 28, 2025; object-detection, Object Detection; Unverified 0
PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding; Jan 27, 2025; Benchmarking, Common Sense Reasoning; Unverified 0
Unveiling the Potential of iMarkers: Invisible Fiducial Markers for Advanced Robotics; Jan 26, 2025; Object Recognition, Scene Understanding; Unverified 0
HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation; Jan 24, 2025; Autonomous Driving, Language Modeling; Code Available 3
Scene Understanding Enabled Semantic Communication with Open Channel Coding; Jan 24, 2025; Question Answering, Scene Understanding; Unverified 0
GeomGS: LiDAR-Guided Geometry-Aware Gaussian Splatting for Robot Localization; Jan 23, 2025; 3DGS, Autonomous Driving; Unverified 0
Separated Inter/Intra-Modal Fusion Prompts for Compositional Zero-Shot Learning; Jan 22, 2025; Attribute, Compositional Zero-Shot Learning; Unverified 0
Neural Radiance Fields for the Real World: A Survey; Jan 22, 2025; Scene Understanding, Survey; Unverified 0
EndoChat: Grounded Multimodal Large Language Model for Endoscopic Surgery; Jan 20, 2025; Language Modeling, Language Modelling; Code Available 1
Dynamic Scene Understanding from Vision-Language Representations; Jan 20, 2025; Grounded Situation Recognition, Human-Human Interaction Recognition; Unverified 0
A Survey of World Models for Autonomous Driving; Jan 20, 2025; Anomaly Detection, Autonomous Driving; Code Available 1
A Vision-Language Framework for Multispectral Scene Representation Using Language-Grounded Features; Jan 17, 2025; Language Modeling, Language Modelling; Unverified 0
CrossModalityDiffusion: Multi-Modal Novel View Synthesis with Unified Intermediate Representation; Jan 16, 2025; Novel View Synthesis, Scene Understanding; Code Available 0
YETI (YET to Intervene) Proactive Interventions by Multimodal AI Agents in Augmented Reality Tasks; Jan 16, 2025; AI Agent, Scene Understanding; Unverified 0
Embodied Scene Understanding for Vision Language Models via MetaVQA; Jan 15, 2025; Decision Making, Question Answering; Unverified 0
3UR-LLM: An End-to-End Multimodal Large Language Model for 3D Scene Understanding; Jan 14, 2025; Language Modeling, Language Modelling; Code Available 1
Hierarchical Superpixel Segmentation via Structural Information Theory; Jan 13, 2025; graph construction, graph partitioning; Code Available 0
Zero-Shot Scene Understanding for Automatic Target Recognition Using Large Vision-Language Models; Jan 13, 2025; Scene Understanding; Unverified 0
Application of Vision-Language Model to Pedestrians Behavior and Scene Understanding in Autonomous Driving; Jan 12, 2025; Autonomous Driving, Decision Making; Unverified 0
Self-Supervised Partial Cycle-Consistency for Multi-View Matching; Jan 10, 2025; Scene Understanding; Code Available 0
UniQ: Unified Decoder with Task-specific Queries for Efficient Scene Graph Generation; Jan 10, 2025; Decoder, Graph Generation; Unverified 0
Vision-Language Models for Autonomous Driving: CLIP-Based Dynamic Scene Understanding; Jan 9, 2025; Autonomous Driving, In-Context Learning; Unverified 0
A Systematic Literature Review on Deep Learning-based Depth Estimation in Computer Vision; Jan 9, 2025; 3D Reconstruction, Depth Estimation; Unverified 0
NextStop: An Improved Tracker For Panoptic LIDAR Segmentation Data; Jan 8, 2025; Autonomous Driving, Instance Segmentation; Code Available 0