SceneCompleter: Dense 3D Scene Completion for Generative Novel View Synthesis Jun 12, 2025 Novel View Synthesis Scene Understanding — Unverified 0
SemanticSplat: Feed-Forward 3D Scene Understanding with Language-Aware Gaussian Fields Jun 11, 2025 3D Reconstruction Scene Understanding — Unverified 0
SceneSplat++: A Large Dataset and Comprehensive Benchmark for Language Gaussian Splatting Jun 10, 2025 3DGS Scene Understanding — Unverified 0
PhyBlock: A Progressive Benchmark for Physical Understanding and Planning via 3D Block Assembly Jun 10, 2025 Question Answering Scene Understanding — Unverified 0
Robust Visual Localization via Semantic-Guided Multi-Scale Transformer Jun 10, 2025 regression Scene Understanding — Unverified 0
Design and Evaluation of Deep Learning-Based Dual-Spectrum Image Fusion Methods Jun 9, 2025 Fairness Scene Understanding — Unverified 0
OpenSplat3D: Open-Vocabulary 3D Instance Segmentation using Gaussian Splatting Jun 9, 2025 3DGS 3D Instance Segmentation — Unverified 0
SpatialLM: Training Large Language Models for Structured Indoor Modeling Jun 9, 2025 3D Object Detection Language Modeling — Unverified 0
Does Your 3D Encoder Really Work? When Pretrain-SFT from 2D VLMs Meets 3D VLMs Jun 5, 2025 cross-modal alignment Dense Captioning — Unverified 0
ProJo4D: Progressive Joint Optimization for Sparse-View Inverse Physics Estimation Jun 5, 2025 3D Reconstruction NeRF — Unverified 0
Tactile MNIST: Benchmarking Active Tactile Perception Jun 3, 2025 Benchmarking Scene Understanding — Unverified 0
Attention-based transformer models for image captioning across languages: An in-depth survey and evaluation Jun 3, 2025 Caption Generation Image Captioning — Unverified 0
SAM2-LOVE: Segment Anything Model 2 in Language-aided Audio-Visual Scenes Jun 2, 2025 Scene Understanding — Unverified 0
Learning Sparsity for Effective and Efficient Music Performance Question Answering Jun 2, 2025 Audio-visual Question Answering Question Answering — Unverified 0
Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors May 30, 2025 3D geometry Large Language Model Code Available 0
SeG-SR: Integrating Semantic Knowledge into Remote Sensing Image Super-Resolution via Vision-Language Model May 29, 2025 Image Super-Resolution Language Modeling Code Available 0
DORAEMON: Decentralized Ontology-aware Reliable Agent with Enhanced Memory Oriented Navigation May 28, 2025 Autonomous Navigation RAG — Unverified 0
LiDAR Based Semantic Perception for Forklifts in Outdoor Environments May 28, 2025 Scene Understanding Segmentation — Unverified 0
OccLE: Label-Efficient 3D Semantic Occupancy Prediction May 27, 2025 3D Semantic Occupancy Prediction Autonomous Driving — Unverified 0
A Graph Completion Method that Jointly Predicts Geometry and Topology Enables Effective Molecule Assembly May 27, 2025 Denoising Drug Design — Unverified 0
Compositional Scene Understanding through Inverse Generative Modeling May 27, 2025 Scene Understanding — Unverified 0
Right Side Up? Disentangling Orientation Understanding in MLLMs with Fine-grained Multi-axis Perception Tasks May 27, 2025 3D Scene Reconstruction Diagnostic — Unverified 0
OmniIndoor3D: Comprehensive Indoor 3D Reconstruction May 27, 2025 3DGS 3D Reconstruction — Unverified 0
Underwater Diffusion Attention Network with Contrastive Language-Image Joint Learning for Underwater Image Enhancement May 26, 2025 Image Enhancement object-detection — Unverified 0
Co-AttenDWG: Co-Attentive Dimension-Wise Gating and Expert Fusion for Multi-Modal Offensive Content Detection May 25, 2025 cross-modal alignment Scene Understanding — Unverified 0
FHGS: Feature-Homogenized Gaussian Splatting May 25, 2025 3DGS Scene Understanding — Unverified 0
Can MLLMs Guide Me Home? A Benchmark Study on Fine-Grained Visual Reasoning from Transit Maps May 24, 2025 Scene Understanding Spatial Reasoning — Unverified 0
Self-Supervised and Generalizable Tokenization for CLIP-Based 3D Understanding May 24, 2025 Domain Generalization Representation Learning — Unverified 0
From Flight to Insight: Semantic 3D Reconstruction for Aerial Inspection via Gaussian Splatting and Language-Guided Segmentation May 23, 2025 3DGS 3D Reconstruction — Unverified 0
Assessing the generalization performance of SAM for ureteroscopy scene understanding May 22, 2025 Scene Understanding Segmentation — Unverified 0
HAMF: A Hybrid Attention-Mamba Framework for Joint Scene Context Understanding and Future Motion Representation Learning May 21, 2025 Autonomous Driving Mamba — Unverified 0
Robo2VLM: Visual Question Answering from Large-Scale In-the-Wild Robot Manipulation Datasets May 21, 2025 Dataset Generation Descriptive — Unverified 0
RAZER: Robust Accelerated Zero-Shot 3D Open-Vocabulary Panoptic Reconstruction with Spatio-Temporal Aggregation May 21, 2025 GPU Natural Language Queries — Unverified 0
DC-Scene: Data-Centric Learning for 3D Scene Understanding May 21, 2025 Autonomous Driving Scene Understanding Code Available 0
AdaToken-3D: Dynamic Spatial Gating for Efficient 3D Large Multimodal-Models Reasoning May 19, 2025 Multimodal Reasoning Scene Understanding — Unverified 0
Predicting Reaction Time to Comprehend Scenes with Foveated Scene Understanding Maps May 19, 2025 Scene Understanding — Unverified 0
Can Large Multimodal Models Understand Agricultural Scenes? Benchmarking with AgroMind May 18, 2025 Benchmarking Scene Understanding — Unverified 0
SEPT: Standard-Definition Map Enhanced Scene Perception and Topology Reasoning for Autonomous Driving May 18, 2025 Autonomous Driving Autonomous Vehicles — Unverified 0
LLaVA-4D: Embedding SpatioTemporal Prompt into LMMs for 4D Scene Understanding May 18, 2025 Scene Understanding — Unverified 0
TinyRS-R1: Compact Multimodal Language Model for Remote Sensing May 17, 2025 Language Modeling Language Modelling — Unverified 0
APCoTTA: Continual Test-Time Adaptation for Semantic Segmentation of Airborne LiDAR Point Clouds May 15, 2025 Point Cloud Segmentation Scene Understanding Code Available 0
DRRNet: Macro-Micro Feature Fusion and Dual Reverse Refinement for Camouflaged Object Detection May 14, 2025 object-detection Object Detection Code Available 0
Seeing Beyond the Scene: Enhancing Vision-Language Models with Interactional Reasoning May 14, 2025 Relation Extraction Scene Understanding — Unverified 0
Deep Learning Advances in Vision-Based Traffic Accident Anticipation: A Comprehensive Review of Methods,Datasets,and Future Directions May 12, 2025 Accident Anticipation Prediction — Unverified 0
Technical Report for ICRA 2025 GOOSE 2D Semantic Segmentation Challenge: Leveraging Color Shift Correction, RoPE-Swin Backbone, and Quantile-based Label Denoising Strategy for Robust Outdoor Scene Understanding May 11, 2025 2D Semantic Segmentation Denoising — Unverified 0
Boosting Cross-spectral Unsupervised Domain Adaptation for Thermal Semantic Segmentation May 11, 2025 Autonomous Driving Domain Adaptation — Unverified 0
Camera-Only Bird's Eye View Perception: A Neural Approach to LiDAR-Free Environmental Mapping for Autonomous Vehicles May 9, 2025 Autonomous Navigation Autonomous Vehicles — Unverified 0
Camera Control at the Edge with Language Models for Scene Understanding May 9, 2025 Language Modeling Language Modelling — Unverified 0
Does CLIP perceive art the same way we do? May 8, 2025 Image Generation Scene Understanding — Unverified 0
PADriver: Towards Personalized Autonomous Driving May 8, 2025 Autonomous Driving Language Modeling — Unverified 0