Right Side Up? Disentangling Orientation Understanding in MLLMs with Fine-grained Multi-axis Perception Tasks | May 27, 2025 | 3D Scene Reconstruction, Diagnostic
Unverified 0 | Compositional Scene Understanding through Inverse Generative Modeling | May 27, 2025 | Scene Understanding
Unverified 0 | OccLE: Label-Efficient 3D Semantic Occupancy Prediction | May 27, 2025 | 3D Semantic Occupancy Prediction, Autonomous Driving
Unverified 0 | A Graph Completion Method that Jointly Predicts Geometry and Topology Enables Effective Molecule Assembly | May 27, 2025 | Denoising, Drug Design
Unverified 0 | OmniIndoor3D: Comprehensive Indoor 3D Reconstruction | May 27, 2025 | 3DGS, 3D Reconstruction
Unverified 0 | Underwater Diffusion Attention Network with Contrastive Language-Image Joint Learning for Underwater Image Enhancement | May 26, 2025 | Image Enhancement, object-detection
Unverified 0 | Co-AttenDWG: Co-Attentive Dimension-Wise Gating and Expert Fusion for Multi-Modal Offensive Content Detection | May 25, 2025 | cross-modal alignment, Scene Understanding
Unverified 0 | FHGS: Feature-Homogenized Gaussian Splatting | May 25, 2025 | 3DGS, Scene Understanding
Unverified 0 | Can MLLMs Guide Me Home? A Benchmark Study on Fine-Grained Visual Reasoning from Transit Maps | May 24, 2025 | Scene Understanding, Spatial Reasoning
Unverified 0 | Self-Supervised and Generalizable Tokenization for CLIP-Based 3D Understanding | May 24, 2025 | Domain Generalization, Representation Learning
Unverified 0 | From Flight to Insight: Semantic 3D Reconstruction for Aerial Inspection via Gaussian Splatting and Language-Guided Segmentation | May 23, 2025 | 3DGS, 3D Reconstruction
Unverified 0 | Assessing the generalization performance of SAM for ureteroscopy scene understanding | May 22, 2025 | Scene Understanding, Segmentation
Unverified 0 | CoNav: Collaborative Cross-Modal Reasoning for Embodied Navigation | May 22, 2025 | Scene Understanding, Spatial Reasoning
Code Available 1 | DC-Scene: Data-Centric Learning for 3D Scene Understanding | May 21, 2025 | Autonomous Driving, Scene Understanding
Code Available 0 | HAMF: A Hybrid Attention-Mamba Framework for Joint Scene Context Understanding and Future Motion Representation Learning | May 21, 2025 | Autonomous Driving, Mamba
Unverified 0 | RAZER: Robust Accelerated Zero-Shot 3D Open-Vocabulary Panoptic Reconstruction with Spatio-Temporal Aggregation | May 21, 2025 | GPU, Natural Language Queries
Unverified 0 | Robo2VLM: Visual Question Answering from Large-Scale In-the-Wild Robot Manipulation Datasets | May 21, 2025 | Dataset Generation, Descriptive
Unverified 0 | AdaToken-3D: Dynamic Spatial Gating for Efficient 3D Large Multimodal-Models Reasoning | May 19, 2025 | Multimodal Reasoning, Scene Understanding
Unverified 0 | Predicting Reaction Time to Comprehend Scenes with Foveated Scene Understanding Maps | May 19, 2025 | Scene Understanding
Unverified 0 | SEPT: Standard-Definition Map Enhanced Scene Perception and Topology Reasoning for Autonomous Driving | May 18, 2025 | Autonomous Driving, Autonomous Vehicles
Unverified 0 | LLaVA-4D: Embedding SpatioTemporal Prompt into LMMs for 4D Scene Understanding | May 18, 2025 | Scene Understanding
Unverified 0 | Can Large Multimodal Models Understand Agricultural Scenes? Benchmarking with AgroMind | May 18, 2025 | Benchmarking, Scene Understanding
Unverified 0 | TinyRS-R1: Compact Multimodal Language Model for Remote Sensing | May 17, 2025 | Language Modeling, Language Modelling
Unverified 0 | APCoTTA: Continual Test-Time Adaptation for Semantic Segmentation of Airborne LiDAR Point Clouds | May 15, 2025 | Point Cloud Segmentation, Scene Understanding
Code Available 0 | StoryReasoning Dataset: Using Chain-of-Thought for Scene Understanding and Grounded Story Generation | May 15, 2025 | Face Recognition, Object
Code Available 1 | DRRNet: Macro-Micro Feature Fusion and Dual Reverse Refinement for Camouflaged Object Detection | May 14, 2025 | object-detection, Object Detection
Code Available 0 | Seeing Beyond the Scene: Enhancing Vision-Language Models with Interactional Reasoning | May 14, 2025 | Relation Extraction, Scene Understanding
Unverified 0 | Extending Large Vision-Language Model for Diverse Interactive Tasks in Autonomous Driving | May 13, 2025 | 3D visual grounding, Autonomous Driving
Code Available 1 | Deep Learning Advances in Vision-Based Traffic Accident Anticipation: A Comprehensive Review of Methods, Datasets, and Future Directions | May 12, 2025 | Accident Anticipation, Prediction
Unverified 0 | Boosting Cross-spectral Unsupervised Domain Adaptation for Thermal Semantic Segmentation | May 11, 2025 | Autonomous Driving, Domain Adaptation
Unverified 0 | Technical Report for ICRA 2025 GOOSE 2D Semantic Segmentation Challenge: Leveraging Color Shift Correction, RoPE-Swin Backbone, and Quantile-based Label Denoising Strategy for Robust Outdoor Scene Understanding | May 11, 2025 | 2D Semantic Segmentation, Denoising
Unverified 0 | Camera Control at the Edge with Language Models for Scene Understanding | May 9, 2025 | Language Modeling, Language Modelling
Unverified 0 | Camera-Only Bird's Eye View Perception: A Neural Approach to LiDAR-Free Environmental Mapping for Autonomous Vehicles | May 9, 2025 | Autonomous Navigation, Autonomous Vehicles
Unverified 0 | Does CLIP perceive art the same way we do? | May 8, 2025 | Image Generation, Scene Understanding
Unverified 0 | Hearing and Seeing Through CLIP: A Framework for Self-Supervised Sound Source Localization | May 8, 2025 | Scene Understanding, Sound Source Localization
Code Available 1 | PADriver: Towards Personalized Autonomous Driving | May 8, 2025 | Autonomous Driving, Language Modeling
Unverified 0 | RAFT: Robust Augmentation of FeaTures for Image Segmentation | May 7, 2025 | Active Learning, Domain Adaptation
Unverified 0 | Benchmarking Feature Upsampling Methods for Vision Foundation Models using Interactive Segmentation | May 4, 2025 | Benchmarking, Feature Upsampling
Code Available 0 | Segment Any RGB-Thermal Model with Language-aided Distillation | May 4, 2025 | Instance Segmentation, Knowledge Distillation
Unverified 0 | Beyond Recognition: Evaluating Visual Perspective Taking in Vision Language Models | May 3, 2025 | Diagnostic, Object Recognition
Unverified 0 | Embracing Diffraction: A Paradigm Shift in Wireless Sensing and Communication | May 2, 2025 | Scene Understanding
Unverified 0 | V3LMA: Visual 3D-enhanced Language Model for Autonomous Driving | Apr 30, 2025 | Autonomous Driving, Decision Making
Unverified 0 | LLM-Empowered Embodied Agent for Memory-Augmented Task Planning in Household Robotics | Apr 30, 2025 | In-Context Learning, Object
Code Available 1 | Masked Point-Entity Contrast for Open-Vocabulary 3D Scene Understanding | Apr 28, 2025 | 3D Semantic Segmentation, Contrastive Learning
Unverified 0 | Category-Level and Open-Set Object Pose Estimation for Robotics | Apr 28, 2025 | 6D Pose Estimation, 6D Pose Estimation using RGB
Unverified 0 | TraveLLaMA: Facilitating Multi-modal Large Language Models to Understand Urban Scenes and Provide Travel Assistance | Apr 23, 2025 | Question Answering, Scene Understanding
Unverified 0 | Multimodal Large Language Models for Enhanced Traffic Safety: A Comprehensive Review and Future Trends | Apr 21, 2025 | Adversarial Robustness, Decision Making
Unverified 0 | Are Vision LLMs Road-Ready? A Comprehensive Benchmark for Safety-Critical Driving Video Understanding | Apr 20, 2025 | Autonomous Driving, Image Captioning
Code Available 0 | Vision-Centric Representation-Efficient Fine-Tuning for Robust Universal Foreground Segmentation | Apr 20, 2025 | Attribute, Foreground Segmentation
Unverified 0 | Leveraging Automatic CAD Annotations for Supervised Learning in 3D Scene Understanding | Apr 18, 2025 | Deep Learning, Point Cloud Completion
Code Available 0