RAFT: Robust Augmentation of FeaTures for Image Segmentation (May 7, 2025). Tasks: Active Learning, Domain Adaptation. Code: Unverified.
Benchmarking Feature Upsampling Methods for Vision Foundation Models using Interactive Segmentation (May 4, 2025). Tasks: Benchmarking, Feature Upsampling. Code: Code Available.
Segment Any RGB-Thermal Model with Language-aided Distillation (May 4, 2025). Tasks: Instance Segmentation, Knowledge Distillation. Code: Unverified.
Beyond Recognition: Evaluating Visual Perspective Taking in Vision Language Models (May 3, 2025). Tasks: Diagnostic, Object Recognition. Code: Unverified.
Embracing Diffraction: A Paradigm Shift in Wireless Sensing and Communication (May 2, 2025). Tasks: Scene Understanding. Code: Unverified.
V3LMA: Visual 3D-enhanced Language Model for Autonomous Driving (Apr 30, 2025). Tasks: Autonomous Driving, Decision Making. Code: Unverified.
Category-Level and Open-Set Object Pose Estimation for Robotics (Apr 28, 2025). Tasks: 6D Pose Estimation, 6D Pose Estimation using RGB. Code: Unverified.
Masked Point-Entity Contrast for Open-Vocabulary 3D Scene Understanding (Apr 28, 2025). Tasks: 3D Semantic Segmentation, Contrastive Learning. Code: Unverified.
TraveLLaMA: Facilitating Multi-modal Large Language Models to Understand Urban Scenes and Provide Travel Assistance (Apr 23, 2025). Tasks: Question Answering, Scene Understanding. Code: Unverified.
Multimodal Large Language Models for Enhanced Traffic Safety: A Comprehensive Review and Future Trends (Apr 21, 2025). Tasks: Adversarial Robustness, Decision Making. Code: Unverified.
Are Vision LLMs Road-Ready? A Comprehensive Benchmark for Safety-Critical Driving Video Understanding (Apr 20, 2025). Tasks: Autonomous Driving, Image Captioning. Code: Code Available.
Vision-Centric Representation-Efficient Fine-Tuning for Robust Universal Foreground Segmentation (Apr 20, 2025). Tasks: Attribute, Foreground Segmentation. Code: Unverified.
Temporal Propagation of Asymmetric Feature Pyramid for Surgical Scene Segmentation (Apr 18, 2025). Tasks: Scene Segmentation, Scene Understanding. Code: Unverified.
Leveraging Automatic CAD Annotations for Supervised Learning in 3D Scene Understanding (Apr 18, 2025). Tasks: Deep Learning, Point Cloud Completion. Code: Code Available.
HAECcity: Open-Vocabulary Scene Understanding of City-Scale Point Clouds with Superpoint Graph Clustering (Apr 18, 2025). Tasks: Clustering, Graph Clustering. Code: Unverified.
Explainable Scene Understanding with Qualitative Representations and Graph Neural Networks (Apr 17, 2025). Tasks: Autonomous Driving, Scene Understanding. Code: Unverified.
CAGS: Open-Vocabulary 3D Scene Understanding with Context-Aware Gaussian Splatting (Apr 16, 2025). Tasks: 3DGS, 3D Instance Segmentation. Code: Unverified.
Single-Input Multi-Output Model Merging: Leveraging Foundation Models for Dense Multi-Task Learning (Apr 15, 2025). Tasks: Multi-Task Learning, Scene Understanding. Code: Unverified.
Foundation Models for Remote Sensing: An Analysis of MLLMs for Object Localization (Apr 14, 2025). Tasks: Benchmarking, Earth Observation. Code: Unverified.
DSM: Building A Diverse Semantic Map for 3D Visual Grounding (Apr 11, 2025). Tasks: 3D visual grounding, Scene Understanding. Code: Unverified.
FMLGS: Fast Multilevel Language Embedded Gaussians for Part-level Interactive Agents (Apr 11, 2025). Tasks: 3DGS, Navigate. Code: Unverified.
FindAnything: Open-Vocabulary and Object-Centric Mapping for Robot Exploration in Any Environment (Apr 11, 2025). Tasks: 3D geometry, Natural Language Queries. Code: Unverified.
DGOcc: Depth-aware Global Query-based Network for Monocular 3D Occupancy Prediction (Apr 10, 2025). Tasks: GPU, Prediction. Code: Unverified.
MovSAM: A Single-image Moving Object Segmentation Framework Based on Deep Thinking (Apr 9, 2025). Tasks: Autonomous Driving, Language Modeling. Code: Code Available.
RayFronts: Open-Set Semantic Ray Frontiers for Online Scene Understanding and Exploration (Apr 9, 2025). Tasks: 3D Semantic Segmentation, Benchmarking. Code: Unverified.
Audio-visual Event Localization on Portrait Mode Short Videos (Apr 9, 2025). Tasks: audio-visual event localization, Scene Understanding. Code: Unverified.
Attributes-aware Visual Emotion Representation Learning (Apr 9, 2025). Tasks: Attribute, Emotion Recognition. Code: Unverified.
PRIMEDrive-CoT: A Precognitive Chain-of-Thought Framework for Uncertainty-Aware Object Interaction in Driving Scene Scenario (Apr 8, 2025). Tasks: 3D Object Detection, Autonomous Driving. Code: Unverified.
RS-RAG: Bridging Remote Sensing Imagery and Comprehensive Knowledge with a Multi-Modal Dataset and Retrieval-Augmented Generation Model (Apr 7, 2025). Tasks: Image Captioning, image-classification. Code: Unverified.
Planning Safety Trajectories with Dual-Phase, Physics-Informed, and Transportation Knowledge-Driven Large Language Models (Apr 6, 2025). Tasks: Computational Efficiency, General Knowledge. Code: Code Available.
Overlap-Aware Feature Learning for Robust Unsupervised Domain Adaptation for 3D Semantic Segmentation (Apr 2, 2025). Tasks: 3D Semantic Segmentation, Adversarial Attack. Code: Unverified.
TransforMerger: Transformer-based Voice-Gesture Fusion for Robust Human-Robot Communication (Apr 2, 2025). Tasks: Language Modeling, Language Modelling. Code: Unverified.
CoMatcher: Multi-View Collaborative Feature Matching (Apr 2, 2025). Tasks: Scene Understanding, set matching. Code: Unverified.
Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness (Apr 2, 2025). Tasks: Scene Understanding. Code: Unverified.
Context-Aware Human Behavior Prediction Using Multimodal Large Language Models: Challenges and Insights (Apr 1, 2025). Tasks: Activity Prediction, Domain Generalization. Code: Unverified.
Zero-Shot 4D Lidar Panoptic Segmentation (Apr 1, 2025). Tasks: Diversity, Panoptic Segmentation. Code: Unverified.
PhysPose: Refining 6D Object Poses with Physical Constraints (Mar 30, 2025). Tasks: 6D Pose Estimation using RGB, Pose Estimation. Code: Unverified.
Evaluating Compositional Scene Understanding in Multimodal Generative Models (Mar 29, 2025). Tasks: Scene Understanding. Code: Code Available.
Empowering Large Language Models with 3D Situation Awareness (Mar 29, 2025). Tasks: Scene Understanding. Code: Unverified.
Can DeepSeek Reason Like a Surgeon? An Empirical Evaluation for Vision-Language Understanding in Robotic-Assisted Surgery (Mar 29, 2025). Tasks: Action Understanding, Instrument Recognition. Code: Unverified.
Open-Vocabulary Semantic Segmentation with Uncertainty Alignment for Robotic Scene Understanding in Indoor Building Environments (Mar 29, 2025). Tasks: Navigate, Open Vocabulary Semantic Segmentation. Code: Unverified.
NuGrounding: A Multi-View 3D Visual Grounding Framework in Autonomous Driving (Mar 28, 2025). Tasks: 3D visual grounding, Autonomous Driving. Code: Unverified.
Evaluating Multimodal Language Models as Visual Assistants for Visually Impaired Users (Mar 28, 2025). Tasks: Object Recognition, Reading Comprehension. Code: Unverified.
Endo-TTAP: Robust Endoscopic Tissue Tracking via Multi-Facet Guided Attention and Hybrid Flow-point Supervision (Mar 28, 2025). Tasks: Optical Flow Estimation, Point Tracking. Code: Unverified.
Next-Best-Trajectory Planning of Robot Manipulators for Effective Observation and Exploration (Mar 28, 2025). Tasks: Computational Efficiency, Object Reconstruction. Code: Unverified.
A Dataset for Semantic Segmentation in the Presence of Unknowns (Mar 28, 2025). Tasks: Anomaly Detection, Anomaly Segmentation. Code: Unverified.
Visual Jenga: Discovering Object Dependencies via Counterfactual Inpainting (Mar 27, 2025). Tasks: counterfactual, Object. Code: Unverified.
DINeMo: Learning Neural Mesh Models with no 3D Annotations (Mar 26, 2025). Tasks: 3D Pose Estimation, 6D Pose Estimation. Code: Unverified.
OpenLex3D: A New Evaluation Benchmark for Open-Vocabulary 3D Scene Representations (Mar 25, 2025). Tasks: 3D Semantic Segmentation, Scene Understanding. Code: Unverified.
Predicting the Road Ahead: A Knowledge Graph based Foundation Model for Scene Understanding in Autonomous Driving (Mar 24, 2025). Tasks: Autonomous Driving, Knowledge Graphs. Code: Unverified.