Leveraging Automatic CAD Annotations for Supervised Learning in 3D Scene Understanding | Apr 18, 2025 | Deep Learning, Point Cloud Completion | Code Available | 0
HAECcity: Open-Vocabulary Scene Understanding of City-Scale Point Clouds with Superpoint Graph Clustering | Apr 18, 2025 | Clustering, Graph Clustering | Unverified | 0
Training-Free Hierarchical Scene Understanding for Gaussian Splatting with Superpoint Graphs | Apr 17, 2025 | 3D geometry, 3DGS | Code Available | 1
Explainable Scene Understanding with Qualitative Representations and Graph Neural Networks | Apr 17, 2025 | Autonomous Driving, Scene Understanding | Unverified | 0
DC-SAM: In-Context Segment Anything in Images and Videos via Dual Consistency | Apr 16, 2025 | Few-Shot Learning, Interactive Segmentation | Code Available | 1
CAGS: Open-Vocabulary 3D Scene Understanding with Context-Aware Gaussian Splatting | Apr 16, 2025 | 3DGS, 3D Instance Segmentation | Unverified | 0
Single-Input Multi-Output Model Merging: Leveraging Foundation Models for Dense Multi-Task Learning | Apr 15, 2025 | Multi-Task Learning, Scene Understanding | Unverified | 0
Foundation Models for Remote Sensing: An Analysis of MLLMs for Object Localization | Apr 14, 2025 | Benchmarking, Earth Observation | Unverified | 0
SoccerNet-v3D: Leveraging Sports Broadcast Replays for 3D Scene Understanding | Apr 14, 2025 | Camera Calibration, Object Localization | Code Available | 1
FindAnything: Open-Vocabulary and Object-Centric Mapping for Robot Exploration in Any Environment | Apr 11, 2025 | 3D geometry, Natural Language Queries | Unverified | 0
FMLGS: Fast Multilevel Language Embedded Gaussians for Part-level Interactive Agents | Apr 11, 2025 | 3DGS, Navigate | Unverified | 0
DSM: Building A Diverse Semantic Map for 3D Visual Grounding | Apr 11, 2025 | 3D visual grounding, Scene Understanding | Unverified | 0
DGOcc: Depth-aware Global Query-based Network for Monocular 3D Occupancy Prediction | Apr 10, 2025 | GPU, Prediction | Unverified | 0
Masked Scene Modeling: Narrowing the Gap Between Supervised and Self-Supervised Learning in 3D Scene Understanding | Apr 9, 2025 | Scene Understanding, Self-Supervised Learning | Code Available | 1
MovSAM: A Single-image Moving Object Segmentation Framework Based on Deep Thinking | Apr 9, 2025 | Autonomous Driving, Language Modeling | Code Available | 0
RayFronts: Open-Set Semantic Ray Frontiers for Online Scene Understanding and Exploration | Apr 9, 2025 | 3D Semantic Segmentation, Benchmarking | Unverified | 0
Attributes-aware Visual Emotion Representation Learning | Apr 9, 2025 | Attribute, Emotion Recognition | Unverified | 0
Audio-visual Event Localization on Portrait Mode Short Videos | Apr 9, 2025 | audio-visual event localization, Scene Understanding | Unverified | 0
PRIMEDrive-CoT: A Precognitive Chain-of-Thought Framework for Uncertainty-Aware Object Interaction in Driving Scene Scenario | Apr 8, 2025 | 3D Object Detection, Autonomous Driving | Unverified | 0
CamContextI2V: Context-aware Controllable Video Generation | Apr 8, 2025 | Diversity, Scene Understanding | Code Available | 1
RS-RAG: Bridging Remote Sensing Imagery and Comprehensive Knowledge with a Multi-Modal Dataset and Retrieval-Augmented Generation Model | Apr 7, 2025 | Image Captioning, image-classification | Unverified | 0
DFormerv2: Geometry Self-Attention for RGBD Semantic Segmentation | Apr 7, 2025 | 3D geometry, RGBD Semantic Segmentation | Code Available | 3
Planning Safety Trajectories with Dual-Phase, Physics-Informed, and Transportation Knowledge-Driven Large Language Models | Apr 6, 2025 | Computational Efficiency, General Knowledge | Code Available | 0
Multimodal Fusion and Vision-Language Models: A Survey for Robot Vision | Apr 3, 2025 | 3D Object Detection, cross-modal alignment | Code Available | 1
F-ViTA: Foundation Model Guided Visible to Thermal Translation | Apr 3, 2025 | Scene Understanding, Style Transfer | Code Available | 1
Overlap-Aware Feature Learning for Robust Unsupervised Domain Adaptation for 3D Semantic Segmentation | Apr 2, 2025 | 3D Semantic Segmentation, Adversarial Attack | Unverified | 0
CoMatcher: Multi-View Collaborative Feature Matching | Apr 2, 2025 | Scene Understanding, set matching | Unverified | 0
TransforMerger: Transformer-based Voice-Gesture Fusion for Robust Human-Robot Communication | Apr 2, 2025 | Language Modeling, Language Modelling | Unverified | 0
Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness | Apr 2, 2025 | Scene Understanding | Unverified | 0
Scene-Centric Unsupervised Panoptic Segmentation | Apr 2, 2025 | Instance Segmentation, Panoptic Segmentation | Code Available | 2
WikiVideo: Article Generation from Multiple Videos | Apr 1, 2025 | Articles, RAG | Code Available | 1
Zero-Shot 4D Lidar Panoptic Segmentation | Apr 1, 2025 | Diversity, Panoptic Segmentation | Unverified | 0
Context-Aware Human Behavior Prediction Using Multimodal Large Language Models: Challenges and Insights | Apr 1, 2025 | Activity Prediction, Domain Generalization | Unverified | 0
PhysPose: Refining 6D Object Poses with Physical Constraints | Mar 30, 2025 | 6D Pose Estimation using RGB, Pose Estimation | Unverified | 0
Boosting Omnidirectional Stereo Matching with a Pre-trained Depth Foundation Model | Mar 30, 2025 | Depth Estimation, Monocular Depth Estimation | Code Available | 1
OpenDriveVLA: Towards End-to-end Autonomous Driving with Large Vision Language Action Model | Mar 30, 2025 | Autonomous Driving, Decision Making | Code Available | 4
Empowering Large Language Models with 3D Situation Awareness | Mar 29, 2025 | Scene Understanding | Unverified | 0
Evaluating Compositional Scene Understanding in Multimodal Generative Models | Mar 29, 2025 | Scene Understanding | Code Available | 0
Can DeepSeek Reason Like a Surgeon? An Empirical Evaluation for Vision-Language Understanding in Robotic-Assisted Surgery | Mar 29, 2025 | Action Understanding, Instrument Recognition | Unverified | 0
Open-Vocabulary Semantic Segmentation with Uncertainty Alignment for Robotic Scene Understanding in Indoor Building Environments | Mar 29, 2025 | Navigate, Open Vocabulary Semantic Segmentation | Unverified | 0
Mitigating Trade-off: Stream and Query-guided Aggregation for Efficient and Effective 3D Occupancy Prediction | Mar 28, 2025 | Autonomous Driving, Scene Understanding | Code Available | 1
Evaluating Multimodal Language Models as Visual Assistants for Visually Impaired Users | Mar 28, 2025 | Object Recognition, Reading Comprehension | Unverified | 0
A Dataset for Semantic Segmentation in the Presence of Unknowns | Mar 28, 2025 | Anomaly Detection, Anomaly Segmentation | Unverified | 0
Endo-TTAP: Robust Endoscopic Tissue Tracking via Multi-Facet Guided Attention and Hybrid Flow-point Supervision | Mar 28, 2025 | Optical Flow Estimation, Point Tracking | Unverified | 0
Next-Best-Trajectory Planning of Robot Manipulators for Effective Observation and Exploration | Mar 28, 2025 | Computational Efficiency, Object Reconstruction | Unverified | 0
NuGrounding: A Multi-View 3D Visual Grounding Framework in Autonomous Driving | Mar 28, 2025 | 3D visual grounding, Autonomous Driving | Unverified | 0
Visual Jenga: Discovering Object Dependencies via Counterfactual Inpainting | Mar 27, 2025 | counterfactual, Object | Unverified | 0
Towards Generating Realistic 3D Semantic Training Data for Autonomous Driving | Mar 27, 2025 | 3D Semantic Segmentation, Autonomous Driving | Code Available | 2
DINeMo: Learning Neural Mesh Models with no 3D Annotations | Mar 26, 2025 | 3D Pose Estimation, 6D Pose Estimation | Unverified | 0
COB-GS: Clear Object Boundaries in 3DGS Segmentation Based on Boundary-Adaptive Gaussian Splitting | Mar 25, 2025 | 3DGS, Object | Code Available | 2