Mosaic3D: Foundation Dataset and Model for Open-Vocabulary 3D Segmentation | Feb 4, 2025 | Contrastive Learning, Decoder
Unverified | 0 | AquaticCLIP: A Vision-Language Foundation Model for Underwater Scene Analysis | Feb 3, 2025 | Object Counting, Scene Understanding
Unverified | 0 | Integrating LMM Planners and 3D Skill Policies for Generalizable Manipulation | Jan 30, 2025 | Memorization, Scene Understanding
Unverified | 0 | Efficient Interactive 3D Multi-Object Removal | Jan 29, 2025 | Object, Scene Understanding
Unverified | 0 | Contextual Self-paced Learning for Weakly Supervised Spatio-Temporal Video Grounding | Jan 28, 2025 | object-detection, Object Detection
Unverified | 0 | PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding | Jan 27, 2025 | Benchmarking, Common Sense Reasoning
Unverified | 0 | Unveiling the Potential of iMarkers: Invisible Fiducial Markers for Advanced Robotics | Jan 26, 2025 | Object Recognition, Scene Understanding
Unverified | 0 | Scene Understanding Enabled Semantic Communication with Open Channel Coding | Jan 24, 2025 | Question Answering, Scene Understanding
Unverified | 0 | GeomGS: LiDAR-Guided Geometry-Aware Gaussian Splatting for Robot Localization | Jan 23, 2025 | 3DGS, Autonomous Driving
Unverified | 0 | Neural Radiance Fields for the Real World: A Survey | Jan 22, 2025 | Scene Understanding, Survey
Unverified | 0 | Separated Inter/Intra-Modal Fusion Prompts for Compositional Zero-Shot Learning | Jan 22, 2025 | Attribute, Compositional Zero-Shot Learning
Unverified | 0 | Dynamic Scene Understanding from Vision-Language Representations | Jan 20, 2025 | Grounded Situation Recognition, Human-Human Interaction Recognition
Unverified | 0 | A Vision-Language Framework for Multispectral Scene Representation Using Language-Grounded Features | Jan 17, 2025 | Language Modeling, Language Modelling
Unverified | 0 | YETI (YET to Intervene) Proactive Interventions by Multimodal AI Agents in Augmented Reality Tasks | Jan 16, 2025 | AI Agent, Scene Understanding
Unverified | 0 | CrossModalityDiffusion: Multi-Modal Novel View Synthesis with Unified Intermediate Representation | Jan 16, 2025 | Novel View Synthesis, Scene Understanding
Code Available | 0 | Embodied Scene Understanding for Vision Language Models via MetaVQA | Jan 15, 2025 | Decision Making, Question Answering
Unverified | 0 | Zero-Shot Scene Understanding for Automatic Target Recognition Using Large Vision-Language Models | Jan 13, 2025 | Scene Understanding
Unverified | 0 | Hierarchical Superpixel Segmentation via Structural Information Theory | Jan 13, 2025 | graph construction, graph partitioning
Code Available | 0 | Application of Vision-Language Model to Pedestrians Behavior and Scene Understanding in Autonomous Driving | Jan 12, 2025 | Autonomous Driving, Decision Making
Unverified | 0 | UniQ: Unified Decoder with Task-specific Queries for Efficient Scene Graph Generation | Jan 10, 2025 | Decoder, Graph Generation
Unverified | 0 | Self-Supervised Partial Cycle-Consistency for Multi-View Matching | Jan 10, 2025 | Scene Understanding
Code Available | 0 | Vision-Language Models for Autonomous Driving: CLIP-Based Dynamic Scene Understanding | Jan 9, 2025 | Autonomous Driving, In-Context Learning
Unverified | 0 | A Systematic Literature Review on Deep Learning-based Depth Estimation in Computer Vision | Jan 9, 2025 | 3D Reconstruction, Depth Estimation
Unverified | 0 | TADFormer: Task-Adaptive Dynamic Transformer for Efficient Multi-Task Learning | Jan 8, 2025 | Multi-Task Learning, parameter-efficient fine-tuning
Unverified | 0 | NextStop: An Improved Tracker For Panoptic LIDAR Segmentation Data | Jan 8, 2025 | Autonomous Driving, Instance Segmentation
Code Available | 0 | Advancing the Understanding of Fine-Grained 3D Forest Structures using Digital Cousins and Simulation-to-Reality: Methods and Datasets | Jan 7, 2025 | Data Augmentation, parameter estimation
Unverified | 0 | LargeAD: Large-Scale Cross-Sensor Data Pretraining for Autonomous Driving | Jan 7, 2025 | Autonomous Driving, Contrastive Learning
Unverified | 0 | CL3DOR: Contrastive Learning for 3D Large Multimodal Models via Odds Ratio on High-Resolution Point Clouds | Jan 7, 2025 | Contrastive Learning, Language Modeling
Unverified | 0 | IAM: Enhancing RGB-D Instance Segmentation with New Benchmarks | Jan 3, 2025 | Data Integration, Image Segmentation
Code Available | 0 | 3D-LLaVA: Towards Generalist 3D LMMs with Omni Superpoint Transformer | Jan 2, 2025 | Scene Understanding
Unverified | 0 | Leverage Cross-Attention for End-to-End Open-Vocabulary Panoptic Reconstruction | Jan 2, 2025 | Instance Segmentation, Scene Understanding
Unverified | 0 | Beyond Human Perception: Understanding Multi-Object World from Monocular View | Jan 1, 2025 | 3D visual grounding, Denoising
Code Available | 0 | Vision-Language Embodiment for Monocular Depth Estimation | Jan 1, 2025 | 3D Reconstruction, Depth Estimation
Unverified | 0 | Object-aware Sound Source Localization via Audio-Visual Scene Understanding | Jan 1, 2025 | Scene Understanding, Sound Source Localization
Code Available | 0 | AVQACL: A Novel Benchmark for Audio-Visual Question Answering Continual Learning | Jan 1, 2025 | Audio-visual Question Answering, Continual Learning
Code Available | 0 | HUSH: Holistic Panoramic 3D Scene Understanding using Spherical Harmonics | Jan 1, 2025 | Depth Estimation, Room Layout Estimation
Unverified | 0 | Scene Map-based Prompt Tuning for Navigation Instruction Generation | Jan 1, 2025 | Scene Understanding
Unverified | 0 | GroundingFace: Fine-grained Face Understanding via Pixel Grounding Multimodal Large Language Model | Jan 1, 2025 | Attribute, Language Modeling
Unverified | 0 | TADFormer: Task-Adaptive Dynamic TransFormer for Efficient Multi-Task Learning | Jan 1, 2025 | Multi-Task Learning, parameter-efficient fine-tuning
Unverified | 0 | 3D-MVP: 3D Multiview Pretraining for Manipulation | Jan 1, 2025 | Decoder, Robot Manipulation
Unverified | 0 | Embodied VideoAgent: Persistent Memory from Egocentric Videos and Embodied Sensors Enables Dynamic Scene Understanding | Dec 31, 2024 | Robot Manipulation, Scene Understanding
Unverified | 0 | OVGaussian: Generalizable 3D Gaussian Segmentation with Open Vocabularies | Dec 31, 2024 | 3DGS, 3D Semantic Segmentation
Code Available | 0 | Text-to-Image GAN with Pretrained Representations | Dec 30, 2024 | Domain Generalization, Image Generation
Unverified | 0 | 4D Gaussian Splatting: Modeling Dynamic Scenes with Native 4D Primitives | Dec 30, 2024 | Novel View Synthesis, Scene Understanding
Unverified | 0 | MLLM-SUL: Multimodal Large Language Model for Semantic Scene Understanding and Localization in Traffic Scenarios | Dec 27, 2024 | Autonomous Driving, Language Modeling
Code Available | 0 | UniPLV: Towards Label-Efficient Open-World 3D Scene Understanding by Regional Visual Language Supervision | Dec 24, 2024 | Scene Understanding, Semantic Segmentation
Unverified | 0 | Parallel Neural Computing for Scene Understanding from LiDAR Perception in Autonomous Racing | Dec 24, 2024 | Autonomous Driving, Autonomous Racing
Code Available | 0 | LangSurf: Language-Embedded Surface Gaussians for 3D Scene Understanding | Dec 23, 2024 | 3D Semantic Segmentation, Scene Understanding
Unverified | 0 | Application of Multimodal Large Language Models in Autonomous Driving | Dec 21, 2024 | Autonomous Driving, Decision Making
Unverified | 0 | Improving Object Detection for Time-Lapse Imagery Using Temporal Features in Wildlife Monitoring | Dec 20, 2024 | Object, object-detection
Code Available | 0