Mask-aware Text-to-Image Retrieval: Referring Expression Segmentation Meets Cross-modal Retrieval | Jun 28, 2025 | Cross-Modal Retrieval, Image Captioning | Unverified | 0
VoteSplat: Hough Voting Gaussian Splatting for 3D Scene Understanding | Jun 28, 2025 | 3DGS, Instance Segmentation | Unverified | 0
RAG-6DPose: Retrieval-Augmented 6D Pose Estimation via Leveraging CAD as Knowledge Base | Jun 23, 2025 | 6D Pose Estimation, Object Localization | Unverified | 0
CDP: Towards Robust Autoregressive Visuomotor Policy Learning via Causal Diffusion | Jun 17, 2025 | Object Localization | Unverified | 0
UAV Object Detection and Positioning in a Mining Industrial Metaverse with Custom Geo-Referenced Data | Jun 16, 2025 | 3D Reconstruction, object-detection | Unverified | 0
WoMAP: World Models For Embodied Open-Vocabulary Object Localization | Jun 2, 2025 | Active Object Localization, Efficient Exploration | Unverified | 0
Multispectral Detection Transformer with Infrared-Centric Sensor Fusion | May 21, 2025 | Multispectral Object Detection, Object | Code Available | 0
Ground-V: Teaching VLMs to Ground Complex Instructions in Pixels | May 20, 2025 | Instruction Following, Knowledge Distillation | Unverified | 0
Towards Omnidirectional Reasoning with 360-R1: A Dataset, Benchmark, and GRPO-based Method | May 20, 2025 | Hallucination, Object Localization | Unverified | 0
PointArena: Probing Multimodal Grounding Through Language-Guided Pointing | May 15, 2025 | Object Localization | Unverified | 0
Extending Large Vision-Language Model for Diverse Interactive Tasks in Autonomous Driving | May 13, 2025 | 3D visual grounding, Autonomous Driving | Code Available | 1
Towards Accurate State Estimation: Kalman Filter Incorporating Motion Dynamics for 3D Multi-Object Tracking | May 12, 2025 | 3D Multi-Object Tracking, Multi-Object Tracking | Unverified | 0
Enhancing Satellite Object Localization with Dilated Convolutions and Attention-aided Spatial Pooling | May 8, 2025 | feature selection, Object | Code Available | 0
Split Matching for Inductive Zero-shot Semantic Segmentation | May 8, 2025 | Object Localization, Semantic Segmentation | Unverified | 0
Pro2SAM: Mask Prompt to SAM with Grid Points for Weakly Supervised Object Localization | May 8, 2025 | Object Localization, Weakly-Supervised Object Localization | Unverified | 0
Exploring Modality Guidance to Enhance VFM-based Feature Fusion for UDA in 3D Semantic Segmentation | Apr 19, 2025 | 3D Semantic Segmentation, image-classification | Unverified | 0
Locate 3D: Real-World Object Localization via Self-Supervised Learning in 3D | Apr 19, 2025 | Decoder, Object Localization | Code Available | 3
CFIS-YOLO: A Lightweight Multi-Scale Fusion Network for Edge-Deployable Wood Defect Detection | Apr 15, 2025 | Computational Efficiency, Defect Detection | Unverified | 0
SoccerNet-v3D: Leveraging Sports Broadcast Replays for 3D Scene Understanding | Apr 14, 2025 | Camera Calibration, Object Localization | Code Available | 1
Foundation Models for Remote Sensing: An Analysis of MLLMs for Object Localization | Apr 14, 2025 | Benchmarking, Earth Observation | Unverified | 0
Multi-Object Grounding via Hierarchical Contrastive Siamese Transformers | Apr 14, 2025 | Object, Object Localization | Unverified | 0
POEM: Precise Object-level Editing via MLLM control | Apr 10, 2025 | Image Generation, Object | Unverified | 0
MB-ORES: A Multi-Branch Object Reasoner for Visual Grounding in Remote Sensing | Mar 31, 2025 | Object, object-detection | Code Available | 0
Texture or Semantics? Vision-Language Models Get Lost in Font Recognition | Mar 31, 2025 | Few-Shot Learning, Font Recognition | Code Available | 0
PixelCAM: Pixel Class Activation Mapping for Histology Image Classification and ROI Localization | Mar 31, 2025 | image-classification, Image Classification | Code Available | 0
GLRD: Global-Local Collaborative Reason and Debate with PSL for 3D Open-Vocabulary Detection | Mar 26, 2025 | Common Sense Reasoning, Object | Unverified | 0
Beyond Object Categories: Multi-Attribute Reference Understanding for Visual Grounding | Mar 25, 2025 | Attribute, Object | Unverified | 0
xMOD: Cross-Modal Distillation for 2D/3D Multi-Object Discovery from 2D motion | Mar 19, 2025 | Multi-object discovery, Object | Code Available | 0
Omnidirectional Multi-Object Tracking | Mar 6, 2025 | Multi-Object Tracking, Object | Code Available | 2
Dr. Splat: Directly Referring 3D Gaussian Splatting via Direct Language Embedding Registration | Feb 23, 2025 | 3DGS, 3D Semantic Segmentation | Unverified | 0
CrossOver: 3D Scene Cross-Modal Alignment | Feb 20, 2025 | cross-modal alignment, Object | Code Available | 3
Qwen2.5-VL Technical Report | Feb 19, 2025 | document understanding | Code Available | 11
MomentSeeker: A Task-Oriented Benchmark For Long-Video Moment Retrieval | Feb 18, 2025 | Action Recognition, Moment Retrieval | Unverified | 0
Auto-Prompting SAM for Weakly Supervised Landslide Extraction | Jan 23, 2025 | Landslide segmentation, Object Localization | Unverified | 0
TeD-Loc: Text Distillation for Weakly Supervised Object Localization | Jan 22, 2025 | Classification, Denoising | Code Available | 0
Neuromorphic Optical Tracking and Imaging of Randomly Moving Targets through Strongly Scattering Media | Jan 7, 2025 | Computational Efficiency, Image Reconstruction | Unverified | 0
AuxDepthNet: Real-Time Monocular 3D Object Detection with Depth-Sensitive Features | Jan 7, 2025 | 3D Object Detection, Computational Efficiency | Unverified | 0
Cross-Modal Distillation for 2D/3D Multi-Object Discovery from 2D Motion | Jan 1, 2025 | Multi-object discovery, Object | Unverified | 0
SilVar: Speech Driven Multimodal Model for Reasoning Visual Question Answering and Object Localization | Dec 21, 2024 | Image Captioning, Multimodal Reasoning | Code Available | 0
Demystifying the Potential of ChatGPT-4 Vision for Construction Progress Monitoring | Dec 20, 2024 | Object Localization | Unverified | 0
SuperGSeg: Open-Vocabulary 3D Segmentation with Structured Super-Gaussians | Dec 13, 2024 | GPU, Object Localization | Unverified | 0
Agent Journey Beyond RGB: Unveiling Hybrid Semantic-Spatial Environmental Representations for Vision-and-Language Navigation | Dec 9, 2024 | Object Localization, Vision and Language Navigation | Code Available | 1
3D Spatial Understanding in MLLMs: Disambiguation and Evaluation | Dec 9, 2024 | 3D dense captioning, 3D visual grounding | Unverified | 0
RSUniVLM: A Unified Vision Language Model for Remote Sensing via Granularity-oriented Mixture of Experts | Dec 7, 2024 | Change Detection, Image Comprehension | Code Available | 1
SeeGround: See and Ground for Zero-Shot Open-Vocabulary 3D Visual Grounding | Dec 5, 2024 | 3D visual grounding, Object Localization | Unverified | 0
GraPix: Exploring Graph Modularity Optimization for Unsupervised Pixel Clustering | Dec 4, 2024 | Attribute, Clustering | Code Available | 0
RELOCATE: A Simple Training-Free Baseline for Visual Query Localization Using Region-Based Representations | Dec 2, 2024 | Object Localization | Unverified | 0
SpaRC: Sparse Radar-Camera Fusion for 3D Object Detection | Nov 29, 2024 | 3D Multi-Object Tracking, 3D Object Detection | Code Available | 0
ObjectRelator: Enabling Cross-View Object Relation Understanding in Ego-Centric and Exo-Centric Videos | Nov 28, 2024 | Object, Object Localization | Unverified | 0
GloFinder: AI-empowered QuPath Plugin for WSI-level Glomerular Detection, Visualization, and Curation | Nov 27, 2024 | Object Localization, whole slide images | Unverified | 0