Qwen2.5-VL Technical Report (Feb 19, 2025). Tags: document understanding. Code available: 11.
Bilateral Reference for High-Resolution Dichotomous Image Segmentation (Jan 7, 2024). Tags: Camouflaged Object Segmentation, Dichotomous Image Segmentation. Code available: 7.
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks (Jun 12, 2024). Tags: Image Generation, Language Modeling. Code available: 5.
Mamba-FETrack: Frame-Event Tracking via State Space Model (Apr 28, 2024). Tags: GPU, Mamba. Code available: 4.
The All-Seeing Project V2: Towards General Relation Comprehension of the Open World (Feb 29, 2024). Tags: All, Hallucination. Code available: 4.
Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models (Feb 12, 2024). Tags: Hallucination, Object Localization. Code available: 4.
Locate 3D: Real-World Object Localization via Self-Supervised Learning in 3D (Apr 19, 2025). Tags: Decoder, Object Localization. Code available: 3.
CrossOver: 3D Scene Cross-Modal Alignment (Feb 20, 2025). Tags: cross-modal alignment, Object. Code available: 3.
DynaMem: Online Dynamic Spatio-Semantic Memory for Open World Mobile Manipulation (Nov 7, 2024). Tags: Object Localization. Code available: 3.
LangSplat: 3D Language Gaussian Splatting (Dec 26, 2023). Tags: NeRF, Object Localization. Code available: 3.
Omnidirectional Multi-Object Tracking (Mar 6, 2025). Tags: Multi-Object Tracking, Object. Code available: 2.
A Novel Unified Architecture for Low-Shot Counting by Detection and Segmentation (Sep 27, 2024). Tags: Exemplar-Free Counting, Few-shot Object Counting and Detection. Code available: 2.
Many-Shot In-Context Learning in Multimodal Foundation Models (May 16, 2024). Tags: image-classification, Image Classification. Code available: 2.
Removal then Selection: A Coarse-to-Fine Fusion Perspective for RGB-Infrared Object Detection (Jan 19, 2024). Tags: Multispectral Object Detection, Object. Code available: 2.
Point Segment and Count: A Generalized Framework for Object Counting (Jan 1, 2024). Tags: Few-shot Object Counting and Detection, Knowledge Distillation. Code available: 2.
CoDA: Collaborative Novel Box Discovery and Cross-modal Alignment for Open-vocabulary 3D Object Detection (Oct 4, 2023). Tags: 3D Object Detection, cross-modal alignment. Code available: 2.
Roboflow 100: A Rich, Multi-Domain Object Detection Benchmark (Nov 24, 2022). Tags: 2D Object Detection, Image Retrieval. Code available: 2.
Contrastive learning of Class-agnostic Activation Map for Weakly Supervised Object Localization and Semantic Segmentation (Mar 25, 2022). Tags: Contrastive Learning, image-classification. Code available: 2.
Crafting Better Contrastive Views for Siamese Representation Learning (Feb 7, 2022). Tags: Contrastive Learning, Object Localization. Code available: 2.
C2AM: Contrastive Learning of Class-Agnostic Activation Map for Weakly Supervised Object Localization and Semantic Segmentation (Jan 1, 2022). Tags: Contrastive Learning, image-classification. Code available: 2.
Kimera: from SLAM to Spatial Perception with 3D Dynamic Scene Graphs (Jan 18, 2021). Tags: 3D Reconstruction, Object Localization. Code available: 2.
BOP Challenge 2020 on 6D Object Localization (Sep 15, 2020). Tags: 6D Pose Estimation, 6D Pose Estimation using RGB. Code available: 2.
Deep Snake for Real-Time Instance Segmentation (Jan 6, 2020). Tags: GPU, Instance Segmentation. Code available: 2.
Extending Large Vision-Language Model for Diverse Interactive Tasks in Autonomous Driving (May 13, 2025). Tags: 3D visual grounding, Autonomous Driving. Code available: 1.
SoccerNet-v3D: Leveraging Sports Broadcast Replays for 3D Scene Understanding (Apr 14, 2025). Tags: Camera Calibration, Object Localization. Code available: 1.
Agent Journey Beyond RGB: Unveiling Hybrid Semantic-Spatial Environmental Representations for Vision-and-Language Navigation (Dec 9, 2024). Tags: Object Localization, Vision and Language Navigation. Code available: 1.
RSUniVLM: A Unified Vision Language Model for Remote Sensing via Granularity-oriented Mixture of Experts (Dec 7, 2024). Tags: Change Detection, Image Comprehension. Code available: 1.
OCDet: Object Center Detection via Bounding Box-Aware Heatmap Prediction on Edge Devices with NPUs (Nov 23, 2024). Tags: Keypoint Detection, Object. Code available: 1.
Upsampling DINOv2 features for unsupervised vision tasks and weakly supervised materials segmentation (Oct 20, 2024). Tags: Clustering, graph partitioning. Code available: 1.
PuzzleBoard: A New Camera Calibration Pattern with Position Encoding (Sep 30, 2024). Tags: Camera Calibration, Camera Pose Estimation. Code available: 1.
MambaEVT: Event Stream based Visual Object Tracking using State Space Model (Aug 20, 2024). Tags: Mamba, Object Localization. Code available: 1.
Global-Local Collaborative Inference with LLM for Lidar-Based Open-Vocabulary Detection (Jul 12, 2024). Tags: Collaborative Inference, Language Modelling. Code available: 1.
Deep Learning Innovations for Underwater Waste Detection: An In-Depth Analysis (May 28, 2024). Tags: Object Localization. Code available: 1.
FlightScope: An Experimental Comparative Review of Aircraft Detection Algorithms in Satellite Imagery (Apr 3, 2024). Tags: Object, object-detection. Code available: 1.
IllusionVQA: A Challenging Optical Illusion Dataset for Vision Language Models (Mar 23, 2024). Tags: Common Sense Reasoning, In-Context Learning. Code available: 1.
Few-shot Object Localization (Mar 19, 2024). Tags: Model Optimization, Object. Code available: 1.
CAM Back Again: Large Kernel CNNs from a Weakly Supervised Object Localization Perspective (Mar 11, 2024). Tags: Data Augmentation, Object Localization. Code available: 1.
Spatial Structure Constraints for Weakly Supervised Semantic Segmentation (Jan 20, 2024). Tags: Object, Object Localization. Code available: 1.
Dual Attention U-Net with Feature Infusion: Pushing the Boundaries of Multiclass Defect Segmentation (Dec 21, 2023). Tags: Edge Detection, Feature Engineering. Code available: 1.
Object-Aware Domain Generalization for Object Detection (Dec 19, 2023). Tags: Autonomous Driving, Contrastive Learning. Code available: 1.
Open3DIS: Open-Vocabulary 3D Instance Segmentation with 2D Mask Guidance (Dec 17, 2023). Tags: 3D Instance Segmentation, 3D Open-Vocabulary Instance Segmentation. Code available: 1.
Exploring Foveation and Saccade for Improved Weakly-Supervised Localization (Dec 16, 2023). Tags: Active Object Localization, Foveation. Code available: 1.
Mono3DVG: 3D Visual Grounding in Monocular Images (Dec 13, 2023). Tags: 3D Object Detection, 3D visual grounding. Code available: 1.
Boosting Segment Anything Model Towards Open-Vocabulary Learning (Dec 6, 2023). Tags: model, Object. Code available: 1.
BEVNeXt: Reviving Dense BEV Frameworks for 3D Object Detection (Dec 4, 2023). Tags: 3D Object Detection, Decoder. Code available: 1.
Grounding Everything: Emerging Localization Properties in Vision-Language Transformers (Dec 1, 2023). Tags: Image Retrieval, Object Localization. Code available: 1.
Language Embedded 3D Gaussians for Open-Vocabulary Scene Understanding (Nov 30, 2023). Tags: GPU, Inductive Bias. Code available: 1.
Point, Segment and Count: A Generalized Framework for Object Counting (Nov 21, 2023). Tags: Knowledge Distillation, Object. Code available: 1.
Unsupervised Object Localization in the Era of Self-Supervised ViTs: A Survey (Oct 19, 2023). Tags: Object, Object Localization. Code available: 1.
Context-Aware Entity Grounding with Open-Vocabulary 3D Scene Graphs (Sep 27, 2023). Tags: Form, Navigate. Code available: 1.