VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model | Apr 10, 2025 | Language Modeling, Language Modelling | Code Available 95
YOLO-World: Real-Time Open-Vocabulary Object Detection | Jan 30, 2024 | Instance Segmentation, Language Modeling | Code Available 95
Visual-RFT: Visual Reinforcement Fine-Tuning | Mar 3, 2025 | Few-Shot Object Detection, Fine-Grained Image Classification | Code Available 75
Real-time Transformer-based Open-Vocabulary Detection with Efficient Fusion Head | Mar 11, 2024 | Object Detection, Open-vocabulary object detection | Code Available 55
FG-CLIP: Fine-Grained Visual and Textual Alignment | May 8, 2025 | Image-text Retrieval, object-detection | Code Available 45
Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement | Mar 9, 2025 | Domain Generalization, Object Detection | Code Available 45
GLIPv2: Unifying Localization and Vision-Language Understanding | Jun 12, 2022 | 2D Object Detection, Contrastive Learning | Code Available 45
OmDet: Large-scale vision-language multi-dataset pre-training with multimodal detection network | Sep 10, 2022 | Continual Learning, Object | Code Available 35
Detecting Twenty-thousand Classes using Image-level Supervision | Jan 7, 2022 | Cross-Domain Few-Shot Object Detection, image-classification | Code Available 35
Locate Anything on Earth: Advancing Open-Vocabulary Object Detection for Remote Sensing Community | Aug 17, 2024 | Novel Concepts, Object | Code Available 35
OVLW-DETR: Open-Vocabulary Light-Weighted Detection Transformer | Jul 15, 2024 | Language Modeling, Language Modelling | Code Available 35
OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation | Sep 1, 2023 | 3D Open-Vocabulary Instance Segmentation, 3D Open-Vocabulary Object Detection | Code Available 25
Is CLIP the main roadblock for fine-grained open-world perception? | Apr 4, 2024 | Autonomous Driving, Novel Concepts | Code Available 25
Generative Region-Language Pretraining for Open-Ended Object Detection | Mar 15, 2024 | Language Modeling, Language Modelling | Code Available 25
CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction | Oct 2, 2023 | image-classification, Image Classification | Code Available 25
Detect Everything with Few Examples | Sep 22, 2023 | Binary Classification, Cross-Domain Few-Shot Object Detection | Code Available 25
LaMI-DETR: Open-Vocabulary Detection with Language Model Instruction | Jul 16, 2024 | Language Modeling, Language Modelling | Code Available 25
SHiNe: Semantic Hierarchy Nexus for Open-vocabulary Object Detection | May 16, 2024 | object-detection, Object Detection | Code Available 25
YOLOv8-AM: YOLOv8 Based on Effective Attention Mechanisms for Pediatric Wrist Fracture Detection | Feb 14, 2024 | Fracture detection, medical image detection | Code Available 25
PointCLIP V2: Prompting CLIP and GPT for Powerful 3D Open-world Learning | Nov 21, 2022 | 3D Classification, 3D Object Detection | Code Available 25
Open-Vocabulary DETR with Conditional Matching | Mar 22, 2022 | Language Modelling, object-detection | Code Available 25
Open Vocabulary Monocular 3D Object Detection | Nov 25, 2024 | 3D Object Detection, Monocular 3D Object Detection | Code Available 25
Bridging the Gap between Object and Image-level Representations for Open-Vocabulary Detection | Jul 7, 2022 | Object, Open Vocabulary Attribute Detection | Code Available 25
Mamba-YOLO-World: Marrying YOLO-World with Mamba for Open-Vocabulary Detection | Sep 13, 2024 | Mamba, Open Vocabulary Object Detection | Code Available 25
Cross-Domain Few-Shot Object Detection via Enhanced Open-Set Object Detector | Feb 5, 2024 | Cross-Domain Few-Shot, Cross-Domain Few-Shot Object Detection | Code Available 25
Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection | Mar 10, 2023 | Object, Open-vocabulary object detection | Code Available 15
Meta-Adapter: An Online Few-shot Learner for Vision-Language Model | Nov 7, 2023 | Few-Shot Learning, image-classification | Code Available 15
DST-Det: Simple Dynamic Self-Training for Open-Vocabulary Object Detection | Oct 2, 2023 | Novel Object Detection, Object | Code Available 15
CORA: Adapting CLIP for Open-Vocabulary Detection with Region Prompting and Anchor Pre-Matching | Mar 23, 2023 | Described Object Detection, object-detection | Code Available 15
Multi-Modal Classifiers for Open-Vocabulary Object Detection | Jun 8, 2023 | Language Modelling, Large Language Model | Code Available 15
MoCaE: Mixture of Calibrated Experts Significantly Improves Object Detection | Sep 26, 2023 | Instance Segmentation, Mixture-of-Experts | Code Available 15
Comprehensive Multi-Modal Prototypes are Simple and Effective Classifiers for Vast-Vocabulary Object Detection | Dec 23, 2024 | object-detection, Object Detection | Code Available 15
Distilling DETR with Visual-Linguistic Knowledge for Open-Vocabulary Object Detection | Jan 1, 2023 | Knowledge Distillation, Language Modeling | Code Available 15
A Hierarchical Semantic Distillation Framework for Open-Vocabulary Object Detection | Mar 13, 2025 | object-detection, Object Detection | Code Available 15
GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection | Dec 22, 2023 | Attribute, object-detection | Code Available 15
MarvelOVD: Marrying Object Recognition and Vision-Language Models for Robust Open-Vocabulary Object Detection | Jul 31, 2024 | Language Modelling, Object | Code Available 15
Open-Vocabulary One-Stage Detection with Hierarchical Visual-Language Knowledge Distillation | Mar 20, 2022 | Knowledge Distillation, Language Modelling | Code Available 15
OvarNet: Towards Open-vocabulary Object Attribute Recognition | Jan 23, 2023 | Attribute, Knowledge Distillation | Code Available 15
Localized Vision-Language Matching for Open-vocabulary Object Detection | May 12, 2022 | Language Modeling, Language Modelling | Code Available 15
From Open Vocabulary to Open World: Teaching Vision Language Models to Detect Novel Objects | Nov 27, 2024 | Autonomous Driving, Object | Code Available 15
CLIM: Contrastive Language-Image Mosaic for Region Representation | Dec 18, 2023 | Object, object-detection | Code Available 15
Query3D: LLM-Powered Open-Vocabulary Scene Segmentation with Language Embedded 3D Gaussian | Aug 7, 2024 | Autonomous Driving, object-detection | Code Available 15
Learning to Prompt for Open-Vocabulary Object Detection with Vision-Language Model | Mar 28, 2022 | image-classification, Image Classification | Code Available 15
Described Object Detection: Liberating Object Detection with Flexible Expressions | Jul 24, 2023 | Binary Classification, Described Object Detection | Code Available 15
LP-OVOD: Open-Vocabulary Object Detection by Linear Probing | Oct 26, 2023 | Object, object-detection | Code Available 15
Open-vocabulary Attribute Detection | Nov 23, 2022 | Attribute, Language Modeling | Code Available 15
Open-Vocabulary Object Detection Using Captions | Nov 20, 2020 | Object, object-detection | Code Available 15
Toward Open Vocabulary Aerial Object Detection with CLIP-Activated Student-Teacher Learning | Nov 20, 2023 | Object, object-detection | Code Available 15
CoDet: Co-Occurrence Guided Region-Word Alignment for Open-Vocabulary Object Detection | Oct 25, 2023 | Object, object-detection | Code Available 15
DART: An Automated End-to-End Object Detection Pipeline with Data Diversification, Open-Vocabulary Bounding Box Annotation, Pseudo-Label Review, and Model Training | Jul 12, 2024 | Image Generation, Object | Code Available 15