| YOLO-World: Real-Time Open-Vocabulary Object Detection | Jan 30, 2024 | Instance Segmentation, Language Modeling | Code Available | 9 |
| T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy | Mar 21, 2024 | Contrastive Learning, Descriptive | Code Available | 7 |
| Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection | May 16, 2024 | Edge-computing, Few-Shot Object Detection | Code Available | 7 |
| DINO-X: A Unified Vision Model for Open-World Object Detection and Understanding | Nov 21, 2024 | Long-tailed Object Detection, Object | Code Available | 5 |
| Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection | Mar 9, 2023 | Decoder, Object Detection | Code Available | 5 |
| Real-time Transformer-based Open-Vocabulary Detection with Efficient Fusion Head | Mar 11, 2024 | Object Detection, Open-vocabulary object detection | Code Available | 5 |
| ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models | Apr 19, 2022 | Fairness, Few-Shot Image Classification | Code Available | 4 |
| VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning | May 17, 2025 | 2D Object Detection, Object Counting | Code Available | 4 |
| OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion | Jul 10, 2024 | Object Detection, Zero-Shot Object Detection | Code Available | 3 |
| Grounded Language-Image Pre-training | Dec 7, 2021 | 2D Object Detection, Described Object Detection | Code Available | 2 |
| Bridging the Gap between Object and Image-level Representations for Open-Vocabulary Detection | Jul 7, 2022 | Object, Open Vocabulary Attribute Detection | Code Available | 2 |
| Multi-modal Queried Object Detection in the Wild | May 30, 2023 | Few-Shot Object Detection, Object | Code Available | 2 |
| ViLLA: Fine-Grained Vision-Language Representation Learning from Real-World Data | Aug 22, 2023 | Attribute, object-detection | Code Available | 1 |
| DoUnseen: Tuning-Free Class-Adaptive Object Detection of Unseen Objects for Robotic Grasping | Apr 6, 2023 | Object, object-detection | Code Available | 1 |
| Synthesizing the Unseen for Zero-shot Object Detection | Oct 19, 2020 | Diversity, Generalized Zero-Shot Object Detection | Code Available | 1 |
| Resolving Semantic Confusions for Improved Zero-Shot Detection | Dec 12, 2022 | Generalized Zero-Shot Object Detection, Object Detection | Code Available | 1 |
| LangGas: Introducing Language in Selective Zero-Shot Background Subtraction for Semi-Transparent Gas Leak Detection with a New Dataset | Mar 4, 2025 | Classification, object-detection | Code Available | 1 |
| Robust Region Feature Synthesizer for Zero-Shot Object Detection | Jan 1, 2022 | Generalized Zero-Shot Object Detection, Object | Code Available | 1 |
| OV-DQUO: Open-Vocabulary DETR with Denoising Text Query Training and Open-World Unknown Objects Supervision | May 28, 2024 | Contrastive Learning, Denoising | Code Available | 1 |
| Learning Open-World Object Proposals without Learning to Classify | Aug 15, 2021 | Object, object-detection | Code Available | 1 |
| Many Heads but One Brain: Fusion Brain -- a Competition and a Single Multimodal Multitask Architecture | Nov 22, 2021 | Handwritten Text Recognition, object-detection | Code Available | 1 |
| DetToolChain: A New Prompting Paradigm to Unleash Detection Ability of MLLM | Mar 19, 2024 | Object, object-detection | Code Available | 1 |
| Polarity Loss for Zero-shot Object Detection | Nov 22, 2018 | Generalized Zero-Shot Object Detection, Metric Learning | Code Available | 1 |
| Background Learnable Cascade for Zero-Shot Object Detection | Oct 9, 2020 | Generalized Zero-Shot Object Detection, Object | Code Available | 1 |
| SeeDS: Semantic Separable Diffusion Synthesizer for Zero-shot Food Detection | Oct 7, 2023 | Denoising, Food recommendation | Code Available | 1 |