| YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information | Feb 21, 2024 | object-detectionObject Detection | CodeCode Available | 16 |
| YOLOv10: Real-Time End-to-End Object Detection | May 23, 2024 | 2D Object DetectionData Augmentation | CodeCode Available | 11 |
| LW-DETR: A Transformer Replacement to YOLO for Real-Time Detection | Jun 5, 2024 | Decoderobject-detection | CodeCode Available | 9 |
| YOLO-World: Real-Time Open-Vocabulary Object Detection | Jan 30, 2024 | Instance SegmentationLanguage Modeling | CodeCode Available | 9 |
| Perception Encoder: The best visual embeddings are not at the output of the network | Apr 17, 2025 | Depth EstimationLanguage Modeling | CodeCode Available | 8 |
| DETRs Beat YOLOs on Real-time Object Detection | Apr 17, 2023 | 2D Object DetectionDecoder | CodeCode Available | 8 |
| DocLayNet: A Large Human-Annotated Dataset for Document-Layout Analysis | Jun 2, 2022 | Document Layout AnalysisObject Detection | CodeCode Available | 8 |
| Visual-RFT: Visual Reinforcement Fine-Tuning | Mar 3, 2025 | Few-Shot Object DetectionFine-Grained Image Classification | CodeCode Available | 7 |
| MambaVision: A Hybrid Mamba-Transformer Vision Backbone | Jul 10, 2024 | Image ClassificationInstance Segmentation | CodeCode Available | 7 |
| Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection | May 16, 2024 | Edge-computingFew-Shot Object Detection | CodeCode Available | 7 |
| MambaOut: Do We Really Need Mamba for Vision? | May 13, 2024 | image-classificationImage Classification | CodeCode Available | 7 |
| T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy | Mar 21, 2024 | Contrastive LearningDescriptive | CodeCode Available | 7 |
| YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors | Jul 6, 2022 | 2D Object DetectionGPU | CodeCode Available | 7 |
| Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution | Jul 12, 2023 | FairnessImage Classification | CodeCode Available | 6 |
| YOLOv13: Real-Time Object Detection with Hypergraph-Enhanced Adaptive Visual Perception | Jun 21, 2025 | Computational Efficiencyobject-detection | CodeCode Available | 5 |
| DEIM: DETR with Improved Matching for Fast Convergence | Dec 5, 2024 | Data AugmentationGPU | CodeCode Available | 5 |
| DINO-X: A Unified Vision Model for Open-World Object Detection and Understanding | Nov 21, 2024 | Long-tailed Object DetectionObject | CodeCode Available | 5 |
| YOLO-RD: Introducing Relevant and Compact Explicit Knowledge to YOLO by Retriever-Dictionary | Oct 20, 2024 | object-detectionObject Detection | CodeCode Available | 5 |
| SAM2-Adapter: Evaluating & Adapting Segment Anything 2 in Downstream Tasks: Camouflage, Shadow, Medical Image Segmentation, and More | Aug 8, 2024 | Image SegmentationMedical Image Segmentation | CodeCode Available | 5 |
| Real-time Transformer-based Open-Vocabulary Detection with Efficient Fusion Head | Mar 11, 2024 | Object DetectionOpen-vocabulary object detection | CodeCode Available | 5 |
| YOLOR-Based Multi-Task Learning | Sep 29, 2023 | Image CaptioningInstance Segmentation | CodeCode Available | 5 |
| Infinite Photorealistic Worlds using Procedural Generation | Jun 15, 2023 | 3D Reconstructionobject-detection | CodeCode Available | 5 |
| Retinexformer: One-stage Retinex-based Transformer for Low-light Image Enhancement | Mar 12, 2023 | Image EnhancementLow-light Image Deblurring and Enhancement | CodeCode Available | 5 |
| Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection | Mar 9, 2023 | DecoderObject Detection | CodeCode Available | 5 |
| EfficientRep:An Efficient Repvgg-style ConvNets with Hardware-aware Neural Network Design | Feb 1, 2023 | GPUobject-detection | CodeCode Available | 5 |