| YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information | Feb 21, 2024 | object-detection, Object Detection | Code Available | 16 |
| YOLOv10: Real-Time End-to-End Object Detection | May 23, 2024 | 2D Object Detection, Data Augmentation | Code Available | 11 |
| YOLO-World: Real-Time Open-Vocabulary Object Detection | Jan 30, 2024 | Instance Segmentation, Language Modeling | Code Available | 9 |
| LW-DETR: A Transformer Replacement to YOLO for Real-Time Detection | Jun 5, 2024 | Decoder, object-detection | Code Available | 9 |
| DETRs Beat YOLOs on Real-time Object Detection | Apr 17, 2023 | 2D Object Detection, Decoder | Code Available | 8 |
| DocLayNet: A Large Human-Annotated Dataset for Document-Layout Analysis | Jun 2, 2022 | Document Layout Analysis, Object Detection | Code Available | 8 |
| Perception Encoder: The best visual embeddings are not at the output of the network | Apr 17, 2025 | Depth Estimation, Language Modeling | Code Available | 8 |
| MambaOut: Do We Really Need Mamba for Vision? | May 13, 2024 | image-classification, Image Classification | Code Available | 7 |
| YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors | Jul 6, 2022 | 2D Object Detection, GPU | Code Available | 7 |
| MambaVision: A Hybrid Mamba-Transformer Vision Backbone | Jul 10, 2024 | Image Classification, Instance Segmentation | Code Available | 7 |
| T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy | Mar 21, 2024 | Contrastive Learning, Descriptive | Code Available | 7 |
| Visual-RFT: Visual Reinforcement Fine-Tuning | Mar 3, 2025 | Few-Shot Object Detection, Fine-Grained Image Classification | Code Available | 7 |
| Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection | May 16, 2024 | Edge-computing, Few-Shot Object Detection | Code Available | 7 |
| Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution | Jul 12, 2023 | Fairness, Image Classification | Code Available | 6 |
| Slicing Aided Hyper Inference and Fine-tuning for Small Object Detection | Feb 14, 2022 | Object, object-detection | Code Available | 5 |
| YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications | Sep 7, 2022 | GPU, Object Detection | Code Available | 5 |
| Real-time Transformer-based Open-Vocabulary Detection with Efficient Fusion Head | Mar 11, 2024 | Object Detection, Open-vocabulary object detection | Code Available | 5 |
| Retinexformer: One-stage Retinex-based Transformer for Low-light Image Enhancement | Mar 12, 2023 | Image Enhancement, Low-light Image Deblurring and Enhancement | Code Available | 5 |
| YOLOv13: Real-Time Object Detection with Hypergraph-Enhanced Adaptive Visual Perception | Jun 21, 2025 | Computational Efficiency, object-detection | Code Available | 5 |
| EfficientRep:An Efficient Repvgg-style ConvNets with Hardware-aware Neural Network Design | Feb 1, 2023 | GPU, object-detection | Code Available | 5 |
| DINO-X: A Unified Vision Model for Open-World Object Detection and Understanding | Nov 21, 2024 | Long-tailed Object Detection, Object | Code Available | 5 |
| DEIM: DETR with Improved Matching for Fast Convergence | Dec 5, 2024 | Data Augmentation, GPU | Code Available | 5 |
| YOLOv6 v3.0: A Full-Scale Reloading | Jan 13, 2023 | GPU, Object Detection | Code Available | 5 |
| SAM2-Adapter: Evaluating & Adapting Segment Anything 2 in Downstream Tasks: Camouflage, Shadow, Medical Image Segmentation, and More | Aug 8, 2024 | Image Segmentation, Medical Image Segmentation | Code Available | 5 |
| Infinite Photorealistic Worlds using Procedural Generation | Jun 15, 2023 | 3D Reconstruction, object-detection | Code Available | 5 |
| Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection | Mar 9, 2023 | Decoder, Object Detection | Code Available | 5 |
| YOLOR-Based Multi-Task Learning | Sep 29, 2023 | Image Captioning, Instance Segmentation | Code Available | 5 |
| YOLO-RD: Introducing Relevant and Compact Explicit Knowledge to YOLO by Retriever-Dictionary | Oct 20, 2024 | object-detection, Object Detection | Code Available | 5 |
| A ConvNet for the 2020s | Jan 10, 2022 | Classification, Domain Generalization | Code Available | 5 |
| GCoNet+: A Stronger Group Collaborative Co-Salient Object Detector | May 30, 2022 | Co-Salient Object Detection, Object | Code Available | 4 |
| Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement | Mar 9, 2025 | Domain Generalization, Object Detection | Code Available | 4 |
| FG-CLIP: Fine-Grained Visual and Textual Alignment | May 8, 2025 | Image-text Retrieval, object-detection | Code Available | 4 |
| BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation | May 26, 2022 | 3D Multi-Object Tracking, 3D Object Detection | Code Available | 4 |
| SARDet-100K: Towards Open-Source Benchmark and ToolKit for Large-Scale SAR Object Detection | Mar 11, 2024 | 2D Object Detection, 2k | Code Available | 4 |
| Strip R-CNN: Large Strip Convolution for Remote Sensing Object Detection | Jan 7, 2025 | Object, object-detection | Code Available | 4 |
| RSAR: Restricted State Angle Resolver and Rotated SAR Benchmark | Jan 8, 2025 | object-detection, Object Detection | Code Available | 4 |
| ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models | Apr 19, 2022 | Fairness, Few-Shot Image Classification | Code Available | 4 |
| RTMDet: An Empirical Study of Designing Real-Time Object Detectors | Dec 14, 2022 | GPU, Instance Segmentation | Code Available | 4 |
| EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything | Dec 1, 2023 | Decoder, image-classification | Code Available | 4 |
| EfficientViT: Multi-Scale Linear Attention for High-Resolution Dense Prediction | May 29, 2022 | Autonomous Driving, CPU | Code Available | 4 |
| OverLoCK: An Overview-first-Look-Closely-next ConvNet with Context-Mixing Dynamic Kernels | Feb 27, 2025 | Image Classification, Instance Segmentation | Code Available | 4 |
| Detectron2 Object Detection & Manipulating Images using Cartoonization | Aug 1, 2021 | Autonomous Vehicles, Data Visualization | Code Available | 4 |
| DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection | Mar 7, 2022 | Object Detection, Real-Time Object Detection | Code Available | 4 |
| Mamba YOLO: A Simple Baseline for Object Detection with State Space Model | Jun 9, 2024 | GPU, Mamba | Code Available | 4 |
| DiffusionDet: Diffusion Model for Object Detection | Nov 17, 2022 | Denoising, model | Code Available | 4 |
| OK-Robot: What Really Matters in Integrating Open-Knowledge Models for Robotics | Jan 22, 2024 | object-detection, Object Detection | Code Available | 4 |
| Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation | Jun 6, 2022 | Image Segmentation, Instance Segmentation | Code Available | 4 |
| Architecture-Agnostic Masked Image Modeling -- From ViT back to CNN | May 27, 2022 | Image Classification, Instance Segmentation | Code Available | 4 |
| Memory-aided Contrastive Consensus Learning for Co-salient Object Detection | Feb 28, 2023 | Co-Salient Object Detection, object-detection | Code Available | 4 |
| DAMO-YOLO : A Report on Real-Time Object Detection Design | Nov 23, 2022 | CPU, Neural Architecture Search | Code Available | 4 |