| GLIPv2: Unifying Localization and Vision-Language Understanding | Jun 12, 2022 | 2D Object Detection, Contrastive Learning | Code Available | 4 |
| Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation | Jun 6, 2022 | Image Segmentation, Instance Segmentation | Code Available | 4 |
| Vision GNN: An Image is Worth Graph of Nodes | Jun 1, 2022 | Image Classification, Object Detection | Code Available | 4 |
| GCoNet+: A Stronger Group Collaborative Co-Salient Object Detector | May 30, 2022 | Co-Salient Object Detection, Object | Code Available | 4 |
| EfficientViT: Multi-Scale Linear Attention for High-Resolution Dense Prediction | May 29, 2022 | Autonomous Driving, CPU | Code Available | 4 |
| Architecture-Agnostic Masked Image Modeling -- From ViT back to CNN | May 27, 2022 | Image Classification, Instance Segmentation | Code Available | 4 |
| BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation | May 26, 2022 | 3D Multi-Object Tracking, 3D Object Detection | Code Available | 4 |
| ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models | Apr 19, 2022 | Fairness, Few-Shot Image Classification | Code Available | 4 |
| PP-YOLOE: An evolved version of YOLO | Mar 30, 2022 | 2D Object Detection, Dense Object Detection | Code Available | 4 |
| DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection | Mar 7, 2022 | Object Detection, Real-Time Object Detection | Code Available | 4 |
| DN-DETR: Accelerate DETR Training by Introducing Query DeNoising | Mar 2, 2022 | Decoder, Object Detection | Code Available | 4 |
| Visual Attention Network | Feb 20, 2022 | image-classification, Image Classification | Code Available | 4 |
| Detectron2 Object Detection & Manipulating Images using Cartoonization | Aug 1, 2021 | Autonomous Vehicles, Data Visualization | Code Available | 4 |
| Deep Residual Learning for Image Recognition | Dec 10, 2015 | Classification | Code Available | 4 |
| Cosmos-Drive-Dreams: Scalable Synthetic Driving Data Generation with World Foundation Models | Jun 10, 2025 | 3D Lane Detection, 3D Object Detection | Code Available | 3 |
| OrionBench: A Benchmark for Chart and Human-Recognizable Object Detection in Infographics | May 23, 2025 | Chart Understanding, object-detection | Code Available | 3 |
| Detect Anything 3D in the Wild | Apr 10, 2025 | 3D Object Detection, Autonomous Driving | Code Available | 3 |
| Playing Non-Embedded Card-Based Games with Reinforcement Learning | Apr 7, 2025 | Board Games, Decision Making | Code Available | 3 |
| Frequency Dynamic Convolution for Dense Image Prediction | Mar 24, 2025 | object-detection, Object Detection | Code Available | 3 |
| Falcon: A Remote Sensing Vision-Language Foundation Model | Mar 14, 2025 | Image Captioning, image-classification | Code Available | 3 |
| Text-guided Sparse Voxel Pruning for Efficient 3D Visual Grounding | Feb 14, 2025 | 3D Object Detection, 3D visual grounding | Code Available | 3 |
| SM3Det: A Unified Model for Multi-Modal Remote Sensing Object Detection | Dec 30, 2024 | object-detection, Object Detection | Code Available | 3 |
| Cubify Anything: Scaling Indoor 3D Object Detection | Dec 5, 2024 | 3D Object Detection, Object | Code Available | 3 |
| Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension | Nov 20, 2024 | GPU, MME | Code Available | 3 |
| Data Generation for Hardware-Friendly Post-Training Quantization | Oct 29, 2024 | Data Augmentation, GPU | Code Available | 3 |
| Rethinking the Evaluation of Visible and Infrared Image Fusion | Oct 9, 2024 | object-detection, Object Detection | Code Available | 3 |
| RT-DETRv3: Real-time End-to-End Object Detection with Hierarchical Dense Positive Supervision | Sep 13, 2024 | Decoder, object-detection | Code Available | 3 |
| A Survey of Camouflaged Object Detection and Beyond | Aug 26, 2024 | Instance Segmentation, Object | Code Available | 3 |
| Locate Anything on Earth: Advancing Open-Vocabulary Object Detection for Remote Sensing Community | Aug 17, 2024 | Novel Concepts, Object | Code Available | 3 |
| 5%>100%: Breaking Performance Shackles of Full Fine-Tuning on Visual Recognition Tasks | Aug 15, 2024 | image-classification, Image Classification | Code Available | 3 |
| Panacea+: Panoramic and Controllable Video Generation for Autonomous Driving | Aug 14, 2024 | 3D Object Detection, 3D Object Tracking | Code Available | 3 |
| DeepInteraction++: Multi-Modality Interaction for Autonomous Driving | Aug 9, 2024 | 3D Object Detection, Autonomous Driving | Code Available | 3 |
| Hyper-YOLO: When Visual Object Detection Meets Hypergraph Computation | Aug 9, 2024 | object-detection, Object Detection | Code Available | 3 |
| Integer-Valued Training and Spike-Driven Inference Spiking Neural Network for High-performance and Energy-efficient Object Detection | Jul 30, 2024 | object-detection, Object Detection | Code Available | 3 |
| Practical Video Object Detection via Feature Selection and Aggregation | Jul 29, 2024 | feature selection, GPU | Code Available | 3 |
| LION: Linear Group RNN for 3D Object Detection in Point Clouds | Jul 25, 2024 | 3D Object Detection, Long-range modeling | Code Available | 3 |
| Relation DETR: Exploring Explicit Position Relation Prior for Object Detection | Jul 16, 2024 | 2D Object Detection, object-detection | Code Available | 3 |
| TCFormer: Visual Recognition via Token Clustering Transformer | Jul 16, 2024 | Clustering, image-classification | Code Available | 3 |
| OVLW-DETR: Open-Vocabulary Light-Weighted Detection Transformer | Jul 15, 2024 | Language Modeling, Language Modelling | Code Available | 3 |
| OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion | Jul 10, 2024 | Object Detection, Zero-Shot Object Detection | Code Available | 3 |
| Visible-Thermal Tiny Object Detection: A Benchmark Dataset and Baselines | Jun 20, 2024 | Diversity, object-detection | Code Available | 3 |
| Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models | Jun 13, 2024 | Math, object-detection | Code Available | 3 |
| Open-YOLO 3D: Towards Fast and Accurate Open-Vocabulary 3D Instance Segmentation | Jun 4, 2024 | 2D Object Detection, 3D Instance Segmentation | Code Available | 3 |
| Collaborative Novel Object Discovery and Box-Guided Cross-Modal Alignment for Open-Vocabulary 3D Object Detection | Jun 2, 2024 | 3D Object Detection, cross-modal alignment | Code Available | 3 |
| PlainMamba: Improving Non-Hierarchical Mamba in Visual Recognition | Mar 26, 2024 | Image Classification, Instance Segmentation | Code Available | 3 |
| RCBEVDet: Radar-camera Fusion in Bird's Eye View for 3D Object Detection | Mar 25, 2024 | 3D Object Detection, 3D Object Detection (RoI) | Code Available | 3 |
| Multiple Object Tracking as ID Prediction | Mar 25, 2024 | Multi-Object Tracking, Multiple Object Tracking | Code Available | 3 |
| Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement | Mar 24, 2024 | 2D Object Detection, Computational Efficiency | Code Available | 3 |
| IS-Fusion: Instance-Scene Collaborative Fusion for Multimodal 3D Object Detection | Mar 22, 2024 | 3D Object Detection, Autonomous Driving | Code Available | 3 |
| MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining | Mar 20, 2024 | Aerial Scene Classification, Building change detection for remote sensing images | Code Available | 3 |