| ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions | Mar 13, 2024 | Instance Segmentation, Object Detection | Code Available | 3 |
| VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks | Mar 1, 2024 | Image Classification, Image Generation | Code Available | 3 |
| Theoretically Achieving Continuous Representation of Oriented Bounding Boxes | Feb 29, 2024 | Fairness, object-detection | Code Available | 3 |
| State Space Models for Event Cameras | Feb 23, 2024 | Event-based vision, Object Detection | Code Available | 3 |
| Towards Automatic Power Battery Detection: New Challenge Benchmark Dataset and Baseline | Jan 1, 2024 | Crowd Counting, object-detection | Code Available | 3 |
| General Object Foundation Model for Images and Videos at Scale | Dec 14, 2023 | Instance Segmentation, Long-tail Video Object Segmentation | Code Available | 3 |
| AM-RADIO: Agglomerative Vision Foundation Model -- Reduce All Domains Into One | Dec 10, 2023 | All, Benchmarking | Code Available | 3 |
| UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition | Nov 27, 2023 | Image Classification, Object Detection | Code Available | 3 |
| Detecting As Labeling: Rethinking LiDAR-camera Fusion in 3D Object Detection | Nov 13, 2023 | 3D Object Detection, object-detection | Code Available | 3 |
| Leveraging Vision-Centric Multi-Modal Expertise for 3D Object Detection | Oct 24, 2023 | 3D Object Detection, object-detection | Code Available | 3 |
| MagicDrive: Street View Generation with Diverse 3D Geometry Control | Oct 4, 2023 | 3D geometry, 3D Object Detection | Code Available | 3 |
| How to Evaluate the Generalization of Detection? A Benchmark for Comprehensive Open-Vocabulary Detection | Aug 25, 2023 | Object Detection | Code Available | 3 |
| SAM Fails to Segment Anything? -- SAM-Adapter: Adapting SAM in Underperformed Scenes: Camouflage, Shadow, Medical Image Segmentation, and More | Apr 18, 2023 | General Knowledge, Image Segmentation | Code Available | 3 |
| Geometric-aware Pretraining for Vision-centric 3D Object Detection | Apr 6, 2023 | 3D Object Detection, Autonomous Driving | Code Available | 3 |
| EPro-PnP: Generalized End-to-End Probabilistic Perspective-n-Points for Monocular Object Pose Estimation | Mar 22, 2023 | 3D Object Detection, 6D Pose Estimation using RGB | Code Available | 3 |
| Cross-Modal Causal Intervention for Medical Report Generation | Mar 16, 2023 | Medical Report Generation, object-detection | Code Available | 3 |
| SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving | Mar 16, 2023 | 3D Object Detection, Autonomous Driving | Code Available | 3 |
| Universal Instance Perception as Object Discovery and Retrieval | Mar 12, 2023 | Described Object Detection, Generalized Referring Expression Comprehension | Code Available | 3 |
| Cut and Learn for Unsupervised Object Detection and Instance Segmentation | Jan 26, 2023 | Instance Segmentation, object-detection | Code Available | 3 |
| Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling | Jan 9, 2023 | 2D Object Detection, Contrastive Learning | Code Available | 3 |
| Cross Modal Transformer: Towards Fast and Robust 3D Object Detection | Jan 3, 2023 | 3D Object Detection, object-detection | Code Available | 3 |
| ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders | Jan 2, 2023 | Object Detection, Representation Learning | Code Available | 3 |
| DETRs with Collaborative Hybrid Assignments Training | Nov 22, 2022 | Decoder, Instance Segmentation | Code Available | 3 |
| Vision-Language Pre-training: Basics, Recent Advances, and Future Trends | Oct 17, 2022 | Few-Shot Learning, Image Captioning | Code Available | 3 |
| Revisiting Image Pyramid Structure for High Resolution Salient Object Detection | Sep 20, 2022 | Dichotomous Image Segmentation, Object Detection | Code Available | 3 |
| OmDet: Large-scale vision-language multi-dataset pre-training with multimodal detection network | Sep 10, 2022 | Continual Learning, Object | Code Available | 3 |
| Vision Transformers: From Semantic Segmentation to Dense Prediction | Jul 19, 2022 | image-classification, Image Classification | Code Available | 3 |
| Separable Self-attention for Mobile Vision Transformers | Jun 6, 2022 | Image Classification, Object Detection | Code Available | 3 |
| PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images | Jun 2, 2022 | 3D Lane Detection, 3D Object Detection | Code Available | 3 |
| TransFuser: Imitation with Transformer-Based Sensor Fusion for Autonomous Driving | May 31, 2022 | Autonomous Driving, CARLA longest6 | Code Available | 3 |
| Vision Transformer Adapter for Dense Predictions | May 17, 2022 | Instance Segmentation, Object Detection | Code Available | 3 |
| MaxViT: Multi-Axis Vision Transformer | Apr 4, 2022 | image-classification, Image Classification | Code Available | 3 |
| BEVDet4D: Exploit Temporal Cues in Multi-camera 3D Object Detection | Mar 31, 2022 | 3D Object Detection, object-detection | Code Available | 3 |
| Observation-Centric SORT: Rethinking SORT for Robust Multi-Object Tracking | Mar 27, 2022 | CPU, Multi-Object Tracking | Code Available | 3 |
| PETR: Position Embedding Transformation for Multi-View 3D Object Detection | Mar 10, 2022 | 3D Object Detection, Object | Code Available | 3 |
| Workshop on Autonomous Driving at CVPR 2021: Technical Report for Streaming Perception Challenge | Jul 27, 2021 | 2D Object Detection, Autonomous Driving | Code Available | 3 |
| XCiT: Cross-Covariance Image Transformers | Jun 17, 2021 | image-classification, Image Classification | Code Available | 3 |
| Robust and Accurate Object Detection via Adversarial Learning | Mar 23, 2021 | AutoML, Data Augmentation | Code Available | 3 |
| A Comparative Analysis of Object Detection Metrics with a Companion Open-Source Toolkit | Jan 25, 2021 | Object, object-detection | Code Available | 3 |
| Deformable DETR: Deformable Transformers for End-to-End Object Detection | Oct 8, 2020 | 2D Object Detection, Object Detection | Code Available | 3 |
| A Survey on Performance Metrics for Object-Detection Algorithms | Jul 21, 2020 | Benchmarking, Object | Code Available | 3 |
| Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection | Jun 8, 2020 | Dense Object Detection, General Classification | Code Available | 3 |
| U^2-Net: Going Deeper with Nested U-Structure for Salient Object Detection | May 18, 2020 | Dichotomous Image Segmentation, GPU | Code Available | 3 |
| YOLOv4: Optimal Speed and Accuracy of Object Detection | Apr 23, 2020 | BIG-bench Machine Learning, Data Augmentation | Code Available | 3 |
| ResNeSt: Split-Attention Networks | Apr 19, 2020 | image-classification, Image Classification | Code Available | 3 |
| Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection | Dec 5, 2019 | Object, object-detection | Code Available | 3 |
| EfficientDet: Scalable and Efficient Object Detection | Nov 20, 2019 | AutoML, Object | Code Available | 3 |
| Bag of Freebies for Training Object Detection Neural Networks | Feb 11, 2019 | General Classification, image-classification | Code Available | 3 |
| MMLSpark: Unifying Machine Learning Ecosystems at Massive Scales | Oct 20, 2018 | BIG-bench Machine Learning, Distributed Computing | Code Available | 3 |
| Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields | Nov 24, 2016 | 2D Human Pose Estimation, 2D Pose Estimation | Code Available | 3 |