| ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions | Mar 13, 2024 | Instance Segmentation, Object Detection | Code Available | 3 |
| VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks | Mar 1, 2024 | Image Classification, Image Generation | Code Available | 3 |
| Theoretically Achieving Continuous Representation of Oriented Bounding Boxes | Feb 29, 2024 | Fairness, object-detection | Code Available | 3 |
| State Space Models for Event Cameras | Feb 23, 2024 | Event-based vision, Object Detection | Code Available | 3 |
| Towards Automatic Power Battery Detection: New Challenge Benchmark Dataset and Baseline | Jan 1, 2024 | Crowd Counting, object-detection | Code Available | 3 |
| General Object Foundation Model for Images and Videos at Scale | Dec 14, 2023 | Instance Segmentation, Long-tail Video Object Segmentation | Code Available | 3 |
| AM-RADIO: Agglomerative Vision Foundation Model -- Reduce All Domains Into One | Dec 10, 2023 | All, Benchmarking | Code Available | 3 |
| UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition | Nov 27, 2023 | Image Classification, Object Detection | Code Available | 3 |
| Detecting As Labeling: Rethinking LiDAR-camera Fusion in 3D Object Detection | Nov 13, 2023 | 3D Object Detection, object-detection | Code Available | 3 |
| Leveraging Vision-Centric Multi-Modal Expertise for 3D Object Detection | Oct 24, 2023 | 3D Object Detection, object-detection | Code Available | 3 |
| MagicDrive: Street View Generation with Diverse 3D Geometry Control | Oct 4, 2023 | 3D geometry, 3D Object Detection | Code Available | 3 |
| How to Evaluate the Generalization of Detection? A Benchmark for Comprehensive Open-Vocabulary Detection | Aug 25, 2023 | Object Detection | Code Available | 3 |
| SAM Fails to Segment Anything? -- SAM-Adapter: Adapting SAM in Underperformed Scenes: Camouflage, Shadow, Medical Image Segmentation, and More | Apr 18, 2023 | General Knowledge, Image Segmentation | Code Available | 3 |
| Geometric-aware Pretraining for Vision-centric 3D Object Detection | Apr 6, 2023 | 3D Object Detection, Autonomous Driving | Code Available | 3 |
| EPro-PnP: Generalized End-to-End Probabilistic Perspective-n-Points for Monocular Object Pose Estimation | Mar 22, 2023 | 3D Object Detection, 6D Pose Estimation using RGB | Code Available | 3 |
| Cross-Modal Causal Intervention for Medical Report Generation | Mar 16, 2023 | Medical Report Generation, object-detection | Code Available | 3 |
| SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving | Mar 16, 2023 | 3D Object Detection, Autonomous Driving | Code Available | 3 |
| Universal Instance Perception as Object Discovery and Retrieval | Mar 12, 2023 | Described Object Detection, Generalized Referring Expression Comprehension | Code Available | 3 |
| Cut and Learn for Unsupervised Object Detection and Instance Segmentation | Jan 26, 2023 | Instance Segmentation, object-detection | Code Available | 3 |
| Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling | Jan 9, 2023 | 2D Object Detection, Contrastive Learning | Code Available | 3 |
| Cross Modal Transformer: Towards Fast and Robust 3D Object Detection | Jan 3, 2023 | 3D Object Detection, object-detection | Code Available | 3 |
| ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders | Jan 2, 2023 | Object Detection, Representation Learning | Code Available | 3 |
| DETRs with Collaborative Hybrid Assignments Training | Nov 22, 2022 | Decoder, Instance Segmentation | Code Available | 3 |
| Vision-Language Pre-training: Basics, Recent Advances, and Future Trends | Oct 17, 2022 | Few-Shot Learning, Image Captioning | Code Available | 3 |
| Revisiting Image Pyramid Structure for High Resolution Salient Object Detection | Sep 20, 2022 | Dichotomous Image Segmentation, Object Detection | Code Available | 3 |