| DAMSDet: Dynamic Adaptive Multispectral Detection Transformer with Competitive Query Selection and Adaptive Feature Fusion | Mar 1, 2024 | Object, object-detection | Code Available | 2 |
| FusionVision: A comprehensive approach of 3D object reconstruction and segmentation from RGB-D cameras using YOLO and fast segment anything | Feb 29, 2024 | 3D Object Reconstruction, Instance Segmentation | Code Available | 2 |
| DEYO: DETR with YOLO for End-to-End Object Detection | Feb 26, 2024 | Decoder, GPU | Code Available | 2 |
| EMIFF: Enhanced Multi-scale Image Feature Fusion for Vehicle-Infrastructure Cooperative 3D Object Detection | Feb 23, 2024 | 3D Object Detection, Autonomous Driving | Code Available | 2 |
| WeakSAM: Segment Anything Meets Weakly-supervised Instance-level Recognition | Feb 22, 2024 | Image-level Supervised Instance Segmentation, object-detection | Code Available | 2 |
| MultiCorrupt: A Multi-Modal Robustness Dataset and Benchmark of LiDAR-Camera Fusion for 3D Object Detection | Feb 18, 2024 | 3D Object Detection, Dataset Generation | Code Available | 2 |
| YOLOv8-AM: YOLOv8 Based on Effective Attention Mechanisms for Pediatric Wrist Fracture Detection | Feb 14, 2024 | Fracture detection, medical image detection | Code Available | 2 |
| FM-Fusion: Instance-aware Semantic Mapping Boosted by Vision-Language Foundation Models | Feb 7, 2024 | Instance Segmentation, Object | Code Available | 2 |
| Ray Denoising: Depth-aware Hard Negative Sampling for Multi-view 3D Object Detection | Feb 6, 2024 | 3D Object Detection, Denoising | Code Available | 2 |
| YOLOPoint Joint Keypoint and Object Detection | Feb 6, 2024 | Object, object-detection | Code Available | 2 |
| HASSOD: Hierarchical Adaptive Self-Supervised Object Detection | Feb 5, 2024 | Object, object-detection | Code Available | 2 |
| Cross-Domain Few-Shot Object Detection via Enhanced Open-Set Object Detector | Feb 5, 2024 | Cross-Domain Few-Shot, Cross-Domain Few-Shot Object Detection | Code Available | 2 |
| SHViT: Single-Head Vision Transformer with Memory Efficient Macro Design | Jan 29, 2024 | CPU, GPU | Code Available | 2 |
| MixSup: Mixed-grained Supervision for Label-efficient LiDAR-based 3D Object Detection | Jan 29, 2024 | 3D Object Detection, object-detection | Code Available | 2 |
| LiDAR-PTQ: Post-Training Quantization for Point Cloud 3D Object Detection | Jan 29, 2024 | 3D Object Detection, Autonomous Vehicles | Code Available | 2 |
| Self-supervised Learning of LiDAR 3D Point Clouds via 2D-3D Neural Calibration | Jan 23, 2024 | 3D Semantic Segmentation, Autonomous Driving | Code Available | 2 |
| Detect-Order-Construct: A Tree Construction based Approach for Hierarchical Document Structure Analysis | Jan 22, 2024 | Document Layout Analysis, Document Summarization | Code Available | 2 |
| Removal then Selection: A Coarse-to-Fine Fusion Perspective for RGB-Infrared Object Detection | Jan 19, 2024 | Multispectral Object Detection, Object | Code Available | 2 |
| A Simple Latent Diffusion Approach for Panoptic Segmentation and Mask Inpainting | Jan 18, 2024 | Instance Segmentation, Interactive Segmentation | Code Available | 2 |
| Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model | Jan 17, 2024 | GPU, Image Classification | Code Available | 2 |
| Fine-Grained Prototypes Distillation for Few-Shot Object Detection | Jan 15, 2024 | Few-Shot Object Detection, Meta-Learning | Code Available | 2 |
| WidthFormer: Toward Efficient Transformer-based BEV View Transformation | Jan 8, 2024 | 3D Object Detection, Autonomous Driving | Code Available | 2 |
| RoboFusion: Towards Robust Multi-Modal 3D Object Detection via SAM | Jan 8, 2024 | 3D Object Detection, Autonomous Driving | Code Available | 2 |
| MS-DETR: Efficient DETR Training with Mixed Supervision | Jan 8, 2024 | Decoder, Object | Code Available | 2 |
| Exploring Orthogonality in Open World Object Detection | Jan 1, 2024 | Incremental Learning, Object | Code Available | 2 |
| VkD: Improving Knowledge Distillation using Orthogonal Projections | Jan 1, 2024 | Image Generation, Knowledge Distillation | Code Available | 2 |
| Realistic Rainy Weather Simulation for LiDARs in CARLA Simulator | Dec 20, 2023 | Data Augmentation, object-detection | Code Available | 2 |
| Agent Attention: On the Integration of Softmax and Linear Attention | Dec 14, 2023 | Computational Efficiency, image-classification | Code Available | 2 |
| Towards Automatic Power Battery Detection: New Challenge, Benchmark Dataset and Baseline | Dec 5, 2023 | Crowd Counting, object-detection | Code Available | 2 |
| Hulk: A Universal Knowledge Translator for Human-Centric Tasks | Dec 4, 2023 | 3D Human Pose Estimation, Action Recognition | Code Available | 2 |
| Aligning and Prompting Everything All at Once for Universal Visual Perception | Dec 4, 2023 | All, Object | Code Available | 2 |
| Segment and Caption Anything | Dec 1, 2023 | Caption Generation, object-detection | Code Available | 2 |
| TrackDiffusion: Tracklet-Conditioned Video Generation via Diffusion Models | Dec 1, 2023 | Image Classification, Multi-Object Tracking | Code Available | 2 |
| TransNeXt: Robust Foveal Visual Perception for Vision Transformers | Nov 28, 2023 | Classification, Domain Generalization | Code Available | 2 |
| Adapter is All You Need for Tuning Visual Tasks | Nov 25, 2023 | All, image-classification | Code Available | 2 |
| FlashOcc: Fast and Memory-Efficient Occupancy Prediction via Channel-to-Height Plugin | Nov 18, 2023 | 3D Object Detection, Autonomous Driving | Code Available | 2 |
| TransXNet: Learning Both Global and Local Dynamics with a Dual Dynamic Token Mixer for Visual Recognition | Oct 30, 2023 | Image Classification, Object Detection | Code Available | 2 |
| Battle of the Backbones: A Large-Scale Comparison of Pretrained Models across Computer Vision Tasks | Oct 30, 2023 | Benchmarking, object-detection | Code Available | 2 |
| GenEval: An Object-Focused Framework for Evaluating Text-to-Image Alignment | Oct 17, 2023 | Attribute, Object | Code Available | 2 |
| UniPAD: A Universal Pre-training Paradigm for Autonomous Driving | Oct 12, 2023 | 3D Object Detection, 3D Semantic Segmentation | Code Available | 2 |
| CoDA: Collaborative Novel Box Discovery and Cross-modal Alignment for Open-vocabulary 3D Object Detection | Oct 4, 2023 | 3D Object Detection, cross-modal alignment | Code Available | 2 |
| CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction | Oct 2, 2023 | image-classification, Image Classification | Code Available | 2 |
| You Only Look at Once for Real-time and Generic Multi-Task | Oct 2, 2023 | Autonomous Driving, Drivable Area Detection | Code Available | 2 |
| InstructCV: Instruction-Tuned Text-to-Image Diffusion Models as Vision Generalists | Sep 30, 2023 | Depth Estimation, Image Generation | Code Available | 2 |
| Detect Everything with Few Examples | Sep 22, 2023 | Binary Classification, Cross-Domain Few-Shot Object Detection | Code Available | 2 |
| EPTQ: Enhanced Post-Training Quantization via Hessian-guided Network-wise Optimization | Sep 20, 2023 | Knowledge Distillation, object-detection | Code Available | 2 |
| RMT: Retentive Networks Meet Vision Transformers | Sep 20, 2023 | Instance Segmentation, object-detection | Code Available | 2 |
| RaTrack: Moving Object Detection and Tracking with 4D Radar Point Cloud | Sep 18, 2023 | Motion Estimation, Motion Segmentation | Code Available | 2 |
| DFormer: Rethinking RGBD Representation Learning for Semantic Segmentation | Sep 18, 2023 | 3D geometry, Decoder | Code Available | 2 |
| DAT++: Spatially Dynamic Vision Transformer with Deformable Attention | Sep 4, 2023 | Image Classification, Instance Segmentation | Code Available | 2 |