| VkD: Improving Knowledge Distillation using Orthogonal Projections | Jan 1, 2024 | Image GenerationKnowledge Distillation | CodeCode Available | 2 |
| Realistic Rainy Weather Simulation for LiDARs in CARLA Simulator | Dec 20, 2023 | Data Augmentationobject-detection | CodeCode Available | 2 |
| Agent Attention: On the Integration of Softmax and Linear Attention | Dec 14, 2023 | Computational Efficiencyimage-classification | CodeCode Available | 2 |
| Towards Automatic Power Battery Detection: New Challenge, Benchmark Dataset and Baseline | Dec 5, 2023 | Crowd Countingobject-detection | CodeCode Available | 2 |
| Hulk: A Universal Knowledge Translator for Human-Centric Tasks | Dec 4, 2023 | 3D Human Pose EstimationAction Recognition | CodeCode Available | 2 |
| Aligning and Prompting Everything All at Once for Universal Visual Perception | Dec 4, 2023 | AllObject | CodeCode Available | 2 |
| Segment and Caption Anything | Dec 1, 2023 | Caption Generationobject-detection | CodeCode Available | 2 |
| TrackDiffusion: Tracklet-Conditioned Video Generation via Diffusion Models | Dec 1, 2023 | Image ClassificationMulti-Object Tracking | CodeCode Available | 2 |
| TransNeXt: Robust Foveal Visual Perception for Vision Transformers | Nov 28, 2023 | ClassificationDomain Generalization | CodeCode Available | 2 |
| Adapter is All You Need for Tuning Visual Tasks | Nov 25, 2023 | Allimage-classification | CodeCode Available | 2 |
| FlashOcc: Fast and Memory-Efficient Occupancy Prediction via Channel-to-Height Plugin | Nov 18, 2023 | 3D Object DetectionAutonomous Driving | CodeCode Available | 2 |
| TransXNet: Learning Both Global and Local Dynamics with a Dual Dynamic Token Mixer for Visual Recognition | Oct 30, 2023 | Image ClassificationObject Detection | CodeCode Available | 2 |
| Battle of the Backbones: A Large-Scale Comparison of Pretrained Models across Computer Vision Tasks | Oct 30, 2023 | Benchmarkingobject-detection | CodeCode Available | 2 |
| GenEval: An Object-Focused Framework for Evaluating Text-to-Image Alignment | Oct 17, 2023 | AttributeObject | CodeCode Available | 2 |
| UniPAD: A Universal Pre-training Paradigm for Autonomous Driving | Oct 12, 2023 | 3D Object Detection3D Semantic Segmentation | CodeCode Available | 2 |
| CoDA: Collaborative Novel Box Discovery and Cross-modal Alignment for Open-vocabulary 3D Object Detection | Oct 4, 2023 | 3D Object Detectioncross-modal alignment | CodeCode Available | 2 |
| CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction | Oct 2, 2023 | image-classificationImage Classification | CodeCode Available | 2 |
| You Only Look at Once for Real-time and Generic Multi-Task | Oct 2, 2023 | Autonomous DrivingDrivable Area Detection | CodeCode Available | 2 |
| InstructCV: Instruction-Tuned Text-to-Image Diffusion Models as Vision Generalists | Sep 30, 2023 | Depth EstimationImage Generation | CodeCode Available | 2 |
| Detect Everything with Few Examples | Sep 22, 2023 | Binary ClassificationCross-Domain Few-Shot Object Detection | CodeCode Available | 2 |
| EPTQ: Enhanced Post-Training Quantization via Hessian-guided Network-wise Optimization | Sep 20, 2023 | Knowledge Distillationobject-detection | CodeCode Available | 2 |
| RMT: Retentive Networks Meet Vision Transformers | Sep 20, 2023 | Instance Segmentationobject-detection | CodeCode Available | 2 |
| RaTrack: Moving Object Detection and Tracking with 4D Radar Point Cloud | Sep 18, 2023 | Motion EstimationMotion Segmentation | CodeCode Available | 2 |
| DFormer: Rethinking RGBD Representation Learning for Semantic Segmentation | Sep 18, 2023 | 3D geometryDecoder | CodeCode Available | 2 |
| DAT++: Spatially Dynamic Vision Transformer with Deformable Attention | Sep 4, 2023 | Image ClassificationInstance Segmentation | CodeCode Available | 2 |