| RAW-Diffusion: RGB-Guided Diffusion Models for High-Fidelity RAW Image Generation | Nov 20, 2024 | Image Generation, object-detection | Code Available | 2 |
| GaussianPretrain: A Simple Unified 3D Gaussian Representation for Visual Pre-training in Autonomous Driving | Nov 19, 2024 | 3D Object Detection, Autonomous Driving | Code Available | 2 |
| V2X-R: Cooperative LiDAR-4D Radar Fusion for 3D Object Detection with Denoising Diffusion | Nov 13, 2024 | 3D Object Detection, Denoising | Code Available | 2 |
| Exploiting Unlabeled Data with Multiple Expert Teachers for Open Vocabulary Aerial Object Detection and Its Orientation Adaptation | Nov 4, 2024 | Earth Observation, Object | Code Available | 2 |
| ImOV3D: Learning Open-Vocabulary Point Clouds 3D Object Detection from Only 2D Images | Oct 31, 2024 | 3D Object Detection, Depth Estimation | Code Available | 2 |
| MonoDGP: Monocular 3D Object Detection with Decoupled-Query and Geometry-Error Priors | Oct 25, 2024 | 3D Object Detection, Depth Estimation | Code Available | 2 |
| DI-MaskDINO: A Joint Object Detection and Instance Segmentation Model | Oct 22, 2024 | Decoder, Instance Segmentation | Code Available | 2 |
| Multiview Scene Graph | Oct 15, 2024 | Decoder, Object | Code Available | 2 |
| Open World Object Detection: A Survey | Oct 15, 2024 | Incremental Learning, Object | Code Available | 2 |
| PointOBB-v2: Towards Simpler, Faster, and Stronger Single Point Supervised Oriented Object Detection | Oct 10, 2024 | object-detection, Object Detection | Code Available | 2 |
| 3DGS-DET: Empower 3D Gaussian Splatting with Boundary Guidance and Box-Focused Sampling for 3D Object Detection | Oct 2, 2024 | 3DGS, 3D Object Detection | Code Available | 2 |
| HazyDet: Open-source Benchmark for Drone-view Object Detection with Depth-cues in Hazy Scenes | Sep 30, 2024 | Object, object-detection | Code Available | 2 |
| DAOcc: 3D Object Detection Assisted Multi-Sensor Fusion for 3D Occupancy Prediction | Sep 30, 2024 | 3D Object Detection, 3D Semantic Occupancy Prediction | Code Available | 2 |
| A Novel Unified Architecture for Low-Shot Counting by Detection and Segmentation | Sep 27, 2024 | Exemplar-Free Counting, Few-shot Object Counting and Detection | Code Available | 2 |
| Source-Free Domain Adaptation for YOLO Object Detection | Sep 25, 2024 | Domain Adaptation, Model Selection | Code Available | 2 |
| RockTrack: A 3D Robust Multi-Camera-Ken Multi-Object Tracking Framework | Sep 18, 2024 | 3D Multi-Object Tracking, 3D Object Detection | Code Available | 2 |
| One missing piece in Vision and Language: A Survey on Comics Understanding | Sep 14, 2024 | document understanding, image-classification | Code Available | 2 |
| UniDet3D: Multi-dataset Indoor 3D Object Detection | Sep 6, 2024 | 3D Object Detection, Object | Code Available | 2 |
| UTrack: Multi-Object Tracking with Uncertain Detections | Aug 30, 2024 | Autonomous Driving, Multi-Object Tracking | Code Available | 2 |
| RoboSense: Large-scale Dataset and Benchmark for Egocentric Robot Perception and Navigation in Crowded and Unstructured Environments | Aug 28, 2024 | Autonomous Driving, Autonomous Navigation | Code Available | 2 |
| GOReloc: Graph-based Object-Level Relocalization for Visual SLAM | Aug 15, 2024 | Object, object-detection | Code Available | 2 |
| Multi-Scale and Detail-Enhanced Segment Anything Model for Salient Object Detection | Aug 8, 2024 | object-detection, Object Detection | Code Available | 2 |
| CAS-ViT: Convolutional Additive Self-attention Vision Transformers for Efficient Mobile Applications | Aug 7, 2024 | image-classification, Image Classification | Code Available | 2 |
| L4DR: LiDAR-4DRadar Fusion for Weather-Robust 3D Object Detection | Aug 7, 2024 | 3D Object Detection, Autonomous Navigation | Code Available | 2 |
| Visible-Thermal Multiple Object Tracking: Large-scale Video Dataset and Progressive Fusion Approach | Aug 2, 2024 | cross-modal alignment, Multiple Object Tracking | Code Available | 2 |
| Cross-Layer Feature Pyramid Transformer for Small Object Detection in Aerial Images | Jul 29, 2024 | object-detection, Object Detection | Code Available | 2 |
| MonoWAD: Weather-Adaptive Diffusion Model for Robust Monocular 3D Object Detection | Jul 23, 2024 | 3D Object Detection, Autonomous Driving | Code Available | 2 |
| COALA: A Practical and Vision-Centric Federated Learning Platform | Jul 23, 2024 | Benchmarking, Continual Learning | Code Available | 2 |
| PartGLEE: A Foundation Model for Recognizing and Parsing Any Objects | Jul 23, 2024 | Instance Segmentation, Object | Code Available | 2 |
| ESOD: Efficient Small Object Detection on High-Resolution Images | Jul 23, 2024 | GPU, Object | Code Available | 2 |
| GroupMamba: Efficient Group-Based Visual State Space Model | Jul 18, 2024 | image-classification, Image Classification | Code Available | 2 |
| GLARE: Low Light Image Enhancement via Generative Latent Feature based Codebook Retrieval | Jul 17, 2024 | Decoder, Image Enhancement | Code Available | 2 |
| Crowd-SAM: SAM as a Smart Annotator for Object Detection in Crowded Scenes | Jul 16, 2024 | Human Instance Segmentation, Instance Segmentation | Code Available | 2 |
| LaMI-DETR: Open-Vocabulary Detection with Language Model Instruction | Jul 16, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| OPEN: Object-wise Position Embedding for Multi-view 3D Object Detection | Jul 15, 2024 | 3D Object Detection, Depth Estimation | Code Available | 2 |
| When Pedestrian Detection Meets Multi-Modal Learning: Generalist Model and Benchmark Dataset | Jul 14, 2024 | 3D Object Detection, Multispectral Object Detection | Code Available | 2 |
| Projecting Points to Axes: Oriented Object Detection via Point-Axis Representation | Jul 11, 2024 | object-detection, Object Detection | Code Available | 2 |
| SCSA: Exploring the Synergistic Effects Between Spatial and Channel Attention | Jul 6, 2024 | Classification, object-detection | Code Available | 2 |
| Multi-Branch Auxiliary Fusion YOLO with Re-parameterization Heterogeneous Convolutional for accurate object detection | Jul 5, 2024 | Novel Object Detection, object-detection | Code Available | 2 |
| SH17: A Dataset for Human Safety and Personal Protective Equipment Detection in Manufacturing Industry | Jul 5, 2024 | Benchmarking, object-detection | Code Available | 2 |
| SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding | Jul 3, 2024 | object-detection, Object Detection | Code Available | 2 |
| SOOD++: Leveraging Unlabeled Data to Boost Oriented Object Detection | Jul 1, 2024 | Object, object-detection | Code Available | 2 |
| The Surprising Effectiveness of Multimodal Large Language Models for Video Moment Retrieval | Jun 26, 2024 | Action Localization, Moment Retrieval | Code Available | 2 |
| LeYOLO, New Scalable and Efficient CNN Architecture for Object Detection | Jun 20, 2024 | Computational Efficiency, Object | Code Available | 2 |
| Scaling Efficient Masked Image Modeling on Large Remote Sensing Dataset | Jun 17, 2024 | Aerial Scene Classification, Diversity | Code Available | 2 |
| Voxel Mamba: Group-Free State Space Models for Point Cloud based 3D Object Detection | Jun 15, 2024 | 3D Object Detection, Computational Efficiency | Code Available | 2 |
| EFM3D: A Benchmark for Measuring Progress Towards 3D Egocentric Foundation Models | Jun 14, 2024 | 3D Object Detection, 3D Reconstruction | Code Available | 2 |
| BEVSpread: Spread Voxel Pooling for Bird's-Eye-View Representation in Vision-based Roadside 3D Object Detection | Jun 13, 2024 | 3D Object Detection, Autonomous Driving | Code Available | 2 |
| STAR: A First-Ever Dataset and A Large-Scale Benchmark for Scene Graph Generation in Large-Size Satellite Imagery | Jun 13, 2024 | Graph Generation, Object | Code Available | 2 |
| EFFOcc: A Minimal Baseline for EFficient Fusion-based 3D Occupancy Network | Jun 11, 2024 | 3D Object Detection, Active Learning | Code Available | 2 |