| ZPressor: Bottleneck-Aware Compression for Scalable Feed-Forward 3DGS | May 29, 2025 | 3DGS, GPU | Code Available | 2 | 5 |
| FOCUS: Towards Universal Foreground Segmentation | Jan 9, 2025 | Camouflaged Object Segmentation, Defocus Blur Detection | Code Available | 2 | 5 |
| Simple Is Effective: The Roles of Graphs and Large Language Models in Knowledge-Graph-Based Retrieval-Augmented Generation | Oct 28, 2024 | RAG, Retrieval | Code Available | 2 | 5 |
| Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation | Jan 2, 2024 | Audio Generation, cross-modal alignment | Code Available | 2 | 5 |
| MAD-X: An Adapter-Based Framework for Multi-Task Cross-Lingual Transfer | Apr 30, 2020 | Cross-Lingual Transfer, named-entity-recognition | Code Available | 2 | 5 |
| TR-DETR: Task-Reciprocal Transformer for Joint Moment Retrieval and Highlight Detection | Jan 4, 2024 | Highlight Detection, Moment Retrieval | Code Available | 2 | 5 |
| T-LoRA: Single Image Diffusion Model Customization Without Overfitting | Jul 8, 2025 | | Code Available | 2 | 5 |
| Uni-Sign: Toward Unified Sign Language Understanding at Scale | Jan 25, 2025 | Computational Efficiency, Gloss-free Sign Language Translation | Code Available | 2 | 5 |
| Mask-Adapter: The Devil is in the Masks for Open-Vocabulary Segmentation | Dec 5, 2024 | Image Segmentation, Open Vocabulary Semantic Segmentation | Code Available | 2 | 5 |
| Autoregressive Image Generation with Randomized Parallel Decoding | Mar 13, 2025 | Conditional Image Generation, Image Generation | Code Available | 2 | 5 |
| Contrastive Audio-Visual Masked Autoencoder | Oct 2, 2022 | Audio Classification, Audio Tagging | Code Available | 2 | 5 |
| Content-Aware Transformer for All-in-one Image Restoration | Apr 7, 2025 | All, Image Restoration | Code Available | 2 | 5 |
| Liger: Linearizing Large Language Models to Gated Recurrent Structures | Mar 3, 2025 | | Code Available | 2 | 5 |
| T2M-GPT: Generating Human Motion from Textual Descriptions with Discrete Representations | Jan 15, 2023 | Motion Generation, Motion Synthesis | Code Available | 2 | 5 |
| LongForm: Effective Instruction Tuning with Reverse Instructions | Apr 17, 2023 | Long Form Question Answering, News Generation | Code Available | 2 | 5 |
| DiffAM: Diffusion-based Adversarial Makeup Transfer for Facial Privacy Protection | May 16, 2024 | Adversarial Attack, Face Recognition | Code Available | 2 | 5 |
| ClipSAM: CLIP and SAM Collaboration for Zero-Shot Anomaly Segmentation | Jan 23, 2024 | Anomaly Localization, Anomaly Segmentation | Code Available | 2 | 5 |
| CellViT: Vision Transformers for Precise Cell Segmentation and Classification | Jun 27, 2023 | Cell Detection, Cell Segmentation | Code Available | 2 | 5 |
| Your Transformer is Secretly Linear | May 19, 2024 | | Code Available | 2 | 5 |
| Consistency Trajectory Models: Learning Probability Flow ODE Trajectory of Diffusion | Oct 1, 2023 | Denoising, Image Generation | Code Available | 2 | 5 |
| Diffusion-based Visual Anagram as Multi-task Learning | Dec 3, 2024 | Denoising, Multi-Task Learning | Code Available | 2 | 5 |
| Deep Diffusion Image Prior for Efficient OOD Adaptation in 3D Inverse Problems | Jul 15, 2024 | 3D Reconstruction, Meta-Learning | Code Available | 2 | 5 |
| AI Research Agents for Machine Learning: Search, Exploration, and Generalization in MLE-bench | Jul 3, 2025 | Navigate | Code Available | 2 | 5 |
| Big Transfer (BiT): General Visual Representation Learning | Dec 24, 2019 | Few-Shot Learning, Fine-Grained Image Classification | Code Available | 2 | 5 |
| HourVideo: 1-Hour Video-Language Understanding | Nov 7, 2024 | Benchmarking, counterfactual | Code Available | 2 | 5 |
| Learnware of Language Models: Specialized Small Language Models Can Do Big | May 19, 2025 | Privacy Preserving, Question Answering | Code Available | 2 | 5 |
| An Upload-Efficient Scheme for Transferring Knowledge From a Server-Side Pre-trained Generator to Clients in Heterogeneous Federated Learning | Mar 23, 2024 | Federated Learning, Transfer Learning | Code Available | 2 | 5 |
| VM-UNET-V2 Rethinking Vision Mamba UNet for Medical Image Segmentation | Mar 14, 2024 | Image Segmentation, Mamba | Code Available | 2 | 5 |
| Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints | Dec 9, 2022 | Mixture-of-Experts | Code Available | 2 | 5 |
| RefMask3D: Language-Guided Transformer for 3D Referring Segmentation | Jul 25, 2024 | 3D visual grounding, Image Segmentation | Code Available | 2 | 5 |
| Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation | Sep 12, 2022 | Robot Manipulation, Robot Manipulation Generalization | Code Available | 2 | 5 |
| Towards a Multimodal Large Language Model with Pixel-Level Insight for Biomedicine | Dec 12, 2024 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| Lightweight, Pre-trained Transformers for Remote Sensing Timeseries | Apr 27, 2023 | Crop Classification, Self-Supervised Learning | Code Available | 2 | 5 |
| MOSE: A New Dataset for Video Object Segmentation in Complex Scenes | Feb 3, 2023 | Object, Segmentation | Code Available | 2 | 5 |
| SportsMOT: A Large Multi-Object Tracking Dataset in Multiple Sports Scenes | Apr 11, 2023 | Multi-Object Tracking, Multiple Object Tracking | Code Available | 2 | 5 |
| Efficient Large-scale Audio Tagging via Transformer-to-CNN Knowledge Distillation | Nov 9, 2022 | Audio Classification, Audio Tagging | Code Available | 2 | 5 |
| Attention Prompting on Image for Large Vision-Language Models | Sep 25, 2024 | MM-Vet, Visual Prompting | Code Available | 2 | 5 |
| Scaling Spike-driven Transformer with Efficient Spike Firing Approximation Training | Nov 25, 2024 | object-detection, Object Detection | Code Available | 2 | 5 |
| Brain Tumour Removing and Missing Modality Generation using 3D WDM | Nov 7, 2024 | GPU, Prediction | Code Available | 2 | 5 |
| Center-based 3D Object Detection and Tracking | Jun 19, 2020 | 3D Multi-Object Tracking, 3D Object Detection | Code Available | 2 | 5 |
| Vision6D: 3D-to-2D Interactive Visualization and Annotation Tool for 6D Pose Estimation | Apr 21, 2025 | 6D Pose Estimation, Pose Estimation | Code Available | 2 | 5 |
| SlimSAM: 0.1% Data Makes Segment Anything Slim | Dec 8, 2023 | | Code Available | 2 | 5 |
| Personality Alignment of Large Language Models | Aug 21, 2024 | Personality Alignment | Code Available | 2 | 5 |
| STCOcc: Sparse Spatial-Temporal Cascade Renovation for 3D Occupancy and Scene Flow Prediction | Apr 28, 2025 | GPU | Code Available | 2 | 5 |
| Exploiting Multimodal Spatial-temporal Patterns for Video Object Tracking | Dec 20, 2024 | Mamba, Object Tracking | Code Available | 2 | 5 |
| Fast Training of Diffusion Models with Masked Transformers | Jun 15, 2023 | Decoder, Denoising | Code Available | 2 | 5 |
| SkiROS2: A skill-based Robot Control Platform for ROS | Jun 29, 2023 | Scheduling, Task Planning | Code Available | 2 | 5 |
| Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding | May 23, 2022 | | Code Available | 2 | 5 |
| VibeCheck: Discover and Quantify Qualitative Differences in Large Language Models | Oct 10, 2024 | Math | Code Available | 2 | 5 |
| Prodigy: An Expeditiously Adaptive Parameter-Free Learner | Jun 9, 2023 | | Code Available | 2 | 5 |