| Leveraging tropical reef, bird and unrelated sounds for superior transfer learning in marine bioacoustics | Apr 25, 2024 | Audio Classification, Transfer Learning | Code Available | 3 |
| VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks | Mar 1, 2024 | Image Classification, Image Generation | Code Available | 3 |
| StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis | Jan 23, 2023 | Image Generation, Text-to-Image Generation | Code Available | 3 |
| LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding | Apr 25, 2024 | GSM8K, HellaSwag | Code Available | 3 |
| Revisiting Image Pyramid Structure for High Resolution Salient Object Detection | Sep 20, 2022 | Dichotomous Image Segmentation, Object Detection | Code Available | 3 |
| Travel Time Prediction using Tree-Based Ensembles | May 28, 2020 | Prediction | Code Available | 3 |
| All-atom Diffusion Transformers: Unified generative modelling of molecules and materials | Mar 5, 2025 | All, Unconditional Crystal Generation | Code Available | 3 |
| CtrLoRA: An Extensible and Efficient Framework for Controllable Image Generation | Oct 12, 2024 | Conditional Image Generation, GPU | Code Available | 3 |
| MambaGlue: Fast and Robust Local Feature Matching With Mamba | Feb 1, 2025 | Mamba | Code Available | 3 |
| Sparser, Better, Faster, Stronger: Sparsity Detection for Efficient Automatic Differentiation | Jan 29, 2025 | | Code Available | 3 |
| Neural networks for abstraction and reasoning: Towards broad generalization in machines | Feb 5, 2024 | ARC, Visual Reasoning | Code Available | 3 |
| Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models | Nov 20, 2023 | Image Generation | Code Available | 3 |
| RoSA: Accurate Parameter-Efficient Fine-Tuning via Robust Adaptation | Jan 9, 2024 | GPU, Math | Code Available | 3 |
| Revisiting VerilogEval: A Year of Improvements in Large-Language Models for Hardware Code Generation | Aug 20, 2024 | Code Completion, Code Generation | Code Available | 3 |
| OneFormer: One Transformer to Rule Universal Image Segmentation | Nov 10, 2022 | Instance Segmentation, Panoptic Segmentation | Code Available | 3 |
| The Surprising Effectiveness of Test-Time Training for Few-Shot Learning | Nov 11, 2024 | ARC, Few-Shot Learning | Code Available | 3 |
| Prefix-Tuning: Optimizing Continuous Prompts for Generation | Jan 1, 2021 | Language Modeling, Language Modelling | Code Available | 3 |
| Tina: Tiny Reasoning Models via LoRA | Apr 22, 2025 | Reinforcement Learning (RL) | Code Available | 3 |
| Pushing the limits of raw waveform speaker recognition | Mar 16, 2022 | Self-Supervised Learning, Speaker Recognition | Code Available | 3 |
| Discovering and exploring cases of educational source code plagiarism with Dolos | Feb 16, 2024 | | Code Available | 3 |
| UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation | Jun 15, 2021 | Speech Synthesis, text-to-speech | Code Available | 3 |
| LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models | Apr 4, 2023 | Arithmetic Reasoning, Language Modelling | Code Available | 3 |
| PointNeXt: Revisiting PointNet++ with Improved Training and Scaling Strategies | Jun 9, 2022 | 3D Classification, 3D Part Segmentation | Code Available | 3 |
| Bidirectional Multi-Scale Implicit Neural Representations for Image Deraining | Apr 2, 2024 | Image Reconstruction, Rain Removal | Code Available | 3 |
| Accelerating Transformer Inference for Translation via Parallel Decoding | May 17, 2023 | Machine Translation, Translation | Code Available | 3 |
| DiM: Diffusion Mamba for Efficient High-Resolution Image Synthesis | May 23, 2024 | Image Generation, Mamba | Code Available | 3 |
| GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing | Mar 13, 2025 | Image Generation, Language Modeling | Code Available | 3 |
| ConceptAttention: Diffusion Transformers Learn Highly Interpretable Features | Feb 6, 2025 | Image Segmentation, Segmentation | Code Available | 3 |
| A Distractor-Aware Memory for Visual Object Tracking with SAM2 | Nov 26, 2024 | Object Tracking, Semi-Supervised Video Object Segmentation | Code Available | 3 |
| TAPIP3D: Tracking Any Point in Persistent 3D Geometry | Apr 20, 2025 | 3D geometry, Depth And Camera Motion | Code Available | 3 |
| CharacterEval: A Chinese Benchmark for Role-Playing Conversational Agent Evaluation | Jan 2, 2024 | | Code Available | 3 |
| Data Generation for Hardware-Friendly Post-Training Quantization | Oct 29, 2024 | Data Augmentation, GPU | Code Available | 3 |
| LLMmap: Fingerprinting For Large Language Models | Jul 22, 2024 | RAG | Code Available | 3 |
| SongComposer: A Large Language Model for Lyric and Melody Generation in Song Composition | Feb 27, 2024 | Instruction Following, Language Modeling | Code Available | 3 |
| PRefLexOR: Preference-based Recursive Language Modeling for Exploratory Optimization of Reasoning and Agentic Thinking | Oct 16, 2024 | Language Modeling, Language Modelling | Code Available | 3 |
| ExCoT: Optimizing Reasoning for Text-to-SQL with Execution Feedback | Mar 25, 2025 | Text to SQL, Text-To-SQL | Code Available | 3 |
| MagicPIG: LSH Sampling for Efficient LLM Generation | Oct 21, 2024 | CPU, GPU | Code Available | 3 |
| MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs | Nov 22, 2024 | image-classification, Image Classification | Code Available | 3 |
| Qihoo-T2X: An Efficient Proxy-Tokenized Diffusion Transformer for Text-to-Any-Task | Sep 6, 2024 | Video Generation | Code Available | 3 |
| What Language Model to Train if You Have One Million GPU Hours? | Oct 27, 2022 | GPU, Language Modeling | Code Available | 3 |
| FiTv2: Scalable and Improved Flexible Vision Transformer for Diffusion Model | Oct 17, 2024 | Computational Efficiency, Image Cropping | Code Available | 3 |
| An Evolved Universal Transformer Memory | Oct 17, 2024 | | Code Available | 3 |
| Instruct-IPT: All-in-One Image Processing Transformer via Weight Modulation | Jun 30, 2024 | All, Deblurring | Code Available | 3 |
| DFormerv2: Geometry Self-Attention for RGBD Semantic Segmentation | Apr 7, 2025 | 3D geometry, RGBD Semantic Segmentation | Code Available | 3 |
| SegFormer3D: an Efficient Transformer for 3D Medical Image Segmentation | Apr 15, 2024 | Brain Tumor Segmentation, Decoder | Code Available | 3 |
| CATANet: Efficient Content-Aware Token Aggregation for Lightweight Image Super-Resolution | Mar 10, 2025 | Image Super-Resolution, Super-Resolution | Code Available | 3 |
| Diffusion Models and Semi-Supervised Learners Benefit Mutually with Few Labels | Feb 21, 2023 | Classification | Code Available | 3 |
| From Explicit CoT to Implicit CoT: Learning to Internalize CoT Step by Step | May 23, 2024 | GSM8K | Code Available | 3 |
| Diffusion Feedback Helps CLIP See Better | Jul 29, 2024 | image-classification, Image Classification | Code Available | 3 |
| HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems | Nov 5, 2024 | Hallucination, RAG | Code Available | 3 |