| MagicPIG: LSH Sampling for Efficient LLM Generation | Oct 21, 2024 | CPU, GPU | Code Available | 3 |
| APOLLO: SGD-like Memory, AdamW-level Performance | Dec 6, 2024 | GPU, Quantization | Code Available | 3 |
| MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding | Apr 8, 2024 | GPU, Multiple-choice | Code Available | 3 |
| Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy | Feb 7, 2025 | 4k, General Knowledge | Code Available | 3 |
| LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale | Aug 10, 2024 | GPU, Language Modelling | Code Available | 3 |
| LiteGS: A High-Performance Modular Framework for Gaussian Splatting Training | Mar 3, 2025 | 3DGS, GPU | Code Available | 3 |
| LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via a Hybrid Architecture | Sep 4, 2024 | GPU, Mamba | Code Available | 3 |
| GPU-accelerated Evolutionary Multiobjective Optimization Using Tensorized RVEA | Apr 1, 2024 | GPU, Multiobjective Optimization | Code Available | 3 |
| GPU-accelerated Evolutionary Many-objective Optimization Using Tensorized NSGA-III | Apr 8, 2025 | Computational Efficiency, CPU | Code Available | 3 |
| Graph-Reward-SQL: Execution-Free Reinforcement Learning for Text-to-SQL via Graph Matching and Stepwise Reward | May 18, 2025 | GPU, Graph Matching | Code Available | 3 |
| Andes: Defining and Enhancing Quality-of-Experience in LLM-Based Text Streaming Services | Apr 25, 2024 | GPU | Code Available | 3 |
| CtrLoRA: An Extensible and Efficient Framework for Controllable Image Generation | Oct 12, 2024 | Conditional Image Generation, GPU | Code Available | 3 |
| Graph-Mamba: Towards Long-Range Graph Sequence Modeling with Selective State Spaces | Feb 1, 2024 | Computational Efficiency, GPU | Code Available | 3 |
| Data Generation for Hardware-Friendly Post-Training Quantization | Oct 29, 2024 | Data Augmentation, GPU | Code Available | 3 |
| Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models | Jan 9, 2024 | GPU | Code Available | 3 |
| The Mamba in the Llama: Distilling and Accelerating Hybrid Models | Aug 27, 2024 | GPU, Language Modeling | Code Available | 3 |
| KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization | Jan 31, 2024 | GPU, Quantization | Code Available | 3 |
| Cramming: Training a Language Model on a Single GPU in One Day | Dec 28, 2022 | GPU, Language Modeling | Code Available | 3 |
| Dataset Distillation with Neural Characteristic Function: A Minmax Perspective | Jan 1, 2025 | Computational Efficiency, Dataset Distillation | Code Available | 3 |
| TorchCP: A Python Library for Conformal Prediction | Feb 20, 2024 | Conformal Prediction, Deep Learning | Code Available | 3 |
| CLEAR: Conv-Like Linearization Revs Pre-Trained Diffusion Transformers Up | Dec 20, 2024 | 8k, GPU | Code Available | 3 |
| AdaRevD: Adaptive Patch Exiting Reversible Decoder Pushes the Limit of Image Deblurring | Jun 13, 2024 | Deblurring, Decoder | Code Available | 3 |
| LayerKV: Optimizing Large Language Model Serving with Layer-wise KV Cache Management | Oct 1, 2024 | GPU, Language Modeling | Code Available | 3 |
| How Well Do Supervised 3D Models Transfer to Medical Imaging Tasks? | Jan 20, 2025 | Computed Tomography (CT), GPU | Code Available | 3 |
| LinFusion: 1 GPU, 1 Minute, 16K Image | Sep 3, 2024 | 16k, Causal Inference | Code Available | 3 |