| Sub-MoE: Efficient Mixture-of-Expert LLMs Compression via Subspace Expert Merging | Jun 29, 2025 | Inference Optimization, Mixture-of-Experts | Code Available | 0 |
| The Foundation Cracks: A Comprehensive Study on Bugs and Testing Practices in LLM Libraries | Jun 14, 2025 | Bug Fixing, Inference Optimization | Unverified | 0 |
| Brevity is the soul of sustainability: Characterizing LLM response lengths | Jun 10, 2025 | Decoder, Inference Optimization | Code Available | 0 |
| DSMentor: Enhancing Data Science Agents with Curriculum Learning and Online Knowledge Accumulation | May 20, 2025 | In-Context Learning, Inference Optimization | Unverified | 0 |
| Faster MoE LLM Inference for Extremely Large Models | May 6, 2025 | Inference Optimization, Mixture-of-Experts | Unverified | 0 |
| SimpleAR: Pushing the Frontier of Autoregressive Visual Generation through Pretraining, SFT, and RL | Apr 15, 2025 | Inference Optimization | Code Available | 3 |
| Optimizing LLM Inference: Fluid-Guided Online Scheduling with Memory Constraints | Apr 15, 2025 | GPU, Inference Optimization | Code Available | 4 |
| The 1st Solution for 4th PVUW MeViS Challenge: Unleashing the Potential of Large Multimodal Models for Referring Video Segmentation | Apr 7, 2025 | Inference Optimization, Referring Video Object Segmentation | Code Available | 5 |
| Energy-Efficient Transformer Inference: Optimization Strategies for Time Series Classification | Feb 23, 2025 | Classification, Inference Optimization | Unverified | 0 |
| Hybrid Offline-online Scheduling Method for Large Language Model Inference Optimization | Feb 14, 2025 | GSM8K, Inference Optimization | Unverified | 0 |