| Crypto Miner Attack: GPU Remote Code Execution Attacks | Feb 9, 2025 | CPU, GPU | Unverified | 0 |
| Klotski: Efficient Mixture-of-Expert Inference via Expert-Aware Multi-Batch Pipeline | Feb 9, 2025 | CPU, GPU | Code Available | 0 |
| Saving 77% of the Parameters in Large Language Models Technical Report | Feb 9, 2025 | GPU, Text Generation | Code Available | 2 |
| fMoE: Fine-Grained Expert Offloading for Large Mixture-of-Experts Serving | Feb 7, 2025 | CPU, GPU | Unverified | 0 |
| Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy | Feb 7, 2025 | 4k, General Knowledge | Code Available | 3 |
| QuEST: Stable Training of LLMs with 1-Bit Weights and Activations | Feb 7, 2025 | GPU, Quantization | Code Available | 2 |
| InfiniteHBD: Building Datacenter-Scale High-Bandwidth Domain for LLM with Optical Circuit Switching Transceivers | Feb 6, 2025 | GPU, Large Language Model | Unverified | 0 |
| WaferLLM: Large Language Model Inference at Wafer Scale | Feb 6, 2025 | GPU, Language Modeling | Code Available | 2 |
| QuantSpec: Self-Speculative Decoding with Hierarchical Quantized KV Cache | Feb 5, 2025 | GPU | Unverified | 0 |
| SyMANTIC: An Efficient Symbolic Regression Method for Interpretable and Parsimonious Model Discovery in Science and Beyond | Feb 5, 2025 | feature selection, GPU | Code Available | 1 |