| FusedInf: Efficient Swapping of DNN Models for On-Demand Serverless Inference Services on the Edge | Oct 28, 2024 | GPU | Code Available | 0 |
| KD-LoRA: A Hybrid Approach to Efficient Fine-Tuning with LoRA and Knowledge Distillation | Oct 28, 2024 | GPU, Knowledge Distillation | Code Available | 1 |
| ThunderKittens: Simple, Fast, and Adorable AI Kernels | Oct 27, 2024 | GPU, State Space Models | Code Available | 7 |
| Deep Optimizer States: Towards Scalable Training of Transformer Models Using Interleaved Offloading | Oct 26, 2024 | CPU, GPU | Code Available | 0 |
| Computational Bottlenecks of Training Small-scale Large Language Models | Oct 25, 2024 | GPU, Language Modeling | —Unverified | 0 |
| Parameter-Efficient Fine-Tuning in Large Models: A Survey of Methodologies | Oct 24, 2024 | GPU, parameter-efficient fine-tuning | —Unverified | 0 |
| KVSharer: Efficient Inference via Layer-Wise Dissimilar KV Cache Sharing | Oct 24, 2024 | GPU | Code Available | 1 |
| LOGO -- Long cOntext aliGnment via efficient preference Optimization | Oct 24, 2024 | GPU, Language Modeling | Code Available | 1 |
| LoRANN: Low-Rank Matrix Factorization for Approximate Nearest Neighbor Search | Oct 24, 2024 | Clustering, GPU | Code Available | 2 |
| Sort-free Gaussian Splatting via Weighted Sum Rendering | Oct 24, 2024 | 3DGS, 3D Scene Reconstruction | —Unverified | 0 |