| Knowledge Graph Tuning: Real-time Large Language Model Personalization based on Human Feedback | May 30, 2024 | GPU, Knowledge Graphs | —Unverified | 0 |
| KPNet: Towards Minimal Face Detector | Mar 17, 2020 | Face Detection, GPU | —Unverified | 0 |
| Kraken: Inherently Parallel Transformers For Efficient Multi-Device Inference | Aug 14, 2024 | GPU, Language Modeling | —Unverified | 0 |
| KunServe: Efficient Parameter-centric Memory Management for LLM Serving | Dec 24, 2024 | GPU, Language Modeling | —Unverified | 0 |
| KurTail : Kurtosis-based LLM Quantization | Mar 3, 2025 | GPU, Language Modeling | —Unverified | 0 |
| KV Cache is 1 Bit Per Channel: Efficient Large Language Model Inference with Coupled Quantization | May 7, 2024 | GPU, Language Modeling | —Unverified | 0 |
| KVDirect: Distributed Disaggregated LLM Inference | Dec 13, 2024 | GPU, Scheduling | —Unverified | 0 |
| KV-Distill: Nearly Lossless Learnable Context Compression for LLMs | Mar 13, 2025 | GPU, Question Answering | —Unverified | 0 |
| L2PF -- Learning to Prune Faster | Jan 7, 2021 | Autonomous Driving, GPU | —Unverified | 0 |
| L3: DIMM-PIM Integrated Architecture and Coordination for Scalable Long-Context LLM Inference | Apr 24, 2025 | GPU | —Unverified | 0 |