| Deep Modulation Embedding | Feb 17, 2019 | GPU | —Unverified | 0 |
| Kraken: Inherently Parallel Transformers For Efficient Multi-Device Inference | Aug 14, 2024 | GPU, Language Modeling | —Unverified | 0 |
| KunServe: Efficient Parameter-centric Memory Management for LLM Serving | Dec 24, 2024 | GPU, Language Modeling | —Unverified | 0 |
| Hybrid-Regressive Neural Machine Translation | Oct 19, 2022 | CPU, Decoder | —Unverified | 0 |
| KV Cache is 1 Bit Per Channel: Efficient Large Language Model Inference with Coupled Quantization | May 7, 2024 | GPU, Language Modeling | —Unverified | 0 |
| Hardware-Aware Graph Neural Network Automated Design for Edge Computing Platforms | Mar 20, 2023 | Edge-computing, GPU | —Unverified | 0 |
| Hardware and Software Platform Inference | Nov 7, 2024 | GPU, Large Language Model | —Unverified | 0 |
| DeepMAD: Mathematical Architecture Design for Deep Convolutional Neural Network | Mar 5, 2023 | GPU, Image Classification | —Unverified | 0 |
| Hardware and Software Optimizations for Accelerating Deep Neural Networks: Survey of Current Trends, Challenges, and the Road Ahead | Dec 21, 2020 | Autonomous Driving, Benchmarking | —Unverified | 0 |
| Hardware Accelerator for Multi-Head Attention and Position-Wise Feed-Forward in the Transformer | Sep 18, 2020 | GPU, Position | —Unverified | 0 |