| ParallelComp: Parallel Long-Context Compressor for Length Extrapolation | Feb 20, 2025 | 4k, 8k | —Unverified | 0 |
| Distributed U-net model and Image Segmentation for Lung Cancer Detection | Feb 20, 2025 | CPU, Federated Learning | —Unverified | 0 |
| Towards Efficient Automatic Self-Pruning of Large Language Models | Feb 20, 2025 | GPU | —Unverified | 0 |
| Multiscale Byte Language Models -- A Hierarchical Architecture for Causal Million-Length Sequence Modeling | Feb 20, 2025 | Decoder, GPU | Code Available | 0 |
| Determining Layer-wise Sparsity for Large Language Models Through a Theoretical Perspective | Feb 20, 2025 | CPU, GPU | —Unverified | 0 |
| FairKV: Balancing Per-Head KV Cache for Fast Multi-GPU Inference | Feb 19, 2025 | GPU | —Unverified | 0 |
| GPU-Friendly Laplacian Texture Blending | Feb 19, 2025 | GPU | —Unverified | 0 |
| RocketKV: Accelerating Long-Context LLM Inference via Two-Stage KV Cache Compression | Feb 19, 2025 | GPU | —Unverified | 0 |
| Activation-aware Probe-Query: Effective Key-Value Retrieval for Long-Context LLMs Inference | Feb 19, 2025 | GPU, Retrieval | —Unverified | 0 |
| MEX: Memory-efficient Approach to Referring Multi-Object Tracking | Feb 19, 2025 | Autonomous Driving, GPU | —Unverified | 0 |